
Data Connection FAQ

The following are some frequently asked questions about Data Connection.

For general information, view our Data Connection documentation (https://palantir.com/docs/foundry/data-connection/overview/).


Scheduled build never ran

A build was scheduled to run at a given time, but it did not attempt to run.

To troubleshoot, perform the following steps:

  1. Is there another running sync for this dataset and branch? Verify that no other job is running at the same time, because two syncs cannot run on the same dataset and branch simultaneously.

  2. Has the schedule been paused? Verify this schedule is not paused on the schedule overview page for this dataset. You can access this page via the Edit in Scheduler view of the sync or in Dataset Preview's Manage Schedules option.

  3. Was the agent of this sync disabled? Navigate to the agent associated with the source of this sync, and verify the agent is not disabled.



Ingestion led to duplicate rows in a dataset when running a SNAPSHOT transaction

A sync ran but led to duplicate rows.

  1. When creating a new sync, choose an APPEND type sync instead of a SNAPSHOT.
  2. When declaring the incremental settings, choose a column whose values are unique and always increasing (for example, last_upd_in_appl_ts), then set an initial value that is less than every value in that column.
  3. No further action should be required. Data Connection tracks the latest value that was synced and only brings in newer rows. "Newer" means greater than the previously synced value; for timestamps, later timestamps are greater than earlier ones.
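The tracking behavior described in the steps above can be illustrated with a minimal sketch. This is not Data Connection's implementation; the function name `incremental_sync` and the in-memory rows are hypothetical, and only the column name `last_upd_in_appl_ts` comes from the text.

```python
from datetime import datetime


def incremental_sync(source_rows, cursor_column, last_synced_value):
    """Return rows strictly newer than the cursor, plus the updated cursor.

    Hypothetical illustration of "track the latest synced value and only
    bring in newer rows"; not the actual Data Connection logic.
    """
    new_rows = [row for row in source_rows if row[cursor_column] > last_synced_value]
    if new_rows:
        last_synced_value = max(row[cursor_column] for row in new_rows)
    return new_rows, last_synced_value


source = [
    {"id": 1, "last_upd_in_appl_ts": datetime(2024, 1, 1)},
    {"id": 2, "last_upd_in_appl_ts": datetime(2024, 1, 2)},
]

# Initial value: chosen to be less than every value in the column.
cursor = datetime(1970, 1, 1)
first_batch, cursor = incremental_sync(source, "last_upd_in_appl_ts", cursor)

# A new row arrives in the source; the next run only picks up that row.
source.append({"id": 3, "last_upd_in_appl_ts": datetime(2024, 1, 3)})
second_batch, cursor = incremental_sync(source, "last_upd_in_appl_ts", cursor)
```

Because the comparison is strictly greater than the stored value, rows that were already synced are not appended again, which is what avoids the duplicates.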



Incremental load run time info

What value was used when I ran my incremental sync?

To troubleshoot, perform the following steps:

  1. Go to the dataset.
  2. Select History near the top of the screen.
  3. Select the relevant transaction you are interested in on the left part of the screen.
  4. View Build.
  5. View Transaction.
  6. Expand the section for Custom Metadata towards the bottom of the screen.
  7. Review the block for incrementalMetadata and verify correctness.



Column type is not consistent between database and dataset

The column type in my database differs from the type that appears in my dataset after the sync.

To troubleshoot, perform the following steps:

  1. If the column is TIMESTAMP, check whether the resulting type in Foundry is LONG. If it is, convert the column to TIMESTAMP using your data preparation tool of choice (Code Repositories, Preparation, or another application). This is a side effect of many vendor-provided database drivers, which fall back to the safest representation of a type.
  2. If the column is DECIMAL and its precision differs from the original database, we recommend either casting the number to a specific precision and scale in the query on the database itself, or casting the column to VARCHAR in the query and re-casting it in Foundry.
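The two conversions above can be sketched in plain Python. The helper names are hypothetical, and the unit of the LONG value is an assumption (milliseconds since the Unix epoch, UTC); check what your driver actually emits before relying on it.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def long_to_timestamp(epoch_millis):
    # Assumption: the LONG holds milliseconds since the Unix epoch (UTC).
    return EPOCH + timedelta(milliseconds=epoch_millis)


def varchar_to_decimal(text, scale=2):
    # Re-cast a DECIMAL that was synced as VARCHAR to a fixed scale.
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal("0.01")
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


parsed_ts = long_to_timestamp(1_700_000_000_000)
parsed_amount = varchar_to_decimal("123.4567", scale=2)
```

Going through a string keeps the exact digits from the database, so the precision and scale are chosen explicitly on the Foundry side rather than by the driver.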



TIME type columns lose sub-second precision between database and dataset

My database includes TIME type columns with sub-second precision and values with a non-zero sub-second component, but the value in the dataset only reflects distinctions up to the second component.

For most JDBC-based sources, including PostgreSQL and Microsoft SQL Server, a TIME value in the database is reflected in the Foundry output dataset as an INTEGER representing the milliseconds since midnight rounded down to the nearest full second. As a result, values in the database like 05:00:00.000, 05:00:00.200, and 05:00:00.800 will all become 18000000 in the output dataset.

To preserve sub-second distinctions, consider casting the value to a string in the sync configuration's SQL statement, as in the following PostgreSQL example:

SELECT record_id, CAST(column_with_time_type AS text) FROM table_name
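The arithmetic behind the example values can be checked with a short sketch. The helper names are hypothetical; the second function only mimics the rounding described above, and the first shows one way to recover milliseconds from the text produced by the cast.

```python
def time_text_to_millis(text):
    """Parse 'HH:MM:SS[.fraction]' into milliseconds since midnight."""
    hours, minutes, rest = text.split(":")
    seconds, _, fraction = rest.partition(".")
    # Pad or cut the fractional part to exactly three digits (milliseconds).
    millis = int((fraction + "000")[:3])
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis


def round_down_to_second(millis):
    """Mimic the described behavior: drop the sub-second component."""
    return millis - millis % 1000


values = ["05:00:00.000", "05:00:00.200", "05:00:00.800"]
exact = [time_text_to_millis(v) for v in values]
synced_as_integer = [round_down_to_second(m) for m in exact]
```

Here `exact` keeps the three values distinct (18000000, 18000200, 18000800), while `synced_as_integer` collapses all of them to 18000000.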

Status of running query

After a query begins running, how do I check its status?

To troubleshoot, perform the following steps:

  1. Open the Job Tracker application and select the running sync. This view shows the most granular status available for the sync.
  2. If possible, verify the query behavior in the source database.



Sync is failing with a schema mismatch

If the schema of a file or JDBC table changes between incremental APPEND transactions, your dataset will start failing with schema mismatch errors. Data Connection infers the schema for JDBC extracts, but only propagates existing schemas for file-based extracts. If the schema is in fact unchanged, you need to apply schema inference again. If the schema has truly changed between APPEND transactions, a new dataset is needed for the new schema.
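To decide whether a schema has truly changed, it can help to compare the two schemas column by column. The following is a minimal sketch under stated assumptions: schemas are represented as hypothetical `{column_name: type_name}` mappings, column order is ignored, and the helper names are illustrative rather than part of any Foundry API.

```python
def diff_schemas(existing, incoming):
    """Compare two {column_name: type_name} mappings."""
    added = sorted(set(incoming) - set(existing))
    removed = sorted(set(existing) - set(incoming))
    retyped = sorted(
        name
        for name in set(existing) & set(incoming)
        if existing[name] != incoming[name]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def schemas_match(existing, incoming):
    """True when no column was added, removed, or changed type."""
    return not any(diff_schemas(existing, incoming).values())


existing = {"id": "LONG", "amount": "DECIMAL"}
incoming = {"id": "LONG", "amount": "STRING", "currency": "STRING"}
diff = diff_schemas(existing, incoming)
```

An empty diff suggests re-inferring the schema on the existing dataset; a non-empty diff suggests landing the data in a new dataset.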

To troubleshoot, perform the following steps:

File-based

  1. If the files are XLSX or CSV tabular data, it may be possible to re-infer the schema on the synced dataset. If the inferred schema matches the previous one, the dataset will add the appended rows without issue.

  2. If after inferring schema you still get schema errors (either in Dataset Preview or another application), then this new file needs to instead be synced to a new dataset since it represents a fundamentally different view of a table.

  3. If the dataset was already appended with the new schema, we recommend reaching out to Palantir Support to revert this transaction. Additionally, the syncs of files to the current dataset need to be paused by going to the sync overview page and pausing any schedule associated with this sync.

  4. Subsequent files with new schemas should sync to a different dataset than the original, so we recommend copying the information from the original sync into a new sync, but replacing the target dataset with a different one (annotating in the dataset name the new version).

  5. Additionally, it may be best to delete the original sync to avoid any future schema mismatch errors from occurring and corrupting the existing data in Foundry.

JDBC

  1. If you expect a schema to change at some point and persist in its new form, it is best to land the original table into a dataset whose name indicates the schema version, such as account_transactions_v1.0.
  2. If a schema changes in the original table before a sync executes:
     * Pause the sync's schedule (if it exists).
  3. If a schema changes in the original table after a sync executes:
     * Pause the sync's schedule (if it exists).
     * Contact Palantir Support to revert the transaction, which has likely corrupted the target dataset.
  4. After your sync is paused and you are ready to move to the new schema, you must first land the new schema into a new dataset:
     * Clone the sync into a new sync and replace the target dataset with a new one (such as account_transactions_v1.1). This new dataset can then be unioned with the original to contain the full set of data.
  5. If required for your use case, you can delete the original sync after verifying correct behavior in the new sync. This removes any possibility of landing corrupt data in the old dataset, at the cost of reduced transparency into the prior loading configuration.
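The union of the old and new datasets mentioned in the steps above can be sketched as follows. This is an illustration only: rows are plain dictionaries, the helper `union_by_name` is hypothetical, and filling missing columns with `None` is one possible design choice rather than a documented Foundry behavior.

```python
def union_by_name(*datasets):
    """Union lists of row dicts, filling columns a row lacks with None."""
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)  # keep first-seen column order
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]


# Stand-ins for account_transactions_v1.0 and account_transactions_v1.1.
v1_0 = [{"id": 1, "amount": 10}]
v1_1 = [{"id": 2, "amount": 20, "currency": "USD"}]
combined = union_by_name(v1_0, v1_1)
```

Rows from the older version simply carry `None` for columns that only exist in the newer schema, so downstream consumers see a single consistent set of columns.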



Permission error when re-adding a deleted sync job spec

After deleting a sync's job spec, attempting to re-add it results in a permission error.

To resolve this issue:

Create a new sync that outputs to the same dataset. This establishes a new job spec with the correct permissions while preserving your existing data in the target dataset.



Bootvisor status is Unknown

My Bootvisor is stuck in an Unknown state and won't stop or start.

:::callout{theme="warning"} Contact Palantir Support to check your setup before proceeding. Following the steps below will temporarily prevent syncs from running on this agent. To prevent downtime, ensure that your sources have multiple agents assigned, or perform these steps during a maintenance window. :::

  1. Pause syncs on the agent.
  2. Wait for all currently running syncs to finish.
  3. Stop the agent (this step may fail depending on the current state of the agent).
  4. SSH into the agent machine.
  5. Kill all JVM processes related to Data Connection.
  6. Start the Bootvisor with ./service/bin/init.sh start.
  7. Start the agent in Data Connection, and verify that the agent is not paused.



Common TLS/SSL issues

For guidance on TLS, mTLS, and SSL, including whether to add a server or client certificate and how to resolve common certificate errors such as CERTIFICATE_VERIFY_FAILED, see Certificates and TLS.


