
Foundry

The Foundry connector enables data sharing from one instance of Foundry to another. This workflow requires access to both Foundry instances, and designates one instance as the "source" and the other as the "destination." Throughout the data connection process, users will perform most functions on the destination instance.

Note that this connector is not currently compatible with views as inputs, nor is it compatible with restricted views. Datasets ingested by the destination instance must first be materialized in their source instance.

For example, if a use case requires the transfer of data from red.palantirfoundry.com to blue.palantirfoundry.com, most of the setup and subsequent interactions will take place in the destination instance blue.palantirfoundry.com, which is where the transferred data will ultimately land. The workflows discussed below read data via ingest, rather than write data via export.

Supported capabilities

| Capability | Status |
| --- | --- |
| Bulk import | 🟢 Generally available |
| Streaming ingests | 🟢 Generally available |
| Incremental ingests | 🟢 Generally available |
| Exploration | Coming soon |
| Virtual tables | 🟡 Beta |
| Compute pushdown | Not available |
| Table exports | Not available |
| Export tasks | Not available |

Setup

  1. Open the Data Connection application and select + New Source in the upper-right corner of the screen.

  2. Select Foundry from the available connector types.

  3. Choose to use a direct connection over the Internet or to connect through an intermediary agent.

  4. Input the hostname of your source Foundry instance. In this case, blue.palantirfoundry.com will pull data from red.palantirfoundry.com, so red.palantirfoundry.com is the source instance.

  5. Choose a means of authentication.

  6. If you are using a direct connection, create an egress policy for the source instance. To ingest data from red.palantirfoundry.com to blue.palantirfoundry.com, create an egress policy for the URL https://red.palantirfoundry.com on port 443. Unlike typical data connections, you must also allowlist the destination instance's IP addresses on the source instance, using the Configure network ingress option in the source instance's Control Panel, as described in the next step.

  7. Configure ingress IP allowlisting for the source instance by adding the destination instance's IP addresses to the source instance's Network ingress extension in Control Panel:

     1. In the destination instance, open the Network egress Control Panel extension to identify the appropriate IP addresses.
     2. In the source instance, open the Network ingress Control Panel extension.
     3. Follow the existing documentation to configure network ingress.

:::callout{theme="neutral"} If you are creating an agent-based connection, then you must provide the appropriate IP addresses based on your agent's host. Additionally, your agent must use Java 21, at a minimum, as agent-based connections using the Foundry connector are not compatible with prior versions of Java. Learn more about identifying IPs when configuring network egress. :::

:::callout{theme="neutral"} Contact Palantir Support if you are unable to access the Network ingress Control Panel extension. :::

Learn more about setting up a connector in Foundry.

Authentication

The Foundry connector supports the following authentication methods:

Client credentials (production): For long-lived connections, only client credentials are allowed. To create client credentials, follow the steps below:

  1. Navigate to Developer Console on the source instance using the following link:
     https://<SOURCE_FOUNDRY_INSTANCE>.palantirfoundry.com/workspace/developer-console/
  2. Select + New application and provide a name.
  3. Select No, I will not use an Ontology SDK. After selecting the organization the application belongs to, be sure to enable the application.
  4. Select Backend service.
  5. Grant your application the appropriate permissions. You can choose Application permissions or User permissions, but you should keep the default selection, Application permissions.
  6. A client secret is displayed. Copy it to your clipboard, then paste it into the destination Foundry instance.
  7. After saving, navigate to OAuth & permissions in the left menu.
  8. Copy your client ID.

This process creates a service user, which you can grant or deny access to assets in Foundry. To check whether this service user has access to a dataset or a project, use the Check access feature on that asset.

Personal access token (temporary): For security reasons, tokens are not allowed in production use cases. Ingests fail if a sync relies on a token with a lifespan greater than 36 hours.

Authentication credentials are entered in the destination instance. In the source instance, you must create a token that allows the destination instance to read data. To do so, navigate to the following URL:

https://<SOURCE_FOUNDRY_INSTANCE>.palantirfoundry.com/workspace/settings/tokens

Then, select + Create token in the upper-right corner. At this step you can name your token and choose its lifespan. Then, copy your token and navigate to the destination Foundry instance.
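Because ingests fail when a sync relies on a token whose lifespan exceeds 36 hours, it may be worth checking the lifespan you chose before pasting the token into the destination instance. The sketch below is a hypothetical helper (`is_lifespan_allowed` and `MAX_TOKEN_LIFESPAN` are illustrative names, not part of any Foundry API) that only compares a creation time and an expiry time you supply. Treating exactly 36 hours as allowed is an assumption based on the "greater than 36 hours" wording.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the documented 36-hour limit for personal access tokens.
MAX_TOKEN_LIFESPAN = timedelta(hours=36)


def is_lifespan_allowed(created_at: datetime, expires_at: datetime) -> bool:
    """Return True if the token's lifespan does not exceed 36 hours.

    Hypothetical helper: it compares only the timestamps passed in and does not
    inspect or contact Foundry.
    """
    if expires_at <= created_at:
        raise ValueError("expires_at must be later than created_at")
    return expires_at - created_at <= MAX_TOKEN_LIFESPAN


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(is_lifespan_allowed(created, created + timedelta(hours=24)))  # True
    print(is_lifespan_allowed(created, created + timedelta(days=7)))    # False
```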

The provided credentials must have the following privileges:

  • Browse and read datasets in the source Foundry instance
  • Read from specific projects and datasets being synced

Networking

The Foundry connector requires network access from the destination Foundry instance to the source Foundry instance on port 443 (HTTPS). The destination instance needs an egress policy that corresponds to the URL of the source instance.

To enable direct connections from a Foundry instance to another Foundry instance, the appropriate egress policies must be added when setting up the source in the Data Connection application.

Egress policies are not needed for connections that use an agent.
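For an agent-based connection, the agent's host is what connects to the source instance, so it can be useful to confirm from that host that port 443 on the source instance is reachable. The following is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper, not a Foundry or agent tool. A `True` result only shows that a TCP connection could be opened; it does not confirm that TLS, ingress allowlisting, or authentication will succeed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout.

    Checks network reachability only; no TLS handshake or HTTP request is made.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts
        # (socket.timeout and socket.gaierror are subclasses of OSError).
        return False
```

For example, running `can_reach("red.palantirfoundry.com")` on the agent host checks the source instance from the running example.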

Sync data from Foundry

To set up a Foundry-to-Foundry sync, select Explore and create syncs in the upper-right of the source Overview screen. Browse the available projects and datasets in the source Foundry instance, then select the datasets you want to sync. When ready, select Create sync for x datasets.

Incremental syncs

Incremental (append) syncs maintain state about the most recent sync and only ingest new or changed data from the dataset being synced. There are two ways to set up these ingests with the Foundry connector.

Incremental option 1: Ingest all data, then updates only

The "initial incremental state" can be set to an arbitrarily distant date, such as January 1, 1970. On the first run of the ingest, all data will be extracted. From the second run onward, each ingest extracts only the newest data available.

Incremental option 2: Ingest all data after a specific date

The "initial incremental state" can be set to a date of your choosing, such as January 1, 2024. The first run of the ingest extracts all data written after that date, and subsequent ingests extract only the newest data available. Use this more selective option when you know you want to exclude data written in the source system before a particular date.
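The two options differ only in the starting value of the incremental state. The sketch below illustrates that idea with an in-memory list of rows and a timestamp watermark. It is a conceptual illustration with hypothetical names (`Row`, `written_at`, `run_incremental_sync`), not a description of how the Foundry connector stores or compares its incremental state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape: a payload plus the time it was written in the source.
    value: str
    written_at: datetime


def run_incremental_sync(rows: list[Row], state: datetime) -> tuple[list[Row], datetime]:
    """Return the rows written after `state`, plus the state to store for the next run."""
    new_rows = [row for row in rows if row.written_at > state]
    # Advance the state to the newest row extracted; keep it unchanged if nothing is new.
    new_state = max((row.written_at for row in new_rows), default=state)
    return new_rows, new_state


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


if __name__ == "__main__":
    source = [Row("a", utc(2023, 6, 1)), Row("b", utc(2024, 3, 1))]

    # Option 1: a distant initial state extracts everything on the first run.
    batch, state = run_incremental_sync(source, utc(1970, 1, 1))
    print([r.value for r in batch])  # ['a', 'b']

    # Option 2: an initial state of 2024-01-01 skips rows written before that date.
    batch, _ = run_incremental_sync(source, utc(2024, 1, 1))
    print([r.value for r in batch])  # ['b']

    # Later runs extract only rows written after the stored state.
    source.append(Row("c", utc(2024, 5, 1)))
    batch, state = run_incremental_sync(source, state)
    print([r.value for r in batch])  # ['c']
```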

Create streaming syncs

You can ingest a stream from one Foundry enrollment to another. The dataset in the source enrollment must be a stream. A sync can be established with a dataset RID and a branch name. After you specify a schema and run the sync for the first time, a new dataset is created in the destination enrollment.

Streaming offset options

The sync can be configured to ingest only newly created rows, or to start by ingesting all existing rows of the stream. The main trade-off is that ingesting all historical rows can be expensive in both time and compute if the streaming dataset is large.
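The two offset options amount to choosing where a reader starts in the stream. The sketch below models a stream as an in-memory list of records to make the difference concrete; `StartPosition`, `initial_offset`, and `read_from` are hypothetical names for illustration, not connector settings or APIs.

```python
from enum import Enum


class StartPosition(Enum):
    # Read every record already in the stream first (historical backfill).
    EARLIEST = "earliest"
    # Skip existing records and read only records appended after the sync starts.
    LATEST = "latest"


def initial_offset(log_length: int, start: StartPosition) -> int:
    """Return the offset of the first record the sync should read."""
    return 0 if start is StartPosition.EARLIEST else log_length


def read_from(log: list[str], offset: int) -> tuple[list[str], int]:
    """Return the records at or after `offset`, plus the offset to resume from next time."""
    return log[offset:], len(log)


if __name__ == "__main__":
    stream = ["r1", "r2", "r3"]  # records that exist before the sync starts
    offset = initial_offset(len(stream), StartPosition.LATEST)
    stream.append("r4")          # a record created after the sync starts
    records, offset = read_from(stream, offset)
    print(records)  # ['r4']
```

Starting from `EARLIEST` reads every existing record before any new ones, which is why the cost of a full backfill grows with the size of the stream.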

Virtual tables

:::callout{theme="warning"} Datasets are not supported with virtual tables. Only managed Iceberg tables on the "source" Foundry instance can be virtualized. :::

This section provides additional details about using virtual tables with a Foundry source. It does not apply when syncing to Foundry datasets.

The table below highlights the virtual table capabilities that are supported for Foundry.

| Capability | Status |
| --- | --- |
| Bulk registration | 🔴 Not available |
| Automatic registration | 🔴 Not available |
| Table inputs | 🟢 Generally available: tables in Code Repositories, Pipeline Builder |
| Table outputs | 🟢 Generally available: tables in Code Repositories, Pipeline Builder |
| Incremental pipelines | 🟢 Generally available |
| Compute pushdown | 🔴 Not available |

Review the virtual tables documentation for details on the supported workflows where Foundry tables can be used as inputs or outputs.

Ensure that the "destination" Foundry instance has network access to both the "source" Foundry instance and the storage bucket backing the Iceberg table, and verify that this bucket allows ingress from the "destination" Foundry instance.

Call the Foundry API from code

In addition to the Foundry-to-Foundry sync workflow above, a Python transform can call the Foundry API directly using the OAuth2 client credentials grant. Use this pattern when you need to invoke Foundry endpoints that are not exposed as syncs — for example, to enumerate project contents, trigger builds, or read from the Ontology API from within a transform.

The overall setup (REST API source configuration, storing client_id/client_secret as additional secrets, and the generic token-request and pagination scaffolding) is the same as any other OAuth2 client credentials flow. Review the OAuth Client Credentials grant example on the REST API connector page for the generic pattern.

The sections below cover the Foundry-specific details that differ from a third-party API.

Foundry token endpoint

Foundry's OAuth2 token endpoint is:

POST /multipass/api/oauth2/token

The endpoint is hosted on the Foundry instance you are calling. If the transform calls the same instance it runs on, set the hostname on the REST API source to that instance. If it calls a different instance, set the hostname to the target instance and configure egress policies and ingress allowlisting as described in Networking.

Learn more about the token endpoint parameters.
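Outside of a transform, for example when checking that a newly created client works, the same token request can be built with the Python standard library. This sketch constructs a form-encoded POST to /multipass/api/oauth2/token with the grant_type, client_id, client_secret, and scope parameters used in the example on this page. `build_token_request` and `fetch_access_token` are hypothetical helper names, and reading `access_token` from the JSON response mirrors what that example does rather than a full description of the response.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(hostname: str, client_id: str, client_secret: str,
                        scope: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request for a Foundry instance."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # space-separated if several scopes are requested
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{hostname}/multipass/api/oauth2/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the access_token field of the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

The returned token is then sent as an `Authorization: Bearer <token>` header, as in the transform example.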

Foundry scopes

Every Foundry API endpoint documents the scope it requires — see the per-endpoint reference in the API documentation. Some common examples:

| Scope | Grants access to |
| --- | --- |
| api:datasets-read | Read datasets |
| api:datasets-write | Write datasets |
| api:ontologies-read | Read Ontology objects and link types |

Request only the scopes your transform needs. Multiple scopes are separated by spaces, for example api:datasets-read api:datasets-write.
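Because the scope parameter is a single space-separated string, a small helper can assemble it from a list and drop duplicates. This is a hypothetical convenience function, not part of a Foundry SDK; the scope names are the examples from the table above, and the validation follows the OAuth2 rule that an individual scope token cannot contain spaces.

```python
def format_scopes(scopes: list[str]) -> str:
    """Join OAuth2 scopes into the space-separated string sent as the `scope` parameter.

    Duplicates are dropped while keeping the first-seen order.
    """
    cleaned = [scope.strip() for scope in scopes]
    for scope in cleaned:
        # Each scope token must be non-empty and must not contain whitespace.
        if not scope or any(ch.isspace() for ch in scope):
            raise ValueError(f"Invalid scope: {scope!r}")
    # dict.fromkeys removes duplicates and preserves insertion order.
    return " ".join(dict.fromkeys(cleaned))
```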

Create a client

The client_id and client_secret used in the token request come from a third-party application registered on the target Foundry instance. Follow the steps in Authentication above to create a backend service application and obtain the client_id and client_secret, then store them as additional secrets on your REST API source.

The service user created for the client must be granted permissions on every project, dataset, or Ontology resource the transform needs to access.

Example: List project children

The following transform requests an access token against /multipass/api/oauth2/token, then uses the token to list the children of a project via the Foundry /api/v2/filesystem/resources/{rid}/children endpoint. For the generic token-request and pagination scaffolding this example reuses, see the REST API OAuth Client Credentials grant example.

```python
import logging

import pandas as pd
from transforms.api import Output, transform_pandas
from transforms.external.systems import external_systems, Source, ResolvedSource

logger = logging.getLogger(__name__)


@external_systems(
    foundry_api_source=Source("<source_rid>")
)
@transform_pandas(
    Output("<output_dataset_rid>"),
)
def compute(foundry_api_source: ResolvedSource) -> pd.DataFrame:
    # The REST API source provides the base URL (the hostname configured on the
    # source) and an HTTP client preconfigured for that source.
    connection = foundry_api_source.get_https_connection()
    base_url = connection.url
    client = connection.get_client()

    # Client credentials stored as additional secrets on the REST API source.
    client_id = foundry_api_source.get_secret("additionalSecretClientId")
    client_secret = foundry_api_source.get_secret("additionalSecretClientSecret")

    # Exchange the client credentials for an access token. Request only the
    # scopes the transform needs (space-separated if there are several).
    token_response = client.post(
        base_url + "/multipass/api/oauth2/token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "api:datasets-read",
        },
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    token_response.raise_for_status()
    access_token = token_response.json()["access_token"]

    auth_headers = {"Authorization": f"Bearer {access_token}"}

    # Replace with the RID of the project whose children you want to list.
    project_rid = "ri.compass.main.folder.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    resources = []
    page_token = None
    while True:
        # Request up to 100 children per page; after the first page, pass the
        # token returned by the previous response.
        params = {"pageSize": 100}
        if page_token:
            params["pageToken"] = page_token

        response = client.get(
            base_url + f"/api/v2/filesystem/resources/{project_rid}/children",
            headers=auth_headers,
            params=params,
        )
        response.raise_for_status()
        body = response.json()

        # Keep only the fields written to the output dataset.
        for resource in body.get("data", []):
            resources.append({
                "rid": resource.get("rid"),
                "name": resource.get("displayName"),
                "type": resource.get("type"),
            })

        # A missing or empty nextPageToken means this was the last page.
        page_token = body.get("nextPageToken")
        if not page_token:
            break

        logger.info("Fetched %d resources so far, continuing to next page.", len(resources))

    return pd.DataFrame(resources)
```
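The pagination loop in the example above follows a common pattern: send pageSize (and pageToken after the first page), collect the items in data, and stop when nextPageToken is absent. The sketch below factors that loop into a reusable generator. `iter_items` and the `fetch_page` callable are hypothetical names, and the in-memory pages in the usage example are made-up data for illustration only.

```python
from typing import Any, Callable, Iterator, Optional

# A callable that takes a page token (None for the first page) and returns the
# parsed JSON body of one page.
FetchPage = Callable[[Optional[str]], dict[str, Any]]


def iter_items(fetch_page: FetchPage) -> Iterator[dict[str, Any]]:
    """Yield every item in `data` across all pages, following `nextPageToken`."""
    page_token: Optional[str] = None
    while True:
        body = fetch_page(page_token)
        yield from body.get("data", [])
        page_token = body.get("nextPageToken")
        if not page_token:
            return


if __name__ == "__main__":
    # Hypothetical in-memory pages standing in for API responses.
    pages = {
        None: {"data": [{"rid": "r1"}, {"rid": "r2"}], "nextPageToken": "t1"},
        "t1": {"data": [{"rid": "r3"}]},
    }
    print([item["rid"] for item in iter_items(lambda token: pages[token])])  # ['r1', 'r2', 'r3']
```

Inside the transform, `fetch_page` would wrap the `client.get(...)` call, adding `pageSize` and, when present, `pageToken` to the query parameters, then return `response.json()`.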
