Configuring bring-your-own-bucket storage for Iceberg tables
This guide describes how to configure customer-managed storage buckets for use with Foundry Iceberg tables. These steps are only required if you are using bring-your-own-bucket (BYOB) storage. If you are using Foundry-managed storage, no additional configuration is needed.
Foundry supports BYOB storage on AWS (S3), Azure (ADLS), and Google (GCS).
Step 1: Create your storage bucket
Provision your storage bucket, ideally in the same region as your Foundry instance. A same-region bucket is not required, but it is recommended for optimal performance.
Configure network access on the bucket or storage account so that Foundry can connect to it.
AWS S3
- Provision an S3 bucket.
- Create an IAM role with the following permissions on the S3 bucket and the KMS key used to encrypt it:

  | Permission | Resource |
  |---|---|
  | `s3:DeleteObject` | S3 bucket |
  | `s3:GetObject` | S3 bucket |
  | `s3:ListBucket` | S3 bucket |
  | `s3:PutObject` | S3 bucket |
  | `kms:Decrypt` | KMS key |
  | `kms:Encrypt` | KMS key |
  | `kms:GenerateDataKey` | KMS key |
  | `sts:GetFederationToken` | — |

- Create an IAM user or OIDC identity provider that can assume the role you created. You will use the IAM user's credentials or the OIDC provider's tokens when configuring the Data Connection source. See the S3 source documentation for more detail on supported authentication mechanisms.
:::callout{theme="warning"} Setting up an S3 bucket hosted in the same region as your Foundry enrollment requires additional configuration. You must explicitly allow traffic from Palantir's VPC endpoint to the bucket under Network egress in Control Panel. Read more about these requirements in the network egress documentation. :::
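
The permissions in the table above could be expressed as an IAM policy document along the following lines. This is a minimal sketch, not an official policy: the function name `build_byob_policy`, the example bucket name, and the example KMS key ARN are hypothetical, and using `"*"` as the resource for `sts:GetFederationToken` is an assumption, since the table lists no resource for that permission.

```python
import json


def build_byob_policy(bucket_name: str, kms_key_arn: str) -> dict:
    """Build an IAM policy dict covering the permissions listed in the table.

    bucket_name and kms_key_arn are placeholders supplied by the caller.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Sid": "ReadWriteDeleteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # KMS actions target the key used to encrypt the bucket.
                "Sid": "UseBucketKmsKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
                "Resource": [kms_key_arn],
            },
            {
                # Assumption: the table gives no resource for this action, so "*" is used.
                "Sid": "FederationToken",
                "Effect": "Allow",
                "Action": ["sts:GetFederationToken"],
                "Resource": ["*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = build_byob_policy(
        "example-bucket",  # hypothetical bucket name
        "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # hypothetical key ARN
    )
    print(json.dumps(policy, indent=2))
```

Splitting the bucket-level and object-level actions into separate statements keeps each action paired with the resource type it applies to.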
Azure ABFS
- Provision a storage account and container.
- Provision client credentials for authentication. See the ABFS source documentation for more detail on supported authentication mechanisms. For Iceberg BYOB, configure the source with an ADLS endpoint on `dfs.core.windows.net`; `blob.core.windows.net` endpoints are not supported.
- Grant the service principal access to the storage location:
  - Assign the Storage Blob Data Contributor role on the container where data will be stored.
  - Ensure the service principal has at least Delegator permissions at the storage account level. Container-level permissions alone are not sufficient for Foundry.
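
The endpoint requirement above is easy to get wrong, so a quick check before creating the source may help. The sketch below tests whether a URL points at a `dfs.core.windows.net` host. The function name `is_supported_adls_endpoint` and the example account and container names are hypothetical, and the exact URL form that the source configuration expects is an assumption.

```python
from urllib.parse import urlparse

# Host suffix required for Iceberg BYOB on Azure, per the guidance above.
SUPPORTED_SUFFIX = ".dfs.core.windows.net"


def is_supported_adls_endpoint(url: str) -> bool:
    """Return True if the URL's host is an ADLS (dfs) endpoint."""
    # urlparse lowercases the hostname and drops any "container@" prefix.
    host = urlparse(url).hostname or ""
    return host.endswith(SUPPORTED_SUFFIX)


if __name__ == "__main__":
    # Hypothetical account and container names, for illustration only.
    for candidate in (
        "abfss://data@exampleaccount.dfs.core.windows.net/iceberg",
        "https://exampleaccount.blob.core.windows.net/data",
    ):
        print(candidate, is_supported_adls_endpoint(candidate))
```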
Google GCS
- Provision a Google Cloud Storage bucket.
- Create a Google Cloud IAM service account.
- Grant the service account access to the bucket:
  - Assign Storage Object Admin on the bucket to allow Foundry Iceberg to read, write, and delete Iceberg metadata and data files.
  - Assign Service Account Token Creator on the service account itself so Foundry can vend short-lived, scoped credentials to Iceberg clients.
- Create either a JSON service account key or the equivalent PKCS8 credential values for the service account. These are the supported authentication methods when using a GCS source for BYOB Iceberg storage.
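
Before entering a JSON service account key into the source, it can be worth confirming that the file is a complete service account key rather than some other credential type. The following is a minimal sketch of such a check. The function name `check_service_account_key` and the choice of which fields to require are assumptions made for illustration; the check does not contact Google Cloud or verify that the key is valid.

```python
import json
from typing import List

# Fields that a downloaded service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")

# Header of a PKCS8 PEM private key block.
PKCS8_HEADER = "-----BEGIN PRIVATE KEY-----"


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in the key JSON (empty if none)."""
    key = json.loads(raw_json)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append("type is not service_account")
    private_key = key.get("private_key") or ""
    if private_key and not private_key.lstrip().startswith(PKCS8_HEADER):
        problems.append("private_key is not a PKCS8 PEM block")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_key.py path/to/key.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_service_account_key(handle.read()):
            print(problem)
```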
Step 2: Create a Data Connection source
:::callout{theme="warning"} BYOB sources and credentials are highly privileged. Configure them with restrictive access settings, such as by placing them in an administrator-only project. This limits who can access the credentials and prevents unauthorized modifications that could disrupt access to your BYOB Iceberg data. :::
:::callout{theme="warning"} When a bucket is used as a backing store for Iceberg tables, all egress policies configured on the bucket become available to workflows that access those tables. User-written code in those workflows will be allowed to egress to the destinations permitted by those policies. :::
Once your bucket is provisioned, create a Data Connection source to connect Foundry to your storage. Configuring a source for BYOB use requires the Owner role on the source. If you do not have the Owner role, contact a source owner or your platform administrator to update these settings. For more information, see Source permissions.
- In Data Connection, create a new source (S3, ABFS, or Google Cloud Storage) using the credentials you provisioned.
- Optional: Specify a base path prefix for your Iceberg tables by appending it to the source URL (for example, `s3://bucket-name/base-path/`). If no base path is provided in the source, Foundry will set `foundry-iceberg` as the Iceberg base path. Do not modify the base path after configuring the source as an Iceberg storage location, as this may disrupt access to existing tables.
- Use an authentication method supported by the selected source:
  - For ADLS and S3: any authentication mechanism that Data Connection offers is supported, including `access key and secret` or `OIDC`.
  - For GCS: `JSON credentials` or `PKCS8 auth` are the supported options.
- Enable the Enable exports to this source setting on the source.
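
The base path rule described in the steps above can be illustrated with a short sketch: a prefix appended to the source URL becomes the base path, and `foundry-iceberg` is used when no prefix is present. This only mirrors the documented behavior for illustration and is not Foundry's implementation; the function name `iceberg_base_path` is hypothetical.

```python
from urllib.parse import urlparse

# Default base path that Foundry sets when the source URL has no prefix.
DEFAULT_BASE_PATH = "foundry-iceberg"


def iceberg_base_path(source_url: str) -> str:
    """Return the base path implied by a source URL, or the default."""
    # Everything after the bucket name is treated as the prefix.
    prefix = urlparse(source_url).path.strip("/")
    return prefix or DEFAULT_BASE_PATH


if __name__ == "__main__":
    for url in ("s3://bucket-name/base-path/", "s3://bucket-name"):
        print(url, "->", iceberg_base_path(url))
```

Working out the effective base path up front makes it easier to record the value and avoid changing it later, since modifying it after configuration may disrupt access to existing tables.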
:::callout{theme="neutral"} Leave the Code import configuration settings disabled. These settings are not required to use Foundry Iceberg tables from code. :::
:::callout{theme="neutral" title="Security consideration"} Do not grant credentials to the storage bucket directly to Iceberg clients or other tools. Instead, leverage credential vending through the Foundry Iceberg catalog to provide scoped, short-lived access. See Access delegation & credential vending for more information. :::
Step 3: Add the bucket in Control Panel
After creating your Data Connection source, add the bucket to your Iceberg storage configuration in Control Panel. See Configuring storage locations for instructions.