Authenticating Iceberg clients

This section describes how Iceberg clients can authenticate with Foundry's Iceberg catalog. These steps are not required when using Iceberg within Foundry.

Foundry's Iceberg catalog implements the specification ↗ for Iceberg REST catalogs. This means that Iceberg clients that support REST catalogs can use the authentication mechanisms defined in the spec when interacting with Foundry's Iceberg catalog.

When configuring Iceberg clients, the following authentication options are available:

| Method | Description |
| --- | --- |
| OAuth2 client credentials | Authenticate as a service user using a client ID and client secret. Once configured, the Iceberg client will perform any necessary token exchange. |
| Bearer | Authenticate as a user using an API token (generally referred to as a bearer token). |

:::callout{theme="neutral"} Do not configure clients with credentials for cloud storage (such as S3 or Azure Data Lake Storage (ADLS)). See the Access delegation & credential vending section below for details on how to grant access to clients via the catalog. :::

Using an API token

API tokens, also called bearer tokens, are the fastest way to authenticate and get started. You can generate an API token for your user by following the User-generated tokens documentation.

The following properties are required when configuring an Iceberg client with a bearer token:

| Key | Example | Required? | Description |
| --- | --- | --- | --- |
| uri | https://<hostname>/iceberg | Yes | URI identifying the Foundry Iceberg catalog |
| token | eyJwb... | Yes | Bearer token value to use for Authorization header |

The example below uses the Python Iceberg client (PyIceberg) and configures a catalog in a ~/.pyiceberg.yaml file. The catalog properties are documented here ↗ and can be adapted for other Iceberg client implementations that support REST catalogs.

catalog:
  foundry:
    uri: https://your.foundry/iceberg
    token: eyJwb...

:::callout{theme="warning"} These generated bearer tokens are long-lived and tied to your user. They should be handled with care. We recommend using OAuth2 for production usage and limiting token-based authentication to development. :::
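As a minimal sketch of the two required properties, the snippet below builds the same `uri` and `token` values in Python and shows the `Authorization` header a REST client would derive from the token. The function names `bearer_catalog_properties` and `authorization_header` are hypothetical helpers for illustration, not part of PyIceberg or Foundry, and the token value is a placeholder.

```python
def bearer_catalog_properties(hostname: str, token: str) -> dict[str, str]:
    """Build the two required catalog properties for bearer-token authentication.

    Hypothetical helper for illustration only.
    """
    if not hostname or "/" in hostname:
        raise ValueError("hostname must be a bare host name, for example 'your.foundry'")
    if not token:
        raise ValueError("token must not be empty")
    return {"uri": f"https://{hostname}/iceberg", "token": token}


def authorization_header(token: str) -> dict[str, str]:
    """Return the header that carries a bearer token on each catalog request."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder values; never hard-code a real token in source files.
props = bearer_catalog_properties("your.foundry", "example-token")
print(props["uri"])  # https://your.foundry/iceberg
print(authorization_header(props["token"]))
```

The resulting dictionary has the same keys as the YAML catalog entry, so it could be passed to any client that accepts REST catalog properties as a mapping.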

Using OAuth2

You can use OAuth2 with Iceberg clients for increased security over directly providing an API bearer token. You will need to create a third-party OAuth2 application with the client credentials grant as documented here.

The Iceberg client is given the generated client credentials and a URL to the authorization server which exchanges client credentials for an application token. Iceberg clients handle this exchange and subsequent token refresh.
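To make the exchange concrete, the sketch below constructs (but does not send) the kind of token request an Iceberg client issues on your behalf. It assumes the form fields of the standard OAuth2 client credentials grant (`grant_type`, `client_id`, `client_secret`, `scope`) posted to the `/iceberg/v1/oauth/tokens` endpoint; the exact request Foundry expects is not specified here, so treat the field layout as an assumption. `build_token_request` is a hypothetical helper and the credentials are placeholders.

```python
import urllib.parse
import urllib.request


def build_token_request(foundry_url, client_id, client_secret, scope=None):
    """Build a client-credentials token request without sending it.

    Hypothetical helper; the form fields follow the generic OAuth2
    client credentials grant and are an assumption for Foundry.
    """
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        form["scope"] = scope  # space-separated, for example "api:iceberg-read"
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url=f"{foundry_url}/iceberg/v1/oauth/tokens",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_token_request(
    "https://your.foundry",
    "example-id",
    "example-secret",
    "api:iceberg-read api:iceberg-write",
)
print(request.get_method(), request.full_url)
print(urllib.parse.parse_qs(request.data.decode("ascii")))
```

In normal use you do not build this request yourself; the Iceberg client performs the exchange and refreshes the token when needed.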

The following properties are used when configuring an Iceberg client for OAuth2:

| Key | Example | Required? | Description |
| --- | --- | --- | --- |
| uri | https://<hostname>/iceberg | Yes | URI identifying the Foundry Iceberg catalog |
| oauth2-server-uri | https://<hostname>/iceberg/v1/oauth/tokens | Yes | URI identifying the authorization server |
| credential | client_id:client_secret | Yes | Credential to use for OAuth2 credential flow when initializing the catalog |
| scope | api:iceberg-read api:iceberg-write | No | Space-separated scope to limit permissions |

With these properties, you can configure a PyIceberg catalog client for OAuth2 as shown below:

catalog:
  default:
    uri: https://your.foundry/iceberg
    oauth2-server-uri: https://your.foundry/iceberg/v1/oauth/tokens  # OAuth2 server URL
    credential: 17f...:037...  # client_id:client_secret
    scope: api:iceberg-read api:iceberg-write  # optional, space-separated scope

The scope configuration is optional. Permissions default to the permissions of the service user created for the third-party application. You can limit scope to only allow reads by declaring api:iceberg-read without api:iceberg-write.
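The effect of the optional scope can be sketched with a small helper that assembles the OAuth2 properties and only adds `scope` when you pass one. `oauth2_catalog_properties` is a hypothetical function for illustration, and the client ID and secret are placeholders.

```python
READ_SCOPE = "api:iceberg-read"
WRITE_SCOPE = "api:iceberg-write"


def oauth2_catalog_properties(foundry_url, client_id, client_secret, scopes=()):
    """Assemble OAuth2 catalog properties (hypothetical helper).

    When no scopes are given, the "scope" key is omitted, so permissions
    fall back to those of the service user.
    """
    props = {
        "uri": f"{foundry_url}/iceberg",
        "oauth2-server-uri": f"{foundry_url}/iceberg/v1/oauth/tokens",
        "credential": f"{client_id}:{client_secret}",
    }
    if scopes:
        props["scope"] = " ".join(scopes)  # scopes are space-separated
    return props


# Read-only client: declare the read scope without the write scope.
read_only = oauth2_catalog_properties(
    "https://your.foundry", "example-id", "example-secret", [READ_SCOPE]
)
# No scope declared: the service user's permissions apply.
unscoped = oauth2_catalog_properties(
    "https://your.foundry", "example-id", "example-secret"
)
print(read_only["scope"])     # api:iceberg-read
print("scope" in unscoped)    # False
```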

:::callout{theme="warning"} Foundry uses an Iceberg-flavored authorization server, so the token endpoint (https://<hostname>/iceberg/v1/oauth/tokens) differs from the endpoint generally used for OAuth2 clients. :::

Access delegation & credential vending

Storage access is delegated by the Iceberg catalog using the access delegation mechanisms described in the Iceberg specification ↗. Foundry's catalog offers credential vending (recommended) or remote signing according to the following support matrix:

| Storage location | Credential vending | Credential refresh | Remote signing |
| --- | --- | --- | --- |
| ABFS | Yes, via SAS tokens | No | No |
| S3 | Yes, via STS tokens | Yes | Yes |
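The support matrix can be encoded as a lookup, sketched below, to choose an access delegation mode per storage location. The header name `X-Iceberg-Access-Delegation` and its values `vended-credentials` and `remote-signing` come from the Iceberg REST catalog specification; whether your client needs to set the header explicitly for Foundry is an assumption, and `access_delegation_header` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegationSupport:
    credential_vending: bool
    credential_refresh: bool
    remote_signing: bool


# Mirrors the support matrix above.
SUPPORT = {
    "ABFS": DelegationSupport(credential_vending=True, credential_refresh=False, remote_signing=False),
    "S3": DelegationSupport(credential_vending=True, credential_refresh=True, remote_signing=True),
}


def access_delegation_header(storage: str, prefer_remote_signing: bool = False) -> dict[str, str]:
    """Pick a delegation mode for a storage location (hypothetical helper)."""
    support = SUPPORT.get(storage.upper())
    if support is None:
        raise ValueError(f"unknown storage location: {storage}")
    if prefer_remote_signing:
        if not support.remote_signing:
            raise ValueError(f"remote signing is not supported for {storage}")
        return {"X-Iceberg-Access-Delegation": "remote-signing"}
    # Credential vending is the recommended mode.
    return {"X-Iceberg-Access-Delegation": "vended-credentials"}


print(access_delegation_header("S3"))
print(access_delegation_header("S3", prefer_remote_signing=True))
```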

Credential vending

:::callout{theme="success"} With credential vending, Foundry's Iceberg catalog grants data access to authenticated clients through scoped, short-lived vended credentials. This approach enforces the principle of least privilege and enhances security by minimizing exposure of storage credentials. :::

When an Iceberg client interacts with Foundry's Iceberg catalog, storage credentials are never provided directly by the user or application. Instead, the Foundry Iceberg catalog issues temporary credentials that are tightly scoped to specific data and permissions.

The credential vending process works as follows:

  • Access request: The Iceberg client initiates a request to access one or more tables using the Foundry Iceberg REST catalog API; for example, when loading table metadata.
  • Authentication: Foundry's REST catalog authenticates the client and verifies their permissions for the requested table and action.
  • Credential vending: If the client is authorized, Foundry's catalog "vends" (that is, issues) short-lived, narrowly scoped storage credentials. Each vended credential is valid only for specific files and for a limited duration.
  • Data access: The client uses these temporary credentials to directly access the underlying storage (such as S3 or ABFS) to read or write data. Once the credentials expire, access is automatically revoked.
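The steps above can be illustrated with a toy model of a vended credential: it is limited to one storage location, one level of access, and a fixed lifetime. This is a simplified sketch, not Foundry's implementation. The `VendedCredential` class, the prefix-based scoping, the 15-minute lifetime, and the bucket paths are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VendedCredential:
    """Toy stand-in for a short-lived, narrowly scoped storage credential."""
    path_prefix: str
    can_write: bool
    expires_at: datetime

    def allows(self, path: str, write: bool, now: datetime) -> bool:
        if now >= self.expires_at:
            return False  # expired credentials grant nothing
        if write and not self.can_write:
            return False  # read-only credential
        return path.startswith(self.path_prefix)


def vend(table_location: str, can_write: bool, now: datetime,
         ttl: timedelta = timedelta(minutes=15)) -> VendedCredential:
    """Issue a credential after authentication and authorization succeed."""
    return VendedCredential(table_location, can_write, now + ttl)


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cred = vend("s3://example-bucket/warehouse/orders/", can_write=False, now=issued)
data_file = "s3://example-bucket/warehouse/orders/data/part-0.parquet"
soon = issued + timedelta(minutes=5)

print(cred.allows(data_file, write=False, now=soon))  # True
print(cred.allows(data_file, write=True, now=soon))   # False
print(cred.allows("s3://example-bucket/warehouse/other/x.parquet", write=False, now=soon))  # False
print(cred.allows(data_file, write=False, now=issued + timedelta(minutes=20)))  # False
```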

Troubleshooting authentication

If you encounter issues connecting to the Foundry Iceberg REST catalog, use the steps below to isolate the problem.

Verify connectivity with a bearer token

Start by confirming that you can connect to the catalog using an API token from a local Jupyter notebook. This verifies basic network connectivity and catalog access independently of your OAuth2 configuration.

from pyiceberg.catalog import load_rest
from getpass import getpass

token = getpass("Foundry user token:")

catalog = load_rest(
    "foundry",
    {
        "uri": "https://<your_foundry_url>/iceberg",
        "token": token,
    },
)

table = catalog.load_table("<table_rid>")
print(table.snapshots())
del token

Replace <your_foundry_url> with your Foundry hostname and <table_rid> with your Iceberg table's resource identifier. If this step succeeds, the catalog is reachable and your user has the correct permissions.
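A common source of failures at this stage is a placeholder that was never replaced. The sketch below scans a properties mapping for leftover `<...>` markers before you attempt to connect. `unresolved_placeholders` is a hypothetical helper, and the angle-bracket pattern is an assumption based on the placeholder style used in these examples.

```python
import re

# Matches placeholder markers such as <your_foundry_url> or <table_rid>.
PLACEHOLDER = re.compile(r"<[^<>\s]+>")


def unresolved_placeholders(conf: dict[str, str]) -> list[str]:
    """Return the keys whose values still contain a <placeholder> marker."""
    return sorted(key for key, value in conf.items() if PLACEHOLDER.search(value))


conf = {
    "uri": "https://<your_foundry_url>/iceberg",
    "token": "example-token",
}
print(unresolved_placeholders(conf))  # ['uri']
```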

Verify OAuth2 credentials

After confirming bearer token connectivity, test your OAuth2 client credentials from a local Jupyter notebook. This checks the third-party application, its scopes, and the service user's permissions. If this step fails, verify that the third-party application and its scopes are configured correctly and that the service user has been granted access to the relevant Compass folders and Iceberg tables.

from pyiceberg.catalog import load_rest
from getpass import getpass

foundry_url = "https://<your_foundry_url>"
client_id = "<your_client_id>"

client_secret = getpass("OAuth2 client secret:")

catalog = load_rest(
    name="foundry",
    conf={
        "uri": f"{foundry_url}/iceberg",
        "oauth2-server-uri": f"{foundry_url}/iceberg/v1/oauth/tokens",
        "credential": f"{client_id}:{client_secret}",
        "scope": "api:iceberg-read api:iceberg-write",
    },
)

# List namespaces to verify catalog access
print(catalog.list_namespaces())

# Load a specific table to verify table-level permissions.
# If the table cannot be read, check that the service user for the
# third-party application has permission to access it.
table = catalog.load_table("<table_rid>")
print(table.snapshots())
del client_secret

Replace <your_foundry_url>, <your_client_id>, and <table_rid> with your environment-specific values.
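The order of the two checks matters: the bearer token check isolates network and catalog access, and the OAuth2 check then isolates the third-party application, its scopes, and the service user's permissions. The sketch below runs checks in that order and stops at the first failure. `first_failure` and the stub check functions are hypothetical; in practice each check would wrap one of the notebook snippets above.

```python
from typing import Callable, Optional


def first_failure(checks: list[tuple[str, Callable[[], object]]]) -> Optional[str]:
    """Run checks in order and describe the first one that raises."""
    for name, check in checks:
        try:
            check()
        except Exception as error:  # broad on purpose: any failure ends the run
            return f"{name} failed: {error}"
    return None


# Stubs standing in for the real connection attempts.
def bearer_check():
    return "ok"


def oauth2_check():
    raise PermissionError("service user cannot read table")


result = first_failure([
    ("Bearer token connectivity", bearer_check),
    ("OAuth2 credentials", oauth2_check),
])
print(result)  # OAuth2 credentials failed: service user cannot read table
```

If the first check fails, look at network connectivity and your user's catalog access; if only the second fails, review the third-party application, its scopes, and the service user's permissions.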

