
Creating Iceberg tables from a local notebook

Iceberg's open table format allows you to read and write Foundry Iceberg tables using external engines.

The code example below uses PyIceberg to create a Foundry table from a Jupyter® notebook running on your computer. You can create a Foundry table with any external engine that supports Iceberg REST catalogs.

```python
from getpass import getpass

import pyarrow.parquet as pq
from pyiceberg.catalog import load_rest

# Create a catalog client to create, load, and explore Iceberg tables in Foundry
catalog = load_rest(
    'foundry',
    {
        'uri': 'https://<your_foundry_url>/iceberg',
        'token': getpass('Foundry token:'),
    },
)

# Read a local Parquet file into a PyArrow table
df = pq.read_table('/<local_filepath>/example_data.parquet')

# Create a new, empty Iceberg table in Foundry that uses the Parquet file's schema
table = catalog.create_table(
    'Namespace.Project.Folder.example_data',
    schema=df.schema,
)

# List the Iceberg tables in the namespace; your new empty Foundry table appears
print(catalog.list_tables('Namespace.Project.Folder'))

# Use `append` to insert the local PyArrow table into the Foundry Iceberg table
table.append(df)

# Use `scan()` to load the Iceberg table from Foundry, for example into a pandas DataFrame
table.scan().to_pandas()
```
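
`getpass` prompts for the token interactively, which suits a notebook but not a scheduled or headless run. The sketch below builds the same property dictionary that the example passes to `load_rest`, assuming the token is kept in an environment variable named `FOUNDRY_TOKEN`. Both that variable name and the helper `foundry_catalog_properties` are hypothetical illustrations, not Foundry or PyIceberg conventions.

```python
import os
from getpass import getpass


def foundry_catalog_properties(foundry_host, token=None):
    """Return REST catalog properties for load_rest (hypothetical helper).

    The token comes from the explicit argument, else from the assumed
    FOUNDRY_TOKEN environment variable, else from an interactive prompt.
    """
    # Accept either a bare host name or an https:// URL with a trailing slash
    host = foundry_host.removeprefix("https://").rstrip("/")
    if token is None:
        token = os.environ.get("FOUNDRY_TOKEN") or getpass("Foundry token:")
    return {"uri": f"https://{host}/iceberg", "token": token}
```

You would then create the client with `load_rest('foundry', foundry_catalog_properties('<your_foundry_url>'))`. Keeping the token out of the notebook source also avoids committing it by accident.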

:::callout{theme="neutral"}
Identifiers in PyIceberg, and in SQL more broadly, are dot-separated. Foundry follows this convention when mapping Iceberg namespaces to Compass paths. For example, the Iceberg table identifier `Namespace.Project.Dir.Table` maps to the Compass path `Namespace/Project/Dir/Table`.
:::
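
The mapping described above can be sketched in plain Python. The helpers `split_identifier` and `to_compass_path` are hypothetical and are not part of PyIceberg or any Foundry API. They split a dot-separated identifier into its namespace and table name, much as PyIceberg does when given a string identifier, then join the parts with `/` to form the Compass path. They also reject empty segments, such as the one a trailing dot produces.

```python
def split_identifier(identifier):
    """Split 'A.B.C.Table' into (('A', 'B', 'C'), 'Table')."""
    parts = tuple(identifier.split("."))
    # A table identifier needs at least one namespace level and no empty parts
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError(f"invalid table identifier: {identifier!r}")
    return parts[:-1], parts[-1]


def to_compass_path(identifier):
    """Map a dot-separated Iceberg identifier to a slash-separated Compass path."""
    namespace, table_name = split_identifier(identifier)
    return "/".join(namespace + (table_name,))


print(split_identifier("Namespace.Project.Dir.Table"))
# → (('Namespace', 'Project', 'Dir'), 'Table')
print(to_compass_path("Namespace.Project.Dir.Table"))
# → Namespace/Project/Dir/Table
```

Because this sketch splits on every dot, it assumes that no individual folder or table name itself contains a dot.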


Jupyter®, JupyterLab®, and the Jupyter® logos are trademarks or registered trademarks of NumFOCUS.

All third-party trademarks (including logos and icons) referenced remain the property of their respective owners. No affiliation or endorsement is implied.

