Creating Iceberg tables from a local notebook
Iceberg's open table format allows you to read and write Foundry Iceberg tables using external engines.
The code example below uses PyIceberg to create a Foundry table from a Jupyter® notebook running on your computer. The same approach works with any external engine that supports Iceberg REST catalogs.

    from getpass import getpass

    import pyarrow.parquet as pq
    from pyiceberg.catalog import load_rest

    # Create catalog client to create, load, and explore Iceberg tables in Foundry
    catalog = load_rest(
        'foundry',
        {
            'uri': 'https://<your_foundry_url>/iceberg',
            # Prompt for the token so it is never stored in the notebook
            'token': getpass('Foundry token:'),
        },
    )

    # Read local Parquet file into Arrow table
    df = pq.read_table('/<local_filepath>/example_data.parquet')

    # Create a new Iceberg table in Foundry
    table = catalog.create_table(
        'Namespace.Project.Folder.example_data',
        schema=df.schema,
    )

    # List Iceberg tables - your new empty Foundry table will appear
    catalog.list_tables('Namespace.Project.Folder')

    # Use `append` to insert the local PyArrow table into the Foundry Iceberg table
    table.append(df)

    # Use `scan()` to load the Iceberg table from Foundry - for example to read into a Pandas dataframe
    table.scan().to_pandas()

:::callout{theme="neutral"}
Identifiers in PyIceberg, and in SQL more broadly, are dot-separated. Foundry honors this convention when mapping Iceberg namespaces to Compass paths. For example, the Iceberg identifier `Namespace.Project.Dir.Table` maps to the Compass path `Namespace/Project/Dir/Table`.
:::
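The mapping described in the callout can be sketched in a few lines of Python. This is only an illustration of the naming convention: the helper names `identifier_to_compass_path` and `compass_path_to_identifier` are hypothetical and are not part of PyIceberg or Foundry. The sketch assumes that no individual path component contains a dot or a slash.

```python
def identifier_to_compass_path(identifier: str) -> str:
    """Map a dot-separated Iceberg identifier to a slash-separated path.

    Hypothetical helper for illustration only; assumes no component
    contains a dot or a slash.
    """
    parts = identifier.split(".")
    # A leading, trailing, or doubled dot produces an empty component
    if any(part == "" for part in parts):
        raise ValueError(f"Empty component in identifier: {identifier!r}")
    return "/".join(parts)


def compass_path_to_identifier(path: str) -> str:
    """Inverse mapping: slash-separated path to dot-separated identifier."""
    parts = path.strip("/").split("/")
    # A dot inside a component could not be round-tripped unambiguously
    if any(part == "" or "." in part for part in parts):
        raise ValueError(f"Cannot map path to an identifier: {path!r}")
    return ".".join(parts)


print(identifier_to_compass_path("Namespace.Project.Dir.Table"))
# prints: Namespace/Project/Dir/Table
```

Rejecting empty components up front catches a stray leading or trailing dot before the identifier is passed to a catalog call.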
Jupyter®, JupyterLab®, and the Jupyter® logos are trademarks or registered trademarks of NumFOCUS.
All third-party trademarks (including logos and icons) referenced remain the property of their respective owners. No affiliation or endorsement is implied.