
Transactions

Foundry's Iceberg catalog extends standard Iceberg with all-or-nothing transaction semantics. When a job performs multiple writes through Foundry's build system, all writes either succeed together or are fully discarded, matching the existing guarantee for Foundry catalog datasets. This page explains how Foundry's Iceberg transaction semantics work, how this compares to default Iceberg behavior, and what it means for writing pipelines.

:::callout{theme="neutral"} The transaction guarantees described on this page apply only when running jobs through Foundry's build system. When writing to Foundry's Iceberg catalog from an external client, you get standard Iceberg transaction behavior. See the PyIceberg API documentation for details on how transactions work in that context. :::

Foundry Iceberg transaction semantics

When running a build in Foundry, Foundry provides all-or-nothing updates. Foundry automatically wraps all Iceberg table reads and writes in a single transaction. Users do not need to take any action to configure this; it is the default behavior for writing to Iceberg tables using Foundry's build system.

By comparison, standard Iceberg lacks multi-update transaction guarantees. Instead, it provides atomic updates: each update is committed individually. This is a deliberate design choice that enables optimistic concurrency, meaning multiple writers can work against the same table simultaneously. While this model works well for jobs that perform a single write, it can cause correctness issues for pipelines that perform multiple writes in a single job, whether those writes target one table or several. Foundry's all-or-nothing transaction model addresses this by coordinating commits across all writes in a job, so partial updates are never visible.

In particular, Foundry's transaction model provides the following guarantees:

  • All-or-nothing commits: All table updates in a job are committed together. If the job fails at any point, no partial writes are visible to downstream consumers or other jobs. This matches the behavior of catalog dataset transactions.
  • Repeatable reads: If a table is read more than once within a job, the same data is returned both times, even if the table was updated externally between reads. Your pipeline sees a stable, consistent view of its inputs for the duration of the job.
  • Jobs see their own writes: Writes made within a job are immediately visible to subsequent reads within the same job. They remain invisible to all other jobs and external readers until the transaction commits.
  • Multi-table snapshot isolation: Foundry captures a consistent snapshot of all input tables at the start of the transaction. This ensures that an external update to an input cannot be partially read mid-job, and that data provenance (which version of an input produced which version of an output) is recorded correctly.
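The four guarantees above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyCatalog` and `ToyJobTransaction` are hypothetical names invented for this illustration, tables are plain Python lists, and nothing here reflects how Foundry actually implements its transactions.

```python
class ToyCatalog:
    """In-memory stand-in for a catalog: table name -> list of rows."""

    def __init__(self, tables):
        self.tables = {name: list(rows) for name, rows in tables.items()}


class ToyJobTransaction:
    """Hypothetical model of one job-wide transaction (not a Foundry API)."""

    def __init__(self, catalog, inputs):
        self.catalog = catalog
        # Multi-table snapshot isolation: pin every input when the job starts.
        self.pinned = {name: list(catalog.tables[name]) for name in inputs}
        # Writes are staged here and stay invisible to others until commit.
        self.staged = {}

    def read(self, name):
        # Repeatable reads: pinned inputs ignore later external updates.
        base = self.pinned.get(name)
        if base is None:
            base = self.catalog.tables.get(name, [])
        # Jobs see their own writes: staged rows are layered on top.
        return list(base) + self.staged.get(name, [])

    def append(self, name, rows):
        self.staged.setdefault(name, []).extend(rows)

    def commit(self):
        # All-or-nothing: every staged write is published in one step.
        for name, rows in self.staged.items():
            self.catalog.tables.setdefault(name, []).extend(rows)
        self.staged = {}


catalog = ToyCatalog({"orders": [1, 2]})
txn = ToyJobTransaction(catalog, inputs=["orders"])

first = txn.read("orders")
catalog.tables["orders"].append(3)      # external update arrives mid-job
second = txn.read("orders")             # still the pinned view

txn.append("order_summary", ["a"])
own = txn.read("order_summary")         # the job sees its own staged write
visible_before = catalog.tables.get("order_summary", [])
txn.commit()
visible_after = catalog.tables["order_summary"]
```

If the job raised an exception before `commit()`, the staged rows would simply be dropped and the catalog would be left untouched.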

Example: incremental join pipeline

The following example illustrates the difference.

Assume you have a job that incrementally reads new rows from two tables called orders and customers, joins them, and appends results to two tables: order_summary and customer_metrics. One day the job fails mid-run, after processing the write to the first table (order_summary) but before writing to the second table (customer_metrics).

With standard Iceberg transaction semantics: the first write stands. The order_summary table would now reflect the new batch of data, while customer_metrics would not. The two outputs are in an inconsistent state. When the job is then retried, it re-reads the same input rows and writes them to order_summary again, duplicating data from the previous partial run.

With Foundry Iceberg transaction semantics: the first write is not committed. The order_summary table would not reflect the new batch of data. Both tables remain unchanged and in a consistent state. When the job is then retried, it writes the new data to both tables correctly without duplication.

Figure: Example of a partial transaction.
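The scenario above can be reproduced with a minimal simulation that uses plain Python lists as stand-ins for the two output tables. The functions `run_job` and `run_with_one_retry` and the `atomic` flag are hypothetical names chosen for this sketch; they are not Foundry or Iceberg APIs.

```python
def run_job(batch, tables, fail_after_first_write, atomic):
    """Append `batch` to both outputs, optionally crashing between the writes."""
    # With atomic=True, writes go to a staging area and are published together.
    # With atomic=False, each write lands in the real table immediately.
    target = {name: [] for name in tables} if atomic else tables

    target["order_summary"].extend(batch)
    if fail_after_first_write:
        raise RuntimeError("simulated mid-run failure")
    target["customer_metrics"].extend(batch)

    if atomic:
        for name, rows in target.items():
            tables[name].extend(rows)


def run_with_one_retry(atomic):
    tables = {"order_summary": [], "customer_metrics": []}
    batch = ["row-1", "row-2"]
    try:
        run_job(batch, tables, fail_after_first_write=True, atomic=atomic)
    except RuntimeError:
        # The retry re-reads the same input rows and writes them again.
        run_job(batch, tables, fail_after_first_write=False, atomic=atomic)
    return tables


per_write = run_with_one_retry(atomic=False)
all_or_nothing = run_with_one_retry(atomic=True)
print(per_write)
print(all_or_nothing)
```

In the per-write run, order_summary ends up with each row twice while customer_metrics has each row once. In the all-or-nothing run, both tables contain each row exactly once.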

Job queuing and optimistic concurrency

Foundry's build system ensures that at most one job runs against a given output at any point in time. Jobs writing to the same output are queued and run sequentially rather than concurrently. In practice, this means write conflicts cannot occur between regular pipeline jobs, but they can arise between pipeline jobs and Iceberg maintenance jobs.

By comparison, standard Iceberg uses optimistic concurrency, where multiple writers can work against the same table simultaneously, with conflicts detected and resolved at commit time. Foundry's transaction model trades away optimistic concurrency for a stricter approach to updates and correctness. If a transaction does encounter a conflicting concurrent update, the commit fails and the build system retries the job. Correctness is always preserved, at the cost of recomputing the job in case of conflicting updates.

A common source of conflicts is maintenance tasks such as compaction, which currently run concurrently with regular pipeline jobs. If a compaction task updates a table while a job is running, the job's transaction will fail to commit and the build system will retry the job. For tables that undergo frequent compaction, this can cause occasional retries.
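The fail-and-retry behavior described above can be sketched as a version check at commit time. This is an illustrative model only: `VersionedTable`, `CommitConflict`, `run_with_retries`, and the `max_attempts` limit are assumptions made for this sketch, and the actual retry policy of the build system is not specified here.

```python
class CommitConflict(Exception):
    """Raised when the table changed after the job started."""


class VersionedTable:
    """Toy table whose version increases on every committed update."""

    def __init__(self):
        self.version = 0
        self.rows = []

    def commit(self, expected_version, new_rows):
        # Reject the commit if another writer updated the table in between.
        if self.version != expected_version:
            raise CommitConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.rows.extend(new_rows)
        self.version += 1


def run_with_retries(table, compute, interfere_on_attempts=(), max_attempts=3):
    """Run `compute` and commit its result, recomputing after each conflict."""
    for attempt in range(1, max_attempts + 1):
        base_version = table.version
        result = compute()
        if attempt in interfere_on_attempts:
            # Simulated compaction: new table version, same logical rows.
            table.version += 1
        try:
            table.commit(base_version, result)
            return attempt
        except CommitConflict:
            continue
    raise CommitConflict(f"gave up after {max_attempts} attempts")


table = VersionedTable()
attempts = run_with_retries(table, lambda: ["r1"], interfere_on_attempts={1})
print(attempts, table.rows, table.version)
```

Here the first attempt is discarded because the simulated compaction bumps the table version, and the second attempt commits cleanly. The cost is that `compute` runs twice, which mirrors the recomputation trade-off described above.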

Iceberg snapshot types and Foundry dataset transactions

If you are familiar with Foundry datasets, the transaction types you know map onto Iceberg snapshot types as follows:

| Iceberg snapshot type | Foundry dataset transaction | Description |
| --- | --- | --- |
| Append snapshot | APPEND | Adds new data files; existing files are unchanged. |
| Overwrite snapshot | UPDATE, SNAPSHOT | Adds and removes data files, changing the logical set of records (including both partial and full overwrites). |
| Delete snapshot | DELETE | Removes data files or adds delete files to logically delete rows. |
| Replace snapshot | (no equivalent) | Rewrites data files without changing logical contents (e.g. compaction). |

:::callout{theme="neutral" title="Why does a first write appear as an append snapshot?"} A common point of confusion: the first write to a new Iceberg table is recorded as an append snapshot, even when the intent is to fully replace the table contents. This is expected. When a table is empty, appending data and replacing data are logically equivalent, as there is nothing to overwrite. Iceberg records the operation as an "append" because files were only added and none were removed. The practical implication is that if you see an append snapshot in a table's history for what you expected to be a full load, check whether it was the table's first write. On subsequent full rewrites to a non-empty table, the operation will correctly appear as an overwrite snapshot. :::
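The mapping and the first-write behavior can be expressed as a small lookup plus a classification rule. This is a simplified sketch based only on the descriptions in the table above: `SNAPSHOT_TO_FOUNDRY`, `classify_write`, and its parameters are hypothetical names, and the real logic Iceberg uses to label a snapshot is more detailed than this.

```python
# Iceberg snapshot type -> equivalent Foundry dataset transaction types.
SNAPSHOT_TO_FOUNDRY = {
    "append": ["APPEND"],
    "overwrite": ["UPDATE", "SNAPSHOT"],
    "delete": ["DELETE"],
    "replace": [],  # no Foundry equivalent (e.g. compaction)
}


def classify_write(data_files_added, data_files_removed, logical_change=True):
    """Simplified rule for which snapshot type a write would be recorded as."""
    if not logical_change:
        # Files were rewritten but the logical contents are the same.
        return "replace"
    if data_files_added > 0 and data_files_removed == 0:
        # Only additions: recorded as an append, whatever the intent was.
        return "append"
    if data_files_added == 0:
        # Data files removed, or only delete files added.
        return "delete"
    return "overwrite"


# Full load into an empty table: nothing exists to remove, so it is an append.
first_load = classify_write(data_files_added=3, data_files_removed=0)

# Full rewrite of the same table later: old files are removed as well.
second_load = classify_write(data_files_added=3, data_files_removed=3)

print(first_load, SNAPSHOT_TO_FOUNDRY[first_load])
print(second_load, SNAPSHOT_TO_FOUNDRY[second_load])
```

The first load is classified as an append because no files were removed, while the second full rewrite removes the earlier files and is therefore classified as an overwrite.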

