Transaction selectors

The following list describes the transaction selectors available for use when configuring retention policies in the Retention application.

Text in this format represents a parameter that could be defined on each policy.

Only in branch

Delete transactions that appear only in the given branch.

Takes 1 argument: branch

Example: "Only in branch main"

Only in branch selector

Setting "Only in branch main" will not delete the first two SNAPSHOT transactions, because they are also the root transactions for the SNAPSHOT transaction on the feature branch and therefore do not appear only in main.
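
The behavior described above can be sketched as a set difference. This is only an illustrative model, not the Retention application's implementation: the function name `only_in_branch`, the dictionary of branches, and the transaction IDs are all hypothetical.

```python
def only_in_branch(branches: dict[str, set[str]], branch: str) -> set[str]:
    """Return the transaction IDs that appear in `branch` and in no other branch."""
    others: set[str] = set()
    for name, txns in branches.items():
        if name != branch:
            others |= txns
    return branches[branch] - others


# Hypothetical history: the feature branch shares s1 and s2 with main.
branches = {
    "main": {"s1", "s2", "s3"},
    "feature": {"s1", "s2", "f1"},
}

# s1 and s2 are shared with the feature branch, so only s3 is selected.
selected = only_in_branch(branches, "main")
print(selected)
```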

Not in branch

Delete all transactions except ones in the given branch.

Takes 1 argument: branch

Example: "Not in branch main"

Not in branch selector
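
As a rough sketch, "Not in branch" can be modeled as every known transaction minus those on the given branch. The function name `not_in_branch` and the sample data are hypothetical and serve only to illustrate the description above.

```python
def not_in_branch(branches: dict[str, set[str]], branch: str) -> set[str]:
    """Return every transaction ID that is not part of `branch`."""
    all_txns: set[str] = set().union(*branches.values())
    return all_txns - branches[branch]


# Hypothetical history: the feature branch shares s1 and s2 with main.
branches = {
    "main": {"s1", "s2", "s3"},
    "feature": {"s1", "s2", "f1"},
}

# Everything on main is kept; only the feature-only transaction is selected.
selected = not_in_branch(branches, "main")
print(selected)
```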

Transaction count

:::callout{theme="warning"} The transaction count selector defines the transactions to retain. It does not specify the transactions to delete. :::

Retains only the transactions that are among the N most recent data-containing transactions on any branch, where N is the number of transactions to retain. A transaction is data-containing if, and only if, both of the following statements are true:

  • The transaction is committed.
  • The transaction is not a DELETE transaction.

Aborted transactions are never data-containing and will be deleted. Because this selector does not differentiate between SNAPSHOT, APPEND, and UPDATE transactions, we recommend using the viewCount selector in incremental pipelines.

Takes 1 argument: number of transactions to retain

Example: "transaction count 2"

Transaction count selector

This selector ensures that at least 2 SNAPSHOT transactions are available on each branch. The feature branch could have 3 transactions; the oldest transaction is also the 2nd transaction on branch main, so it is not deleted.
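
The retention rule can be sketched for a single branch as follows. This is a simplified model under stated assumptions, not the actual implementation: the `Transaction` class, its field names, and `select_for_deletion` are hypothetical, the history is assumed to be ordered oldest to newest, and sharing of transactions across branches is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    txn_type: str  # "SNAPSHOT", "APPEND", "UPDATE", or "DELETE"
    status: str    # "COMMITTED" or "ABORTED"


def is_data_containing(txn: Transaction) -> bool:
    """Committed and not a DELETE transaction, as defined in the text."""
    return txn.status == "COMMITTED" and txn.txn_type != "DELETE"


def select_for_deletion(history: list[Transaction], num_to_retain: int) -> list[str]:
    """Return IDs of transactions outside the N most recent data-containing ones."""
    if num_to_retain < 1:
        raise ValueError("num_to_retain must be at least 1")
    data_txns = [t for t in history if is_data_containing(t)]
    retained = {t.txn_id for t in data_txns[-num_to_retain:]}
    # Taking the text literally, everything that is not retained is selected.
    return [t.txn_id for t in history if t.txn_id not in retained]


history = [
    Transaction("t1", "SNAPSHOT", "COMMITTED"),
    Transaction("t2", "APPEND", "ABORTED"),
    Transaction("t3", "APPEND", "COMMITTED"),
    Transaction("t4", "SNAPSHOT", "COMMITTED"),
]

# With "transaction count 2", t3 and t4 are retained.
print(select_for_deletion(history, 2))
```

Note that t3 is an APPEND, so the two retained transactions span two different views. This illustrates why the text recommends the viewCount selector for incremental pipelines.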

View count

:::callout{theme="warning"} The view count selector defines the transactions in views to retain. It does not specify the transactions in views to delete. :::

The view count selector retains only the transactions in the last N dataset views, where N is the number of views to retain. Because a view comprises only committed transactions, all aborted transactions are also deleted. For example, numViewsToRetain: 1 means that all transactions prior to the latest view (that is, all transactions prior to the latest SNAPSHOT transaction) and all aborted transactions are deleted.

Takes 1 argument: number of views to retain

Example: "view count 1"

View count selector
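
A minimal sketch of the view count rule on a single branch follows. It assumes that each committed SNAPSHOT transaction starts a new view, which matches the numViewsToRetain: 1 example above. The function name, the tuple layout, and the sample history are hypothetical.

```python
def select_by_view_count(
    history: list[tuple[str, str, str]], num_views_to_retain: int
) -> list[str]:
    """history holds (txn_id, txn_type, status) tuples ordered oldest to newest.

    Returns the IDs of transactions that fall before the retained views,
    plus all aborted transactions.
    """
    if num_views_to_retain < 1:
        raise ValueError("num_views_to_retain must be at least 1")
    snapshot_positions = [
        i
        for i, (_, txn_type, status) in enumerate(history)
        if txn_type == "SNAPSHOT" and status == "COMMITTED"
    ]
    if len(snapshot_positions) >= num_views_to_retain:
        # The Nth most recent SNAPSHOT marks the start of the oldest retained view.
        cutoff = snapshot_positions[-num_views_to_retain]
    else:
        # Fewer views exist than requested, so no committed transaction is dropped.
        cutoff = 0
    return [
        txn_id
        for i, (txn_id, _, status) in enumerate(history)
        if i < cutoff or status == "ABORTED"
    ]


history = [
    ("s1", "SNAPSHOT", "COMMITTED"),
    ("a1", "APPEND", "COMMITTED"),
    ("s2", "SNAPSHOT", "COMMITTED"),
    ("x1", "APPEND", "ABORTED"),
    ("a2", "APPEND", "COMMITTED"),
]

# With "view count 1", only the view starting at s2 is retained.
print(select_by_view_count(history, 1))
```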

Older than a certain duration

This selects transactions older than the given duration.

Takes 1 argument: duration

Has been projected (advanced)

For datasets with projections, this selector selects the transactions that have been propagated to all projections.

No files in active view (advanced)

This selector selects all transactions that are not in the latest view (as well as all transactions currently in views where all files have been superseded by files in newer transactions). This is useful for datasets which have many transactions in the latest view, and should only be used with the Allow deletion from latest view flag.

Only present in views older than (advanced)

Selects transactions that are present only in views older than a given duration. The age of a view is defined as the time between the close time of the latest transaction in the view and now. If a view has an open transaction, none of the transactions in that view will be deleted.

Takes 1 argument: duration
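
The view age rule can be sketched as follows. This is an illustrative model only: the function names and the representation of a view as a list of (transaction ID, close time) pairs are assumptions, and an open transaction is modeled as a close time of `None`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

View = list[tuple[str, Optional[datetime]]]


def view_is_older_than(view: View, duration: timedelta, now: datetime) -> bool:
    """Age is measured from the close time of the latest transaction in the view."""
    close_times = [closed for _, closed in view]
    if not close_times or any(closed is None for closed in close_times):
        # A view with an open transaction is never treated as old.
        return False
    latest = max(t for t in close_times if t is not None)
    return now - latest > duration


def only_in_old_views(views: list[View], duration: timedelta, now: datetime) -> set[str]:
    """Return IDs of transactions that appear in old views and in no other view."""
    old: set[str] = set()
    other: set[str] = set()
    for view in views:
        ids = {txn_id for txn_id, _ in view}
        if view_is_older_than(view, duration, now):
            old |= ids
        else:
            other |= ids
    return old - other


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
views = [
    [("s1", datetime(2024, 1, 1, tzinfo=timezone.utc)),
     ("a1", datetime(2024, 1, 2, tzinfo=timezone.utc))],
    [("s2", datetime(2024, 1, 20, tzinfo=timezone.utc)),
     ("a2", None)],  # a2 is still open, so this view is protected
]

print(only_in_old_views(views, timedelta(days=7), now))
```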

Is derived (advanced)

Selects transactions which are derived. These are transactions generated from running a build. Transactions created from manually uploading data or through Data Connection are not considered derived.

