Checks reference¶

This page provides more detailed documentation on available health check types.

Category	Check type	Supported resources
Status	Schedule status	Datasets
Status	Build status	Datasets, Iceberg tables, Virtual tables
Status	Job status	Datasets, Iceberg tables, Virtual tables
Status	Sync status	Datasets
Time	Build duration	Datasets
Time	Data freshness	Datasets
Time	Sync duration	Datasets
Time	Sync freshness	Datasets
Time	Time since last updated	Datasets, Iceberg tables, Virtual tables
Time	Time since sync last updated	Datasets
Size	Dataset file count	Datasets
Size	Dataset partition	Datasets
Size	Row count	Datasets
Size	Transaction file count	Datasets
Size	Transaction file size	Datasets
Content	Allowed column values	Datasets
Content	Approximate unique percentage	Datasets
Content	Column regex	Datasets
Content	Approximate column relation	Datasets
Content	Date range	Datasets
Content	Null percentage	Datasets
Content	Numeric mean	Datasets
Content	Numeric median	Datasets
Content	Numeric range	Datasets
Content	Primary key	Datasets, Iceberg tables, Virtual tables
Schema	Column	Datasets, Iceberg tables, Virtual tables
Schema	Column count	Datasets
Schema	Schema	Datasets, Iceberg tables, Virtual tables

Status checks¶

Schedule status¶

Checks whether the most recent build of the schedule succeeded or failed.

Rule component	Description	Example options	Required?
Severity	Severity of check failure	Moderate, Critical	Y
Escalate	Whether to escalate severity after consecutive failures	Y, N	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

A schedule status check reflects the status of the pipeline, or set of datasets, that always build together. As a result, it reports a status that covers every step leading to the creation or update of the final dataset.

Build status¶

Checks whether the most recent build of the dataset succeeded or failed.

Rule component	Description	Example options	Required?
Severity	Severity of check failure	Moderate, Critical	Y
Escalate	Whether to escalate severity after consecutive failures	Y, N	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

A build status check reflects the status of the whole process that builds a final dataset. As a result, it reports a status that covers every step leading to the creation or update of that final dataset. Note that build status checks on intermediate datasets that are created or updated during the process are not updated. Job status checks on those intermediate datasets, however, are updated.

Job status¶

Checks whether the most recent job run on a dataset succeeded or failed.

Rule component	Description	Example options	Required?
Severity	Severity of check failure	Moderate, Critical	Y
Escalate	Whether to escalate severity after consecutive failures	Y, N	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

A job status check triggers independently of the build that causes the dataset to be refreshed or created. In other words, whether or not the dataset is the final output of a given build, the job status check runs on every build of that dataset.

When to use job status, build status, or schedule status checks¶

In general, it is recommended that all schedules have schedule status checks. If your schedule already has a schedule status check, installing job status checks on other datasets built by the same schedule is not recommended, because any failing job on the schedule already triggers the schedule status check.

Use a job status check on an intermediate dataset if you want to know whether that dataset was updated, regardless of whether other datasets in the build were updated successfully. Use a build status check if the dataset is a build output and you want to verify that the entire build, covering all datasets including this one, succeeded.

Build status and job status are equivalent if the dataset is the only output of a build. They may differ if the dataset is an intermediate dataset or the build has multiple outputs: the job on the dataset can succeed (or not run) while other jobs in the build fail and cause the build to fail.
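The difference between the two checks can be illustrated with a minimal sketch. The `Build` class and both functions are hypothetical names made up for illustration, not part of any product API, and the sketch assumes that a build fails whenever at least one of its jobs fails.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Build:
    # Hypothetical model: dataset name -> job outcome.
    # True = job succeeded, False = job failed, None = job did not run.
    job_results: Dict[str, Optional[bool]]


def job_status(build: Build, dataset: str) -> Optional[bool]:
    """Outcome of the job on one dataset, ignoring the rest of the build."""
    return build.job_results.get(dataset)


def build_status(build: Build) -> bool:
    """Assumption: the build succeeds only if none of its jobs failed."""
    return all(result is not False for result in build.job_results.values())


# A build with two outputs: one job succeeds, the other fails.
two_outputs = Build({"orders_clean": True, "orders_joined": False})
# A build whose only output succeeds.
single_output = Build({"orders_clean": True})
```

In this sketch, `job_status(two_outputs, "orders_clean")` is `True` while `build_status(two_outputs)` is `False`; for `single_output` the two results agree.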

Sync status¶

Checks whether the most recent sync of the dataset to another database succeeded or failed.

Rule component	Description	Example options	Required?
Sync destination	Which sync of the dataset to monitor, relevant especially when the dataset syncs to multiple destinations.	`phonograph2-cache-worker`, `jdbc-worker`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Escalate	Whether to escalate severity after consecutive failures	Y, N	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Time checks¶

Build duration¶

Checks whether the total time a build takes to complete meets some threshold.

Rule component	Description	Example options	Required?
Build duration	Total time a build takes to complete (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Median deviation	Difference (in approximate standard deviations) from the median time to complete recent builds	`1` Standard deviations, `10` Recent builds	N
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

As with the build status check, the build duration check is only updated for the terminal output of the build. Build duration checks attached to intermediate datasets that are part of a larger build are not updated.

Data freshness¶

Checks the time of the latest transaction on a dataset against the maximum value of a timestamp column. If the timestamp in the column represents when the row was added, this can be used to measure exact data freshness.

Rule component	Description	Example options	Required?
Column name	Column name of the column containing the time of the last update.	`LAST_UPDATED`	Y
Freshness range	Time range during which to consider the column's latest data as "fresh" (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N
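As a rough illustration of the comparison described above, the sketch below computes the gap between the latest transaction and the newest value in the timestamp column. The function names are hypothetical, and the "less than or equal to" threshold is only one of the range options listed in the table.

```python
from datetime import datetime, timedelta
from typing import Sequence


def data_freshness_lag(latest_transaction: datetime,
                       column_values: Sequence[datetime]) -> timedelta:
    """Gap between the latest transaction and the newest timestamp in the column."""
    return latest_transaction - max(column_values)


def is_fresh(lag: timedelta, max_lag: timedelta) -> bool:
    """Assumed 'Less than or equal to' style threshold."""
    return lag <= max_lag


# Illustrative values for a LAST_UPDATED column and a transaction time.
last_updated = [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 11, 30)]
lag = data_freshness_lag(datetime(2024, 1, 1, 12, 0), last_updated)
```

With these sample values the lag is 30 minutes, so a threshold of one hour passes and a threshold of ten minutes fails.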

Sync duration¶

Checks whether the total time a sync takes to complete meets some threshold.

Rule component	Description	Example options	Required?
Sync destination	Which sync of the dataset to monitor, relevant especially when the dataset syncs to multiple destinations.	`phonograph2-cache-worker`, `jdbc-worker`	Y
Sync duration	Total time a sync takes to complete (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Median deviation	Difference (in approximate standard deviations) from the median time to complete recent syncs	`1` Standard deviations, `10` Recent builds	N
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Sync freshness¶

Checks the time of the latest sync of a dataset against the maximum value of a datetime column. If the timestamp in the column represents when the row was added, this can be used to measure exact data freshness.

Rule component	Description	Example options	Required?
Column name	Column name of the column containing the time of the last update.	`LAST_UPDATED`	Y
Freshness range	Time range during which to consider the column's latest data as "fresh" (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Time since last updated¶

Checks whether the total time since the dataset was last updated (had a new transaction) meets some threshold.

Rule component	Description	Example options	Required?
Last updated	Total time since the dataset was last updated (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Median deviation	Difference (in approximate standard deviations) from the median update time of recent builds	`1` Standard deviations, `10` Recent builds	N
Ignore empty transactions	Whether to exclude empty transactions when checking time since updated/median deviation. Transactions with no files will be ignored, as if they had not existed	Y, N	Y
Severity	Severity of check failure	Moderate, Critical	Y
Schedule	Schedule check to run automatically or manually	Automatic, Custom Schedule	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N
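The effect of the "Ignore empty transactions" option can be sketched as follows. The transaction record shape and the function name are assumptions made for illustration; the sketch simply treats a transaction with zero files as if it had never happened.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical transaction record: (commit time, number of files committed).
Transaction = Tuple[datetime, int]


def time_since_last_update(now: datetime,
                           transactions: List[Transaction],
                           ignore_empty: bool = True) -> Optional[timedelta]:
    """Time elapsed since the most recent transaction that counts as an update."""
    counted = [ts for ts, file_count in transactions
               if file_count > 0 or not ignore_empty]
    if not counted:
        return None  # no qualifying transaction yet
    return now - max(counted)


history = [
    (datetime(2024, 1, 1, 6, 0), 12),  # transaction that committed files
    (datetime(2024, 1, 1, 9, 0), 0),   # empty transaction
]
now = datetime(2024, 1, 1, 12, 0)
```

With `ignore_empty=True` the result is six hours, because the empty 09:00 transaction is skipped; with `ignore_empty=False` it is three hours.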

Time since sync last updated¶

Checks whether the total time since the dataset last synced to some destination meets some threshold.

Rule component	Description	Example options	Required?
Last sync	Total time since the dataset last synced to some destination (in days, minutes, or hours)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Median deviation	Difference (in approximate standard deviations) from the median update time of recent builds	`1` Standard deviations, `10` Recent builds	N
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Size checks¶

Dataset file count¶

Checks the total number of files in the latest view of the dataset.

Rule component	Description	Example options	Required?
File count	Total number of files in the most recent view of a dataset	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Median deviation	Difference (in approximate standard deviations) from the median number of files in recent builds	`1` Standard deviations, `10` Recent builds	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Dataset partition¶

Checks if the partitioning of the dataset is performant.

Rule component	Description	Example options	Required?
Notes	The partitioning check works as follows: if there are fewer than 50 files in total, the check always passes; if there are 50 or more files in total, the check passes if at least 90% of the files are larger than 96 MB. A failed check means that the partitioning of the data across files is sub-optimal for performance and the data needs to be partitioned better.	No options to configure	N
Issues	Automatically create an issue when this check fails	Y, N	N
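The partitioning rule in the table above can be written as a short sketch. The function name is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the text does not say which definition is used.

```python
from typing import Sequence

MB = 1024 * 1024  # assumption: binary megabytes


def partition_check_passes(file_sizes_bytes: Sequence[int],
                           min_files: int = 50,
                           min_size_mb: int = 96,
                           min_percent: int = 90) -> bool:
    """Apply the rule from the table: small datasets always pass,
    larger ones need at least 90% of files above 96 MB."""
    total = len(file_sizes_bytes)
    if total < min_files:
        return True
    large = sum(1 for size in file_sizes_bytes if size > min_size_mb * MB)
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return large * 100 >= min_percent * total


few_small_files = [1 * MB] * 10
well_partitioned = [128 * MB] * 90 + [1 * MB] * 10
poorly_partitioned = [128 * MB] * 89 + [1 * MB] * 11
```

Here `few_small_files` passes because it has fewer than 50 files, `well_partitioned` passes at exactly 90% large files, and `poorly_partitioned` fails at 89%.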

Row count¶

Checks the total number of rows in the dataset.

Rule component	Description	Example options	Required?
Row count	Total number of rows in a dataset	Between `500` and `1000`, Greater than or equal to `100`, Less than or equal to `1000`, Equal to `10`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Median deviation	Difference (in approximate standard deviations) from the median row count in recent builds	`1` Standard deviations, `10` Recent builds	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

If the row count check is set against the last successful check result, the check will evaluate the criteria according to the row count recorded in the previous passing check, and will not consider the results in failed checks.

Transaction file count¶

Checks the total number of files committed in one transaction, excluding log files.

Rule component	Description	Example options	Required?
File count	Total number of files committed in a transaction	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Severity	Severity of check failure	Moderate, Critical	Y
Median deviation	Difference (in approximate standard deviations) from the median number of files in recent builds	`1` Standard deviations, `10` Recent builds	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Transaction file size¶

Checks the total size of the files committed in one transaction, excluding log files.

Rule component	Description	Example options	Required?
File size	Total size of all files committed in a transaction (in `MB` or `KB`)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Severity	Severity of check failure	Moderate, Critical	Y
Median deviation	Difference (in approximate standard deviations) from the median file size in recent builds	`1` Standard deviations, `10` Recent builds	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Content checks¶

Allowed column values¶

Checks if the values in a column match a list of allowed values.

Rule component	Description	Example options	Required?
Column name	Column name to check against	`FIRST_NAME`	Y
Allowed values	Allowed possible values for above column	`John`, `Jane`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Approximate unique percentage¶

Checks what percentage of values in a column are unique. Because the percentage is approximate, this check is not suitable for verifying that a column is a primary key (100% unique values); use the primary key check instead.

Rule component	Description	Example options	Required?
Column name	Column name to check against	`FIRST_NAME`	Y
Unique percentage	Percentage of values in the column that are unique (in `%`)	Between `10` and `20`, Greater than or equal to `50`, Less than or equal to `50`, Equal to `1`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Column regex¶

Checks if the values in a column match a certain regular expression.

Rule component	Description	Example options	Required?
Column name	Column name to check	`FIRST_NAME`	Y
Regex	Regular expression the column should match	`^Pre`, `post$`, `.any.`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Approximate column relation¶

This check provides an estimate of similarity between two columns as a percentage. For an exact check, use data expectations instead.

Rule component	Description	Example options	Required?
Other dataset	Dataset to check against	`/Users/John Appleseed/Stock_Prices_Latest`	Y
Column 1 name	Column name of the dataset on which the check is set	`FIRST_NAME`	Y
Column 2 name	Column name of the other dataset	`f_name`	Y
Percentage match	To what extent the two columns must match (in `%`)	`85%` of values are equal	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Date range¶

Checks the range of values in a date column.

Rule component	Description	Example options	Required?
Column name	Name of the column to check	`LAST_UPDATED`	Y
Allowed date range	Allowed date range for the column	`2017-01-01 – 2018-01-01`	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Null percentage¶

Checks what percentage of values in a column are null.

Rule component	Description	Example options	Required?
Column name	Name of the column to check	`CUSTOMER_ID`	Y
Null percentage	Percentage of values that are null in the column (in `%`)	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Severity	Severity of check failure	Moderate, Critical	Y
Median deviation	Difference (in approximate standard deviations) from the median null percentage of recent builds	`1` Standard deviations, `10` Recent builds	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Numeric mean¶

Checks whether the average of a numeric column meets some threshold.

Rule component	Description	Example options	Required?
Column name	Name of the numeric column to check	`NUM_FAILURES`	Y
Mean	Desired mean of the column	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Severity	Severity of check failure	Moderate, Critical	Y
Difference from last check	Compare the current mean of the column to the mean of the column at the last check run, ± an optional constant	Greater than the last check + `5`	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Numeric median¶

Checks whether the median of a numeric column meets some threshold.

Rule component	Description	Example options	Required?
Column name	Name of the numeric column to check	`NUM_FAILURES`	Y
Median	Desired median of the column	Between `1` and `2`, Greater than or equal to `1`, Less than or equal to `1`, Equal to `1`	N
Severity	Severity of check failure	Moderate, Critical	Y
Difference from last check	Compare the current median of the column to the median of the column at the last check run, ± an optional constant	Greater than the last check + `5`	N
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Numeric range¶

Checks the range of values in a numeric column.

Rule component	Description	Example options	Required?
Column name	Name of the numeric column to check	`NUM_FAILURES`	Y
Allowed range	Allowed range for the column	`3-5`	Y
Severity	Severity of check failure	Moderate, Critical	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Primary key¶

Checks that the values in a column are 100% unique and non-null.

Rule component	Description	Example options	Required?
Column name	Name of the column to check	`PART_ID`	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N
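The condition this check enforces (every value unique and none null) can be sketched in a few lines. `is_primary_key` is a hypothetical helper that operates on an in-memory list; it is meant to illustrate the rule, not how the check is computed at scale.

```python
from typing import Hashable, Optional, Sequence


def is_primary_key(values: Sequence[Optional[Hashable]]) -> bool:
    """True if the column has no nulls and no repeated values."""
    if any(value is None for value in values):
        return False
    return len(set(values)) == len(values)


unique_ids = ["P-001", "P-002", "P-003"]
duplicate_ids = ["P-001", "P-002", "P-001"]
ids_with_null = ["P-001", None, "P-003"]
```

Only `unique_ids` satisfies the condition; the other two columns fail because of a repeated value and a null value respectively.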

Schema checks¶

Column¶

Checks for the existence and type of a column.

Rule component	Description	Example options	Required?
Column name	Name of the column to check for	`PART_ID`	Y
Is Present	Check existence of column	Y	Y
Type	Type of the column	`Integer`	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Column count¶

Checks the total number of columns in the dataset.

Rule component	Description	Example options	Required?
Column count	Total number of columns in the dataset	`50`	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

Schema¶

Checks the dataset schema against the expected columns, according to the chosen comparison type (see below for details on the available types).

Rule component	Description	Example options	Required?
Columns	The dataset columns and their types; you can choose a full type match or column existence only	Type: String	Y
Comparison type	Specify which comparison policy will be used	Text	Y
Notes	Add a note to provide additional context	Text	N
Issues	Automatically create an issue when this check fails	Y, N	N

The available schema comparison types are:

Value	Comparison allowance
`EXACT_MATCH_ORDERED_COLUMNS`	Checks column order, names and types, and number of columns.
`EXACT_MATCH_UNORDERED_COLUMNS`	Checks column names and types, and number of columns. Order does not matter.
`COLUMN_ADDITIONS_ALLOWED`	Checks column names and types. Extra columns are allowed, but columns cannot be missing.
`COLUMN_ADDITIONS_ALLOWED_STRICT`	Like `COLUMN_ADDITIONS_ALLOWED`; however, whenever a new column is added to the dataset, that column is added to the check. Added columns cannot be missing thereafter.
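The four comparison types can be compared side by side with a small sketch. A schema is modeled here as a list of `(name, type)` pairs, and `schema_matches` and `absorb_new_columns` are hypothetical helpers; the sketch covers full type matching only and is an interpretation of the table above, not the check's implementation.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, column type)


def schema_matches(expected: List[Column], actual: List[Column],
                   comparison_type: str) -> bool:
    if comparison_type == "EXACT_MATCH_ORDERED_COLUMNS":
        return expected == actual
    if comparison_type == "EXACT_MATCH_UNORDERED_COLUMNS":
        return sorted(expected) == sorted(actual)
    if comparison_type in ("COLUMN_ADDITIONS_ALLOWED",
                           "COLUMN_ADDITIONS_ALLOWED_STRICT"):
        # Extra columns are fine, but every expected column must be present.
        return all(column in actual for column in expected)
    raise ValueError(f"Unknown comparison type: {comparison_type}")


def absorb_new_columns(expected: List[Column],
                       actual: List[Column]) -> List[Column]:
    """STRICT variant: newly seen columns become part of the expectation."""
    return expected + [column for column in actual if column not in expected]


expected = [("id", "Integer"), ("name", "String")]
reordered = [("name", "String"), ("id", "Integer")]
extended = [("id", "Integer"), ("name", "String"), ("email", "String")]
```

For example, `reordered` fails the ordered comparison but passes the unordered one, and `extended` passes only the two `COLUMN_ADDITIONS_ALLOWED` variants. In the strict variant, once `absorb_new_columns(expected, extended)` has been applied, a later schema without `email` no longer matches.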

Approximate standard deviation¶

Since dataset builds can easily have outliers, we do not use the true standard deviation. Instead, we use the median absolute deviation (MAD), which is a more robust measure of variability.

The MAD is defined as the median of the absolute deviations from the median of the data. For values x_1, ..., x_n with median X this means MAD = median(|x_i - X|).

The median absolute deviation can be used to approximate the standard deviation by multiplying it by a constant.

Our calculation is σ = MAD * 1.4826.
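The calculation above can be reproduced with Python's standard library. The sketch below follows the stated formula; the `within_deviation` helper and the way it compares a new value to recent history are assumptions about how a median deviation rule might be evaluated, and the sample row counts are made up.

```python
from statistics import median
from typing import Sequence

MAD_TO_SIGMA = 1.4826  # constant quoted in the formula above


def approximate_std(values: Sequence[float]) -> float:
    """Approximate standard deviation as MAD * 1.4826."""
    center = median(values)
    mad = median([abs(x - center) for x in values])
    return MAD_TO_SIGMA * mad


def within_deviation(value: float, history: Sequence[float],
                     n_sigma: float) -> bool:
    """Assumed rule: value lies within n_sigma approximate standard
    deviations of the median of recent results."""
    return abs(value - median(history)) <= n_sigma * approximate_std(history)


# Ten recent results, one of which (60) is an outlier.
recent_row_counts = [10, 11, 9, 10, 12, 10, 11, 10, 9, 60]
sigma = approximate_std(recent_row_counts)
```

For this sample the median is 10 and the MAD is 1, so `sigma` is 1.4826. The single outlier barely moves the estimate, whereas the ordinary sample standard deviation of the same values is far larger.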

For detailed information, see the Wikipedia article on median absolute deviation.
