Keeps duplicates

Supported in: Batch, Faster

Keeps every input row that has at least one duplicate and drops rows that are unique.

Transform categories: Other

Declared arguments

  • Column subset: If any columns are specified, only those columns are used when determining uniqueness.
    Set\>
  • Dataset: Dataset to keep duplicate rows from.
    Table
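The semantics of the two arguments can be illustrated with a small plain-Python sketch. This is not the transform's implementation; the function name `keep_duplicates`, the list-of-dicts row format, and the variable names are assumptions made for illustration only. The sample rows are taken from Example 1.

```python
from collections import Counter


def keep_duplicates(rows, column_subset=None):
    """Return the rows whose comparison key occurs more than once, in input order.

    Hypothetical helper: `rows` is a list of dicts, `column_subset` a set of column names.
    """
    if not rows:
        return []
    # An empty or missing subset means every column takes part in the comparison.
    columns = sorted(column_subset) if column_subset else list(rows[0].keys())

    def key(row):
        return tuple(row[c] for c in columns)

    # First pass counts each key; second pass keeps rows whose key repeats.
    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] > 1]


flights = [
    {"tail_number": "XB-123", "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": "MT-222", "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 3},
]

result = keep_duplicates(flights, {"tail_number"})
kept_tail_numbers = [r["tail_number"] for r in result]
print(kept_tail_numbers)
```

Only the KK-452 row is dropped, because its tail_number appears once. All other rows are kept in their original order.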

Examples

Example 1: Base case

Argument values:

  • Column subset: {tail_number}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_number | airline         | miles | factor
XB-123      | foundry air     | 124   | 2
MT-222      | new airline     | 1123  | 5
XB-123      | foundry airline | 335   | 5
MT-222      | new air         | 565   | 4
KK-452      | new air         | 222   | 1
XB-123      | foundry airline | 1134  | 3

Output:

tail_number | airline         | miles | factor
XB-123      | foundry air     | 124   | 2
MT-222      | new airline     | 1123  | 5
XB-123      | foundry airline | 335   | 5
MT-222      | new air         | 565   | 4
XB-123      | foundry airline | 1134  | 3

Example 2: Base case

Description: When no column subset is specified, rows are compared across all columns, so only exact duplicates are kept.

Argument values:

  • Column subset: {}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_number | airline     | miles | factor
XB-123      | foundry air | 124   | 2
XB-123      | foundry air | 124   | 2
XB-123      | foundry air | 124   | 2
MT-222      | new airline | 1123  | 6
MT-222      | new airline | 1123  | 5

Output:

tail_number | airline     | miles | factor
XB-123      | foundry air | 124   | 2
XB-123      | foundry air | 124   | 2
XB-123      | foundry air | 124   | 2
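The empty-subset behavior in this example can be sketched in plain Python by treating each whole row as the comparison key. This is only an illustration under assumed names (`keep_exact_duplicates`, rows modeled as tuples), not how the transform itself is implemented.

```python
from collections import Counter


def keep_exact_duplicates(rows):
    """Keep rows that appear more than once when every column is compared.

    Hypothetical helper: each row is a tuple of (tail_number, airline, miles, factor).
    """
    counts = Counter(rows)  # tuples are hashable, so the full row is the key
    return [row for row in rows if counts[row] > 1]


rows = [
    ("XB-123", "foundry air", 124, 2),
    ("XB-123", "foundry air", 124, 2),
    ("XB-123", "foundry air", 124, 2),
    ("MT-222", "new airline", 1123, 6),
    ("MT-222", "new airline", 1123, 5),
]

kept = keep_exact_duplicates(rows)
print(kept)
```

The two MT-222 rows differ in the factor column, so neither counts as a duplicate of the other and both are dropped.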

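Example 3: Null case

Description: Null values in the column subset are treated as equal to each other, so rows with a null tail_number count as duplicates of one another.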

Argument values:

  • Column subset: {tail_number}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_number | airline         | miles | factor
null        | foundry air     | 124   | 2
null        | new airline     | 1123  | 5
null        | foundry airline | 335   | 5
MT-222      | new air         | 565   | 4
KK-452      | new air         | 222   | 1
XB-123      | foundry airline | 1134  | 3

Output:

tail_number | airline         | miles | factor
null        | foundry air     | 124   | 2
null        | new airline     | 1123  | 5
null        | foundry airline | 335   | 5
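The null handling shown here can be mimicked in plain Python by grouping rows on the key column, with `None` standing in for null. The helper name `keep_duplicates_by` and the dict-based row format are assumptions for illustration; the sketch reproduces the example's result rather than describing the transform's internals.

```python
from collections import defaultdict


def keep_duplicates_by(rows, column):
    """Keep rows whose value in `column` is shared with at least one other row.

    Hypothetical helper: None is an ordinary dict key here, so all null rows
    fall into the same group.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[column]].append(index)
    # Collect the indexes of every group with more than one member,
    # then restore the original row order.
    keep = sorted(i for indexes in groups.values() if len(indexes) > 1 for i in indexes)
    return [rows[i] for i in keep]


flights = [
    {"tail_number": None, "airline": "foundry air", "miles": 124, "factor": 2},
    {"tail_number": None, "airline": "new airline", "miles": 1123, "factor": 5},
    {"tail_number": None, "airline": "foundry airline", "miles": 335, "factor": 5},
    {"tail_number": "MT-222", "airline": "new air", "miles": 565, "factor": 4},
    {"tail_number": "KK-452", "airline": "new air", "miles": 222, "factor": 1},
    {"tail_number": "XB-123", "airline": "foundry airline", "miles": 1134, "factor": 3},
]

kept = keep_duplicates_by(flights, "tail_number")
print([row["airline"] for row in kept])
```

Grouping puts all nulls together, which matches the output above. A SQL-style equality comparison would not, since `NULL = NULL` does not evaluate to true, so a query-based equivalent would need a null-safe comparison or a group-by.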

