Drop duplicates
Supported in: Batch, Faster
Drops duplicate rows from the input.
Transform categories: Other
Declared arguments
- Dataset: Dataset from which duplicate rows are dropped. Type: Table.
- Column subset (optional): If any columns are specified, only those columns are used to determine uniqueness. Type: Set of columns.
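The argument semantics above can be modelled in plain Python. This is a minimal, hypothetical sketch and not the transform's implementation. It represents rows as dicts, and `drop_duplicates` and its `subset` parameter are illustrative names. It keeps the first occurrence of each key, which is an assumption because this page does not say which row the transform retains.

```python
from typing import Any, Iterable, Optional

Row = dict[str, Any]

def drop_duplicates(rows: Iterable[Row], subset: Optional[set[str]] = None) -> list[Row]:
    """Hypothetical sketch of the deduplication semantics.

    If `subset` is None or empty, every column is compared (exact duplicates).
    Otherwise only the columns in `subset` determine uniqueness.
    Assumption: the first occurrence of each key is kept, in input order.
    """
    seen: set[tuple] = set()
    result: list[Row] = []
    for row in rows:
        # Sort column names so the key does not depend on dict insertion order.
        columns = sorted(subset) if subset else sorted(row)
        key = tuple((col, row[col]) for col in columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result
```

Passing `{"tail_number"}` as `subset` collapses rows that share a tail number even when other columns differ. Passing `None` or an empty set drops only rows that are identical in every column.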
Examples
Example 1: Base case
Argument values:
- Dataset: ri.foundry.main.dataset.aggregate
- Column subset: {tail_number}
Input:
| tail_number | airline | miles | factor |
|---|---|---|---|
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| XB-123 | foundry airline | 335 | 5 |
| MT-222 | new air | 565 | 4 |
| KK-452 | new air | 222 | 1 |
| XB-123 | foundry airline | 1134 | 3 |
Output:
| tail_number | airline | miles | factor |
|---|---|---|---|
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| KK-452 | new air | 222 | 1 |
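Example 1 can be reproduced with a short, self-contained Python sketch. The rows are copied from the input table above. The sketch keys on `tail_number` alone and keeps the first row seen for each key, which yields the three documented output rows. Keeping the first row is an assumption that matches this example, not a documented guarantee. `dedupe_by_tail_number` is a hypothetical helper.

```python
# Input rows from Example 1 as (tail_number, airline, miles, factor) tuples.
INPUT = [
    ("XB-123", "foundry air", 124, 2),
    ("MT-222", "new airline", 1123, 5),
    ("XB-123", "foundry airline", 335, 5),
    ("MT-222", "new air", 565, 4),
    ("KK-452", "new air", 222, 1),
    ("XB-123", "foundry airline", 1134, 3),
]

def dedupe_by_tail_number(rows):
    """Keep the first row for each tail_number (column 0), preserving input order."""
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if this tail_number has not been seen yet.
        first_seen.setdefault(row[0], row)
    # Dicts preserve insertion order (Python 3.7+), so output follows first appearance.
    return list(first_seen.values())

for row in dedupe_by_tail_number(INPUT):
    print(row)
# ('XB-123', 'foundry air', 124, 2)
# ('MT-222', 'new airline', 1123, 5)
# ('KK-452', 'new air', 222, 1)
```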
Example 2: Base case
Description: With no column subset, only rows that are identical in every column are treated as duplicates.
Argument values:
- Dataset: ri.foundry.main.dataset.aggregate
- Column subset: {}
Input:
| tail_number | airline | miles | factor |
|---|---|---|---|
| XB-123 | foundry air | 124 | 2 |
| XB-123 | foundry air | 124 | 2 |
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
| MT-222 | new airline | 1123 | 5 |
Output:
| tail_number | airline | miles | factor |
|---|---|---|---|
| XB-123 | foundry air | 124 | 2 |
| MT-222 | new airline | 1123 | 5 |
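With an empty column subset, every column takes part in the comparison. A row is therefore dropped only if it matches an earlier row exactly. The minimal Python sketch below shows that behaviour on the Example 2 data. `drop_exact_duplicates` is a hypothetical helper. Keeping the first occurrence does not change the result here, because all copies of an exact duplicate are identical.

```python
# Input rows from Example 2 as (tail_number, airline, miles, factor) tuples.
INPUT = [
    ("XB-123", "foundry air", 124, 2),
    ("XB-123", "foundry air", 124, 2),
    ("XB-123", "foundry air", 124, 2),
    ("MT-222", "new airline", 1123, 5),
    ("MT-222", "new airline", 1123, 5),
]

def drop_exact_duplicates(rows):
    """Remove rows that match an earlier row in every column, keeping input order."""
    # The whole tuple is the key; dict.fromkeys keeps the first occurrence of each.
    return list(dict.fromkeys(rows))

print(drop_exact_duplicates(INPUT))
# [('XB-123', 'foundry air', 124, 2), ('MT-222', 'new airline', 1123, 5)]
```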