Apply to multiple columns
Supported in: Batch, Faster, Streaming
Transforms the input dataset by selecting columns or by applying an expression to every column that matches a condition.
Transform categories: Popular
Declared arguments
- Condition for columns to project: Every column in the input schema is tested against this condition. The given expression is applied to each column that matches.
  ColumnPredicate
- Dataset: The dataset to apply operations to.
  Table
- Expression to apply: The expression to apply once to each column that matches the condition.
  Expression
- Keep remaining columns: Keeps all columns in the dataset that were not projected.
  Literal
- Keep matched columns (optional): Keeps the original columns that were matched by the condition. If a projected column has the same name as an original column, the projected column overrides it.
  Literal
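The way the condition, the expression, and the two keep flags interact can be sketched in plain Python. This is a minimal model under stated assumptions, not the product's implementation: the function name `apply_to_multiple_columns`, the dict-based rows and schema, and the output column order (projected, then remaining, then matched) are hypothetical and inferred from the examples on this page.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Schema = Dict[str, str]  # column name -> type name, e.g. {"id": "Integer"}


def apply_to_multiple_columns(
    rows: List[Row],
    schema: Schema,
    condition: Callable[[str, str], bool],
    expression: Callable[[str, Any], Tuple[str, Any]],
    keep_remaining: bool,
    keep_matched: bool = False,
) -> List[Row]:
    """Hypothetical model: project every column whose (name, type) matches."""
    matched = [name for name, type_name in schema.items() if condition(name, type_name)]
    remaining = [name for name in schema if name not in matched]

    output = []
    for row in rows:
        # Apply the expression once per matched column; it returns (new_name, new_value).
        result = dict(expression(name, row[name]) for name in matched)
        if keep_remaining:
            for name in remaining:
                result.setdefault(name, row[name])
        if keep_matched:
            for name in matched:
                # setdefault leaves a projected column with the same name in place,
                # so the projected column overrides the original.
                result.setdefault(name, row[name])
        output.append(result)
    return output


rows = [{"id": 1, "distance": "2000"}]
schema = {"id": "Integer", "distance": "String"}


def is_string(name: str, type_name: str) -> bool:
    return type_name == "String"


def cast_with_suffix(name: str, value: Any) -> Tuple[str, Any]:
    return name + "_as_integer", int(value)


result = apply_to_multiple_columns(
    rows, schema, is_string, cast_with_suffix, keep_remaining=True, keep_matched=True
)
print(result)  # [{'distance_as_integer': 2000, 'id': 1, 'distance': '2000'}]
```

Using `setdefault` for the kept columns is one simple way to model the override rule: a projected column is written first, so an original column with the same name never replaces it.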
Examples
Example 1: Base case
Description: Cast each matched column to Integer and rename it using a regex replacement.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  dynamicAlias(
    expression:
      cast(
        expression: column,
        type: Integer,
      ),
    transformer:
      columnNameRegexReplace(
        input: column,
        pattern: str,
        replace: int,
      ),
  )
- Keep remaining columns: true
- Keep matched columns: false
Input:
| id | distance_str | factor_str |
|---|---|---|
| 1 | 2000 | 1265 |
Output:
| distance_int | factor_int | id |
|---|---|---|
| 2000 | 1265 | 1 |
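The renaming step in Example 1 can be illustrated with Python's `re.sub`. This is only a sketch: whether `columnNameRegexReplace` replaces every match and follows the same regex syntax as Python is an assumption, and the helper `regex_rename` and the column `street_str` are hypothetical. It shows why an unanchored pattern such as `str` can rename more than the suffix.

```python
import re


def regex_rename(column_name: str, pattern: str, replace: str) -> str:
    """Replace every match of `pattern` in the column name."""
    return re.sub(pattern, replace, column_name)


columns = ["distance_str", "factor_str", "street_str"]

# Unanchored pattern, as in Example 1: every occurrence of "str" is replaced.
loose = [regex_rename(c, "str", "int") for c in columns]

# Anchored pattern: only a trailing "_str" suffix is replaced.
strict = [regex_rename(c, r"_str$", "_int") for c in columns]

print(loose)   # ['distance_int', 'factor_int', 'inteet_int']
print(strict)  # ['distance_int', 'factor_int', 'street_int']
```

If column names may contain the pattern elsewhere, anchoring it to the suffix is the safer choice.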
Example 2: Edge case
Description: You can choose to keep both matched and remaining columns.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  dynamicAlias(
    expression:
      cast(
        expression: column,
        type: Integer,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _as_integer],
      ),
  )
- Keep remaining columns: true
- Keep matched columns: true
Input:
| id | distance |
|---|---|
| 1 | 2000 |
Output:
| distance_as_integer | id | distance |
|---|---|---|
| 2000 | 1 | 2000 |
Example 3: Edge case
Description: You can choose to keep the columns that the condition matches alongside the newly created columns, while dropping the remaining columns.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  dynamicAlias(
    expression:
      cast(
        expression: column,
        type: Integer,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _as_integer],
      ),
  )
- Keep remaining columns: false
- Keep matched columns: true
Input:
| id | distance |
|---|---|
| 1 | 2000 |
Output:
| distance_as_integer | distance |
|---|---|
| 2000 | 2000 |
Example 4: Edge case
Description: If you keep matched columns but a projected column has the same name as a matched column, the projected column overrides it and the original matched column is not kept. To keep the original column, give the projected column a new name.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  cast(
    expression: column,
    type: Integer,
  )
- Keep remaining columns: false
- Keep matched columns: true
Input:
| id | distance |
|---|---|
| 1 | 2000 |
Output:
| distance |
|---|
| 2000 |
Example 5: Edge case
Description: You can choose to keep only the columns that are projected.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  dynamicAlias(
    expression:
      cast(
        expression: column,
        type: Integer,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _as_integer],
      ),
  )
- Keep remaining columns: false
- Keep matched columns: false
Input:
| id | distance |
|---|---|
| 1 | 2000 |
Output:
| distance_as_integer |
|---|
| 2000 |
Example 6: Edge case
Description: You can choose to keep the remaining columns that did not match the condition without keeping the original matched columns. Because the expression does not rename its result, the projected column keeps the name distance.
Argument values:
- Condition for columns to project:
  columnHasType(
    type: String,
  )
- Dataset: ri.foundry.main.dataset.a
- Expression to apply:
  cast(
    expression: column,
    type: Integer,
  )
- Keep remaining columns: true
- Keep matched columns: false
Input:
| id | distance |
|---|---|
| 1 | 2000 |
Output:
| distance | id |
|---|---|
| 2000 | 1 |
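The examples above differ mainly in the two keep flags, so it can help to list the output columns for every flag combination. The sketch below tracks column names only; the helper `output_columns` and the column order it produces (projected, then remaining, then matched) are assumptions inferred from the example outputs, not documented behavior.

```python
from itertools import product
from typing import List


def output_columns(
    projected: List[str],
    remaining: List[str],
    matched: List[str],
    keep_remaining: bool,
    keep_matched: bool,
) -> List[str]:
    """Hypothetical model of which column names appear in the output."""
    columns = list(projected)
    if keep_remaining:
        columns += [c for c in remaining if c not in columns]
    if keep_matched:
        # A matched column is dropped when a projected column already uses its name.
        columns += [c for c in matched if c not in columns]
    return columns


for keep_remaining, keep_matched in product([True, False], repeat=2):
    cols = output_columns(
        ["distance_as_integer"], ["id"], ["distance"], keep_remaining, keep_matched
    )
    print(f"keep_remaining={keep_remaining}, keep_matched={keep_matched}: {cols}")
```

With both flags set to true this yields the three columns of Example 2, and with both set to false only the projected column of Example 5.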