Apply to multiple columns

Supported in: Batch, Faster, Streaming

Transforms the input dataset by either selecting columns or applying functions to columns.

Transform categories: Popular

Declared arguments

  • Condition for columns to project: All columns in the input schema will be tested to see if they match this condition. If they match, the given expression will be applied to them.
    ColumnPredicate
  • Dataset: Dataset to apply operations to.
    Table
  • Expression to apply: The expression to apply once for each column that matches the condition.
    Expression
  • Keep remaining columns: Keeps all columns in the dataset that were not projected.
    Literal
  • optional Keep matched columns: Keep the original columns that were matched by the condition. If a projected column has the same name, the original column will be overridden.
    Literal
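
The way these arguments interact can be modelled in plain Python. The sketch below is not the transform's implementation: the function name `apply_to_multiple_columns`, the dict-based row and schema, and the output column order (projected columns, then remaining columns, then kept matched columns, as inferred from the examples on this page) are all assumptions made for illustration.

```python
def apply_to_multiple_columns(row, schema, condition, expression,
                              keep_remaining, keep_matched=False):
    """Hypothetical model of the transform for a single row.

    row:        dict of column name -> value
    schema:     dict of column name -> type name (e.g. "String")
    condition:  callable (name, type_name) -> bool
    expression: callable (name, value) -> (new_name, new_value)
    """
    # Test every column in the input schema against the condition.
    matched = [name for name in row if condition(name, schema[name])]

    # Apply the expression once per matched column.
    out = {}
    for name in matched:
        new_name, new_value = expression(name, row[name])
        out[new_name] = new_value

    if keep_remaining:
        for name in row:
            if name not in matched:
                # Assumption: a projected column also wins over a same-named
                # remaining column; the source documents this only for
                # matched columns.
                out.setdefault(name, row[name])

    if keep_matched:
        for name in matched:
            # A projected column with the same name overrides the original.
            out.setdefault(name, row[name])

    return out


row = {"id": 1, "distance": "2000"}
schema = {"id": "Integer", "distance": "String"}

result = apply_to_multiple_columns(
    row,
    schema,
    condition=lambda name, type_name: type_name == "String",
    expression=lambda name, value: (name + "_as_integer", int(value)),
    keep_remaining=True,
    keep_matched=True,
)
print(result)
```

With both flags set to true, the projected column, the remaining `id` column, and the original `distance` column all appear in the result.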

Examples

Example 1: Base case

Description: Cast matched columns to Integer and rename them using a regex replacement on the column name.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
      expression: cast(
        expression: column,
        type: Integer,
      ),
      transformer: columnNameRegexReplace(
        input: column,
        pattern: str,
        replace: int,
      ),
    )
  • Keep remaining columns: true
  • Keep matched columns: false

Input:

id distance_str factor_str
1 2000 1265

Output:

distance_int factor_int id
2000 1265 1
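
For readers more familiar with general-purpose code, the same cast-and-rename step can be sketched in Python. This is an analogy only: the `string_columns` list stands in for the `columnHasType(String)` condition, and whether `columnNameRegexReplace` follows the same regex dialect as Python's `re.sub` is an assumption (the pattern `str` here is a plain literal, so the difference does not matter for this input).

```python
import re

row = {"id": 1, "distance_str": "2000", "factor_str": "1265"}

# Stand-in for the columnHasType(String) condition (hypothetical).
string_columns = ["distance_str", "factor_str"]

# Cast each matched value to an integer and rename the column by
# replacing "str" with "int" in its name.
projected = {
    re.sub("str", "int", name): int(row[name])
    for name in string_columns
}

# Keep remaining columns: true, keep matched columns: false.
remaining = {
    name: value for name, value in row.items()
    if name not in string_columns
}

result = {**projected, **remaining}
print(result)  # {'distance_int': 2000, 'factor_int': 1265, 'id': 1}
```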

Example 2: Edge case

Description: You can choose to keep both matched and remaining columns.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
      expression: cast(
        expression: column,
        type: Integer,
      ),
      transformer: columnNameConcat(
        inputs: [column, _as_integer],
      ),
    )
  • Keep remaining columns: true
  • Keep matched columns: true

Input:

id distance
1 2000

Output:

distance_as_integer id distance
2000 1 2000

Example 3: Edge case

Description: You can choose to keep the columns that the condition matches, in addition to the new columns that are created.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
      expression: cast(
        expression: column,
        type: Integer,
      ),
      transformer: columnNameConcat(
        inputs: [column, _as_integer],
      ),
    )
  • Keep remaining columns: false
  • Keep matched columns: true

Input:

id distance
1 2000

Output:

distance_as_integer distance
2000 2000

Example 4: Edge case

Description: When matched columns are kept but a projected column has the same name as the original column, the projected column overrides it and the original matched column is not kept. To keep the original column, rename the projected column to a new name.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    cast(
     expression: column,
     type: Integer,
    )
  • Keep remaining columns: false
  • Keep matched columns: true

Input:

id distance
1 2000

Output:

distance
2000
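
The override rule can be illustrated with a small Python sketch that contrasts a projection reusing the original name with one that renames the projected column. The helper `project` and its `rename` parameter are hypothetical and exist only for this illustration; they model "Keep matched columns: true" with "Keep remaining columns: false".

```python
def project(row, matched, rename):
    """Cast matched columns to int, then keep the originals where possible."""
    projected = {rename(name): int(row[name]) for name in matched}
    # An original matched column survives only if no projected column
    # has taken its name.
    kept = {name: row[name] for name in matched if name not in projected}
    return {**projected, **kept}


row = {"id": 1, "distance": "2000"}
matched = ["distance"]  # stand-in for columns matching the condition

# Projected column keeps the name "distance": the original is overridden.
same_name = project(row, matched, rename=lambda name: name)
print(same_name)  # {'distance': 2000}

# Projected column gets a new name: the original string column is kept too.
new_name = project(row, matched, rename=lambda name: name + "_as_integer")
print(new_name)  # {'distance_as_integer': 2000, 'distance': '2000'}
```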

Example 5: Edge case

Description: You can choose to keep only the columns that are projected.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    dynamicAlias(
      expression: cast(
        expression: column,
        type: Integer,
      ),
      transformer: columnNameConcat(
        inputs: [column, _as_integer],
      ),
    )
  • Keep remaining columns: false
  • Keep matched columns: false

Input:

id distance
1 2000

Output:

distance_as_integer
2000

Example 6: Edge case

Description: You can choose to keep the projected columns together with the remaining columns that did not match the condition, without keeping the original matched columns.

Argument values:

  • Condition for columns to project:
    columnHasType(
     type: String,
    )
  • Dataset: ri.foundry.main.dataset.a
  • Expression to apply:
    cast(
     expression: column,
     type: Integer,
    )
  • Keep remaining columns: true
  • Keep matched columns: false

Input:

id distance
1 2000

Output:

distance id
2000 1

