跳转至

Pivot(数据透视(Pivot))

Supported in: Batch, Faster

Performs the specified aggregations on the input dataset grouped by a set of columns. Unique values to pivot on must be provided such that the output schema is known ahead of runtime. This improves runtime stability over time.

Transform categories: Aggregate, Popular

Declared arguments

  • Aggregations: List of aggregations to perform on the dataset.
    List\>
  • Dataset: Dataset to perform aggregate on.
    Table
  • Group by columns: List of columns to group the dataset by when aggregating.
    List\>
  • Pivot by column: Column to pivot on.
    Column\
  • Pivot by values: List of unique values used to pivot and aliases for the output. Alias values are used to construct the output column name according to the prefix / suffix argument.
    List\, Literal\\>>
  • optional Prefix or suffix alias: If prefix, the output column name will be 'alias''aggregate', if suffix it will be 'aggregate'alias.
    Enum\

Type variable bounds: T accepts Boolean | Byte | Integer | Long | Short | String

Examples

Example 1: Base case

Argument values:

  • Aggregations: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Group by columns: [airline]
  • Pivot by column: airport
  • Pivot by values: [(JFK, new_york), (LHR, london)]
  • Prefix or suffix alias: null

Input:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

Output:

airline new_york_miles london_miles
foundry airways 1002345.0 7381838.0
new air 1.22779055E7 null

Example 2: Base case

Argument values:

  • Aggregations: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Group by columns: [airline]
  • Pivot by column: airport
  • Pivot by values: [(JFK, new_york), (LHR, london)]
  • Prefix or suffix alias: SUFFIX

Input:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

Output:

airline miles_new_york miles_london
foundry airways 1002345.0 7381838.0
new air 1.22779055E7 null


中文翻译


数据透视(Pivot)

支持:批处理(Batch)、快速处理(Faster)

对输入数据集按指定列分组后执行指定的聚合操作。必须提供用于透视的唯一值,以便在运行前确定输出模式(schema)。这有助于提高运行时的长期稳定性。

转换类别:聚合(Aggregate)、常用(Popular)

声明参数

  • 聚合操作(Aggregations): 对数据集执行的聚合操作列表。
    List\>
  • 数据集(Dataset): 执行聚合操作的数据集。
    Table
  • 分组列(Group by columns): 聚合时用于对数据集进行分组的列列表。
    List\>
  • 透视列(Pivot by column): 用于透视的列。
    Column\
  • 透视值(Pivot by values): 用于透视的唯一值列表及其输出别名。别名值用于根据前缀/后缀参数构建输出列名。
    List\, Literal\\>>
  • 可选 前缀或后缀别名(Prefix or suffix alias): 若选择前缀,输出列名格式为"别名""聚合操作";若选择后缀,则格式为"聚合操作"别名
    Enum\

类型变量边界: T 接受 Boolean | Byte | Integer | Long | Short | String

示例

示例 1:基础情况

参数值:

  • 聚合操作: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • 数据集: ri.foundry.main.dataset.a
  • 分组列: [airline]
  • 透视列: airport
  • 透视值: [(JFK, new_york), (LHR, london)]
  • 前缀或后缀别名: null

输入:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

输出:

airline new_york_miles london_miles
foundry airways 1002345.0 7381838.0
new air 1.22779055E7 null

示例 2:基础情况

参数值:

  • 聚合操作: [
    alias(
     alias: miles,
     expression:
    mean(
     expression: miles,
    ),
    )]
  • 数据集: ri.foundry.main.dataset.a
  • 分组列: [airline]
  • 透视列: airport
  • 透视值: [(JFK, new_york), (LHR, london)]
  • 前缀或后缀别名: SUFFIX

输入:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

输出:

airline miles_new_york miles_london
foundry airways 1002345.0 7381838.0
new air 1.22779055E7 null