Pivot（数据透视（Pivot））¶

Supported in: Batch, Faster

Performs the specified aggregations on the input dataset grouped by a set of columns. Unique values to pivot on must be provided such that the output schema is known ahead of runtime. This improves runtime stability over time.

Transform categories: Aggregate, Popular

Declared arguments¶

Aggregations: List of aggregations to perform on the dataset.
List\>
Dataset: Dataset to perform aggregate on.
Table
Group by columns: List of columns to group the dataset by when aggregating.
List\>
Pivot by column: Column to pivot on.
Column\
Pivot by values: List of unique values used to pivot and aliases for the output. Alias values are used to construct the output column name according to the prefix / suffix argument.
List\, Literal\\>>
optional Prefix or suffix alias: If prefix, the output column name will be 'alias''aggregate', if suffix it will be 'aggregate'alias.
Enum\

Examples¶

Example 1: Base case¶

Argument values:

Aggregations: [
alias(
alias: miles,
expression:
mean(
expression: miles,
),
)]
Dataset: ri.foundry.main.dataset.a
Group by columns: [airline]
Pivot by column: airport
Pivot by values: [(JFK, new_york), (LHR, london)]
Prefix or suffix alias: null

Input:

airline	airport	miles
foundry airways	JFK	1002345
foundry airways	LHR	2221324
new air	SFO	21356673
new air	JFK	12323456
foundry airways	LHR	12542352
new air	JFK	12232355

Output:

airline	new_york_miles	london_miles
foundry airways	1002345.0	7381838.0
new air	1.22779055E7	null

Example 2: Base case¶

Argument values:

Aggregations: [
alias(
alias: miles,
expression:
mean(
expression: miles,
),
)]
Dataset: ri.foundry.main.dataset.a
Group by columns: [airline]
Pivot by column: airport
Pivot by values: [(JFK, new_york), (LHR, london)]
Prefix or suffix alias: SUFFIX

Input:

airline	airport	miles
foundry airways	JFK	1002345
foundry airways	LHR	2221324
new air	SFO	21356673
new air	JFK	12323456
foundry airways	LHR	12542352
new air	JFK	12232355

Output:

airline	miles_new_york	miles_london
foundry airways	1002345.0	7381838.0
new air	1.22779055E7	null

中文翻译¶

数据透视（Pivot）¶

支持：批处理（Batch）、快速处理（Faster）

对输入数据集按指定列分组后执行指定的聚合操作。必须提供用于透视的唯一值，以便在运行前确定输出模式（schema）。这有助于提高运行时的长期稳定性。

转换类别：聚合（Aggregate）、常用（Popular）

声明参数¶

聚合操作（Aggregations）： 对数据集执行的聚合操作列表。
List\>
数据集（Dataset）： 执行聚合操作的数据集。
Table
分组列（Group by columns）： 聚合时用于对数据集进行分组的列列表。
List\>
透视列（Pivot by column）： 用于透视的列。
Column\
透视值（Pivot by values）： 用于透视的唯一值列表及其输出别名。别名值用于根据前缀/后缀参数构建输出列名。
List\, Literal\\>>
可选 前缀或后缀别名（Prefix or suffix alias）： 若选择前缀，输出列名格式为"别名""聚合操作"；若选择后缀，则格式为"聚合操作"别名。
Enum\

示例¶

示例 1：基础情况¶

参数值：

聚合操作： [
alias(
alias: miles,
expression:
mean(
expression: miles,
),
)]
数据集： ri.foundry.main.dataset.a
分组列： [airline]
透视列： airport
透视值： [(JFK, new_york), (LHR, london)]
前缀或后缀别名： null

输入：

airline	airport	miles
foundry airways	JFK	1002345
foundry airways	LHR	2221324
new air	SFO	21356673
new air	JFK	12323456
foundry airways	LHR	12542352
new air	JFK	12232355

输出：

airline	new_york_miles	london_miles
foundry airways	1002345.0	7381838.0
new air	1.22779055E7	null

示例 2：基础情况¶

参数值：

聚合操作： [
alias(
alias: miles,
expression:
mean(
expression: miles,
),
)]
数据集： ri.foundry.main.dataset.a
分组列： [airline]
透视列： airport
透视值： [(JFK, new_york), (LHR, london)]
前缀或后缀别名： SUFFIX

输入：

airline	airport	miles
foundry airways	JFK	1002345
foundry airways	LHR	2221324
new air	SFO	21356673
new air	JFK	12323456
foundry airways	LHR	12542352
new air	JFK	12232355

输出：

airline	miles_new_york	miles_london
foundry airways	1002345.0	7381838.0
new air	1.22779055E7	null