Aggregate on condition
Supported in: Batch, Faster
Applies aggregate expressions to every column that matches a condition.
Transform categories: Aggregate, Popular
Declared arguments
- Condition for columns to aggregate on: Every column in the input schema is tested against this condition. The given expressions are applied to each column that matches.
  ColumnPredicate
- Dataset: Dataset to apply operations to.
  Table
- Expressions to aggregate: The aggregate expressions to apply, once for each column that matches the condition.
  List\<Expression\<AnyType>>
- optional Group by columns: List of columns to group the dataset by when aggregating. If empty, no grouping is applied.
  List\<Column\<AnyType>>
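As a rough illustration of how the arguments above interact, the sketch below works out which output columns would be produced: each input column is tested against the condition, and each matching column yields one output per expression. The function name `plan_output_columns`, the `(name, type)` predicate shape, and the example schema are all hypothetical stand-ins, not part of the transform's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a column predicate: (column name, column type) -> bool.
ColumnPredicate = Callable[[str, str], bool]


def plan_output_columns(
    schema: Dict[str, str],
    condition: ColumnPredicate,
    alias_suffixes: List[str],
) -> List[str]:
    """Return one output column name per (matching column, expression) pair."""
    return [
        f"{name}{suffix}"
        for name, col_type in schema.items()  # test every input column
        if condition(name, col_type)          # keep only columns that match
        for suffix in alias_suffixes          # one output per expression
    ]


# Illustrative schema; the column types here are assumptions.
schema = {"id": "Integer", "value": "Integer", "airline": "String"}


def is_integer(name: str, col_type: str) -> bool:
    return col_type == "Integer"


print(plan_output_columns(schema, is_integer, ["_non_null", "_mean"]))
# ['id_non_null', 'id_mean', 'value_non_null', 'value_mean']
```

The suffixes play the role that `columnNameConcat` plays inside `dynamicAlias` in the examples: they derive each output name from the matched column's name.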
Examples
Example 1: Edge case
Description: Count non-null rows for all columns.
Argument values:
- Condition for columns to aggregate on:
  allColumns()
- Dataset: ri.foundry.main.dataset.a
- Expressions to aggregate: [
  dynamicAlias(
    expression:
      rowCount(
        expression: column,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _non_null],
      ),
  )]
- Group by columns: null
Input:
| id | value | distance |
|---|---|---|
| 1 | 100 | 2000 |
| 2 | null | 100 |
| 3 | 500 | 300 |
Output:
| id_non_null | value_non_null | distance_non_null |
|---|---|---|
| 3 | 2 | 3 |
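The result above can be reproduced with a minimal plain-Python sketch. It assumes that `rowCount` applied to a column counts only the rows where that column is not null, which is what the output table implies; the helper name `count_non_null` is hypothetical.

```python
# Input rows from Example 1; None stands in for null.
rows = [
    {"id": 1, "value": 100, "distance": 2000},
    {"id": 2, "value": None, "distance": 100},
    {"id": 3, "value": 500, "distance": 300},
]


def count_non_null(rows, column):
    """Count the rows whose value in `column` is not null."""
    return sum(1 for row in rows if row[column] is not None)


# allColumns() matches every column, so each one gets a "<name>_non_null" output.
columns = list(rows[0].keys())
result = {f"{column}_non_null": count_non_null(rows, column) for column in columns}

print(result)
# {'id_non_null': 3, 'value_non_null': 2, 'distance_non_null': 3}
```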
Example 2: Edge case
Description: Count non-null rows and compute the mean of integer columns.
Argument values:
- Condition for columns to aggregate on:
  columnHasType(
    type: Integer,
  )
- Dataset: ri.foundry.main.dataset.a
- Expressions to aggregate: [
  dynamicAlias(
    expression:
      rowCount(
        expression: column,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _non_null],
      ),
  ),
  dynamicAlias(
    expression:
      mean(
        expression: column,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _mean],
      ),
  )]
- Group by columns: null
Input:
| id | value | distance |
|---|---|---|
| 1 | 100 | 2000 |
| 2 | null | 100 |
| 3 | 500 | 300 |
Output:
| id_non_null | id_mean | value_non_null | value_mean |
|---|---|---|---|
| 3 | 2.0 | 2 | 300.0 |
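The following sketch mirrors this example in plain Python. The source does not state the column types, so the `column_types` mapping is an assumption: `id` and `value` are treated as Integer and `distance` as some other type (here `Long`), which would be consistent with `distance` being absent from the output. The mean is assumed to skip nulls, as the `value_mean` of 300.0 suggests.

```python
from statistics import mean

# Assumed column types (not given in the source).
column_types = {"id": "Integer", "value": "Integer", "distance": "Long"}

rows = [
    {"id": 1, "value": 100, "distance": 2000},
    {"id": 2, "value": None, "distance": 100},
    {"id": 3, "value": 500, "distance": 300},
]

# Each entry pairs an alias suffix with an aggregate over the non-null values.
expressions = [
    ("_non_null", lambda values: len(values)),
    ("_mean", lambda values: float(mean(values))),
]

result = {}
for column, col_type in column_types.items():
    if col_type != "Integer":  # columnHasType(type: Integer)
        continue
    values = [row[column] for row in rows if row[column] is not None]
    for suffix, aggregate in expressions:  # every expression runs once per column
        result[column + suffix] = aggregate(values)

print(result)
# {'id_non_null': 3, 'id_mean': 2.0, 'value_non_null': 2, 'value_mean': 300.0}
```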
Example 3: Edge case
Description: Mean of all integer columns, grouped by id.
Argument values:
- Condition for columns to aggregate on:
  columnHasType(
    type: Integer,
  )
- Dataset: ri.foundry.main.dataset.a
- Expressions to aggregate: [
  dynamicAlias(
    expression:
      mean(
        expression: column,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _mean],
      ),
  )]
- Group by columns: [
  id]
Input:
| id | value | distance | airline |
|---|---|---|---|
| 1 | 100 | 2000 | new air |
| 1 | 200 | 3000 | new air |
| 2 | 500 | 3000 | foundry air |
| 2 | 400 | 1000 | foundry air |
Output:
| id | id_mean | value_mean | distance_mean |
|---|---|---|---|
| 1 | 1.0 | 150.0 | 2500.0 |
| 2 | 2.0 | 450.0 | 2000.0 |
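A minimal sketch of the grouped case, again in plain Python. It assumes `id`, `value`, and `distance` are the Integer columns (the source does not list types, but these are the columns that appear in the output) and that `airline` is a string. Note that `id` is both the group-by key and a matching column, which is why the output carries `id` and `id_mean` side by side.

```python
from collections import defaultdict

rows = [
    {"id": 1, "value": 100, "distance": 2000, "airline": "new air"},
    {"id": 1, "value": 200, "distance": 3000, "airline": "new air"},
    {"id": 2, "value": 500, "distance": 3000, "airline": "foundry air"},
    {"id": 2, "value": 400, "distance": 1000, "airline": "foundry air"},
]
integer_columns = ["id", "value", "distance"]  # assumed to match columnHasType(Integer)
group_by = ["id"]

# Bucket the rows by their group-by key.
groups = defaultdict(list)
for row in rows:
    key = tuple(row[column] for column in group_by)
    groups[key].append(row)

# Emit one output row per group: the key columns, then one mean per matching column.
output = []
for key, members in groups.items():
    out_row = dict(zip(group_by, key))
    for column in integer_columns:
        values = [m[column] for m in members if m[column] is not None]
        out_row[f"{column}_mean"] = sum(values) / len(values)
    output.append(out_row)

for out_row in output:
    print(out_row)
# {'id': 1, 'id_mean': 1.0, 'value_mean': 150.0, 'distance_mean': 2500.0}
# {'id': 2, 'id_mean': 2.0, 'value_mean': 450.0, 'distance_mean': 2000.0}
```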
Example 4: Edge case
Description: Mean of all integer columns.
Argument values:
- Condition for columns to aggregate on:
  columnHasType(
    type: Integer,
  )
- Dataset: ri.foundry.main.dataset.a
- Expressions to aggregate: [
  dynamicAlias(
    expression:
      mean(
        expression: column,
      ),
    transformer:
      columnNameConcat(
        inputs: [column, _mean],
      ),
  )]
- Group by columns: null
Input:
| id | value | distance |
|---|---|---|
| 1 | 100 | 2000 |
| 3 | 500 | 300 |
Output:
| id_mean | value_mean |
|---|---|
| 2.0 | 300.0 |