
Aggregate on condition

Supported in: Batch, Faster

Applies aggregate expressions to each column of a dataset that matches a condition.

Transform categories: Aggregate, Popular

Declared arguments

  • Condition for columns to aggregate on: Every column in the input schema is tested against this condition, and the given expressions are applied to each column that matches.
    ColumnPredicate
  • Dataset: The dataset to apply the operations to.
    Table
  • Expressions to aggregate: The aggregate expressions to apply once to each column that matches the condition.
    List<Expression<AnyType>>
  • Group by columns (optional): The columns to group the dataset by when aggregating. If empty, no grouping is applied.
    List<Column<AnyType>>
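The arguments above can be read as a loop over the input schema. The following is a minimal Python sketch of those semantics, not Foundry's implementation. It assumes rows are plain dicts with `None` for null, the schema is a dict from column name to a type name, the condition is an ordinary predicate over a column's name and type, and each expression is a pair of an aggregate function and an aliasing function standing in for `dynamicAlias` with `columnNameConcat`. All names in it (`aggregate_on_condition`, `non_null_count`, `mean`) are hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
# Hypothetical model of one expression: (aggregate over a column's values, alias built from the column name).
Expression = tuple[Callable[[list[Any]], Any], Callable[[str], str]]


def non_null_count(values: list[Any]) -> int:
    # Counts non-null values, mirroring rowCount(expression: column) as the examples use it.
    return sum(v is not None for v in values)


def mean(values: list[Any]) -> Optional[float]:
    # Averages the non-null values; returns None when there are none.
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def aggregate_on_condition(
    rows: list[Row],
    schema: dict[str, str],
    condition: Callable[[str, str], bool],
    expressions: list[Expression],
    group_by: Optional[list[str]] = None,
) -> list[Row]:
    group_by = group_by or []
    # Every column in the input schema is tested; matches keep schema order.
    matched = [name for name, col_type in schema.items() if condition(name, col_type)]

    # Bucket rows by their group-by key; with no group-by columns, all rows share the key ().
    groups: dict[tuple, list[Row]] = {}
    for row in rows:
        groups.setdefault(tuple(row[c] for c in group_by), []).append(row)

    result = []
    for key, members in groups.items():
        out: Row = dict(zip(group_by, key))  # group-by columns come first
        for col in matched:
            values = [r[col] for r in members]
            # Each expression is applied once to each matching column.
            for aggregate, alias in expressions:
                out[alias(col)] = aggregate(values)
        result.append(out)
    return result
```

Example 1 below corresponds to calling this with the condition `lambda name, col_type: True` (standing in for `allColumns()`) and the single expression `(non_null_count, lambda col: col + "_non_null")`; Example 3 uses `lambda name, col_type: col_type == "Integer"` with `group_by=["id"]`. This sketch emits groups in first-seen order, which a distributed engine need not do.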

Examples

Example 1: Edge case

Description: Count the non-null rows in every column.

Argument values:

  • Condition for columns to aggregate on:
    allColumns()
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
      dynamicAlias(
        expression: rowCount(expression: column),
        transformer: columnNameConcat(inputs: [column, _non_null]),
      ),
    ]
  • Group by columns: null

Input:

id  value  distance
1   100    2000
2   null   100
3   500    300

Output:

id_non_null value_non_null distance_non_null
3 2 3
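Example 1 is easy to check by hand. This Python sketch is an illustration rather than the transform's code: it assumes nulls are `None`, treats `rowCount(expression: column)` as a count of non-null values (which is what the output shows, since `value` has one null and yields 2), and treats `columnNameConcat(inputs: [column, _non_null])` as plain concatenation of the column name and the suffix.

```python
# Input rows from Example 1; None stands in for null.
rows = [
    {"id": 1, "value": 100, "distance": 2000},
    {"id": 2, "value": None, "distance": 100},
    {"id": 3, "value": 500, "distance": 300},
]
columns = ["id", "value", "distance"]  # allColumns() matches every column

# One output column per matched column: name + "_non_null" -> count of non-null values.
output = {col + "_non_null": sum(row[col] is not None for row in rows) for col in columns}
print(output)  # {'id_non_null': 3, 'value_non_null': 2, 'distance_non_null': 3}
```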

Example 2: Edge case

Description: Count the non-null rows and compute the mean of each integer column.

Argument values:

  • Condition for columns to aggregate on:
    columnHasType(type: Integer)
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
      dynamicAlias(
        expression: rowCount(expression: column),
        transformer: columnNameConcat(inputs: [column, _non_null]),
      ),
      dynamicAlias(
        expression: mean(expression: column),
        transformer: columnNameConcat(inputs: [column, _mean]),
      ),
    ]
  • Group by columns: null

Input:

id  value  distance
1   100    2000
2   null   100
3   500    300

Output:

id_non_null id_mean value_non_null value_mean
3 2.0 2 300.0
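Example 2 applies two expressions to each matching column, and the output lists them per column (`id_non_null`, `id_mean`, then the `value_` pair). The scraped page does not show column types, yet `distance` is absent from the documented output, which suggests it is not typed Integer in this example. The Python sketch below therefore assumes a hypothetical schema in which `distance` is `Long`; that type name, and the use of `None` for null, are assumptions. The null in `value` is skipped by the mean, which is why `value_mean` is 300.0 rather than 200.0.

```python
from typing import Any, Optional

rows = [
    {"id": 1, "value": 100, "distance": 2000},
    {"id": 2, "value": None, "distance": 100},
    {"id": 3, "value": 500, "distance": 300},
]
# Hypothetical schema: distance is assumed non-Integer so that it fails columnHasType(type: Integer).
schema = {"id": "Integer", "value": "Integer", "distance": "Long"}


def non_null_count(values: list[Any]) -> int:
    return sum(v is not None for v in values)


def mean(values: list[Any]) -> Optional[float]:
    # Nulls are skipped, so value_mean is (100 + 500) / 2.
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Expressions in the order they are listed: (suffix passed to columnNameConcat, aggregate).
expressions = [("_non_null", non_null_count), ("_mean", mean)]

output: dict[str, Any] = {}
for col, col_type in schema.items():
    if col_type != "Integer":  # stands in for columnHasType(type: Integer)
        continue
    values = [row[col] for row in rows]
    for suffix, aggregate in expressions:
        output[col + suffix] = aggregate(values)

print(output)  # {'id_non_null': 3, 'id_mean': 2.0, 'value_non_null': 2, 'value_mean': 300.0}
```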

Example 3: Edge case

Description: Compute the mean of every integer column, grouped by id.

Argument values:

  • Condition for columns to aggregate on:
    columnHasType(type: Integer)
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
      dynamicAlias(
        expression: mean(expression: column),
        transformer: columnNameConcat(inputs: [column, _mean]),
      ),
    ]
  • Group by columns: [id]

Input:

id  value  distance  airline
1   100    2000      new air
1   200    3000      new air
2   500    3000      foundry air
2   400    1000      foundry air

Output:

id id_mean value_mean distance_mean
1 1.0 150.0 2500.0
2 2.0 450.0 2000.0
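With `Group by columns: [id]`, the means are computed per distinct `id`, the grouping column is kept in the output, and `id` itself still matches the Integer condition, which is why `id_mean` simply repeats the key. A minimal Python sketch of that grouping, assuming `airline` is a string column (so it fails the Integer condition) and the other three columns are Integer; the data has no nulls, so a plain average suffices here.

```python
from collections import defaultdict

rows = [
    {"id": 1, "value": 100, "distance": 2000, "airline": "new air"},
    {"id": 1, "value": 200, "distance": 3000, "airline": "new air"},
    {"id": 2, "value": 500, "distance": 3000, "airline": "foundry air"},
    {"id": 2, "value": 400, "distance": 1000, "airline": "foundry air"},
]
integer_columns = ["id", "value", "distance"]  # assumed to match columnHasType(type: Integer)

# Bucket rows by the group-by column.
groups: dict[int, list[dict]] = defaultdict(list)
for row in rows:
    groups[row["id"]].append(row)

output = []
for key, members in groups.items():
    out = {"id": key}  # the group-by column comes first
    for col in integer_columns:
        values = [r[col] for r in members]
        out[col + "_mean"] = sum(values) / len(values)
    output.append(out)

for out in output:
    print(out)
# {'id': 1, 'id_mean': 1.0, 'value_mean': 150.0, 'distance_mean': 2500.0}
# {'id': 2, 'id_mean': 2.0, 'value_mean': 450.0, 'distance_mean': 2000.0}
```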

Example 4: Edge case

Description: Compute the mean of every integer column.

Argument values:

  • Condition for columns to aggregate on:
    columnHasType(type: Integer)
  • Dataset: ri.foundry.main.dataset.a
  • Expressions to aggregate: [
      dynamicAlias(
        expression: mean(expression: column),
        transformer: columnNameConcat(inputs: [column, _mean]),
      ),
    ]
  • Group by columns: null

Input:

id  value  distance
1   100    2000
3   500    300

Output:

id_mean value_mean
2.0 300.0

