
Frequent pattern growth

Supported in: Batch

Frequent pattern growth (FP-growth) finds frequent patterns in your dataset: sets of items that appear together in at least the minimum-support fraction of rows.

Transform categories: Aggregate, Other

Declared arguments

  • Input dataset: Source dataset containing the items column and the transaction column.
    Table
  • Items column: Array column containing the items for the patterns.
    Column\<Array\<String>>
  • Minimum support: Minimum fraction of rows in which a pattern must appear (pattern_occurrence divided by total_count) to be included in the output.
    Literal\<Double>
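To make the role of minimum support concrete, here is a minimal Python sketch. It is not the transform's implementation and it does not build an FP-tree; it simply enumerates every item subset by brute force, which gives the same answer on small inputs. The function name `frequent_patterns` and the result layout are hypothetical. The sample rows are those of Example 1.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Brute-force sketch: count every item subset, keep those meeting min_support.

    Hypothetical helper for illustration only; real FP-growth avoids
    enumerating all subsets by mining a compressed prefix tree.
    """
    total_count = len(transactions)
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))
        for size in range(1, len(unique_items) + 1):
            for pattern in combinations(unique_items, size):
                counts[pattern] += 1
    # Support of a pattern = rows containing it / total rows.
    return {
        pattern: (occurrence, total_count)
        for pattern, occurrence in counts.items()
        if occurrence / total_count >= min_support
    }


rows = [
    ["age_group: 20-30", "country: Germany", "gender: Female"],
    ["age_group: 20-30", "country: Germany", "gender: Male"],
]
result = frequent_patterns(rows, 0.6)
for pattern, (occurrence, total) in sorted(result.items()):
    print(list(pattern), occurrence, total)
```

With a minimum support of 0.6, any pattern containing a gender item has support 1/2 = 0.5 and is dropped, which is why Example 1 returns only three patterns. Lowering the threshold to 0.0 keeps every subset, as in Example 4.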

Examples

Example 1: Base case

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.6

Input:

| customer_attributes |
| --- |
| [ age_group: 20-30, country: Germany, gender: Female ] |
| [ age_group: 20-30, country: Germany, gender: Male ] |

Output:

| pattern | pattern_occurrence | total_count |
| --- | --- | --- |
| [ country: Germany, age_group: 20-30 ] | 2 | 2 |
| [ age_group: 20-30 ] | 2 | 2 |
| [ country: Germany ] | 2 | 2 |

Example 2: Null case

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.0

Input:

| customer_attributes |
| --- |
| null |

Output:

| pattern | pattern_occurrence | total_count |
| --- | --- | --- |

Example 3: Null case

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.0

Input:

| customer_attributes |
| --- |
| [ age_group: 20-30, country: Germany, gender: Female ] |
| [ null ] |

Output:

| pattern | pattern_occurrence | total_count |
| --- | --- | --- |
| [ country: Germany ] | 1 | 2 |
| [ country: Germany, age_group: 20-30 ] | 1 | 2 |
| [ null ] | 1 | 2 |
| [ age_group: 20-30 ] | 1 | 2 |
| [ gender: Female ] | 1 | 2 |
| [ gender: Female, country: Germany ] | 1 | 2 |
| [ gender: Female, country: Germany, age_group: 20-30 ] | 1 | 2 |
| [ gender: Female, age_group: 20-30 ] | 1 | 2 |
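
The two null cases behave differently: in Example 2 a null array yields no patterns, while in Example 3 a null element inside an array is reported as an item of its own. The sketch below mimics that observed behavior in plain Python. The helper name `count_patterns` is hypothetical, and the handling shown is inferred from the two examples rather than taken from the transform's implementation.

```python
from collections import Counter
from itertools import combinations


def count_patterns(rows):
    """Count how many rows contain each item subset (brute force).

    Assumption inferred from Examples 2 and 3: a row whose array is null
    contributes no patterns, whereas a null element inside an array is
    counted like any other item.
    """
    counts = Counter()
    for row in rows:
        if row is None:
            continue  # null array: nothing to mine from this row
        items = set(row)  # None is hashable, so it can sit in the set
        for size in range(1, len(items) + 1):
            for pattern in combinations(items, size):
                counts[frozenset(pattern)] += 1
    return counts


null_array_rows = [None]
null_item_rows = [
    ["age_group: 20-30", "country: Germany", "gender: Female"],
    [None],
]
print(len(count_patterns(null_array_rows)))
print(len(count_patterns(null_item_rows)))
```

The second call finds the seven non-empty subsets of the first row plus the single-item pattern containing null, matching the eight output rows of Example 3.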

Example 4: Edge case

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.0

Input:

| customer_attributes |
| --- |
| [ age_group: 20-30, country: Germany, gender: Female ] |
| [ age_group: 20-30, country: Germany, gender: Male ] |

Output:

| pattern | pattern_occurrence | total_count |
| --- | --- | --- |
| [ gender: Male ] | 1 | 2 |
| [ gender: Male, country: Germany ] | 1 | 2 |
| [ gender: Male, country: Germany, age_group: 20-30 ] | 1 | 2 |
| [ gender: Male, age_group: 20-30 ] | 1 | 2 |
| [ age_group: 20-30 ] | 2 | 2 |
| [ country: Germany ] | 2 | 2 |
| [ country: Germany, age_group: 20-30 ] | 2 | 2 |
| [ gender: Female ] | 1 | 2 |
| [ gender: Female, country: Germany ] | 1 | 2 |
| [ gender: Female, country: Germany, age_group: 20-30 ] | 1 | 2 |
| [ gender: Female, age_group: 20-30 ] | 1 | 2 |

