跳转至

Inner join(内连接(Inner join))

Supported in: Batch, Faster

Joins two datasets together, keeping only rows that satisfy the provided condition from each table.

Transform categories: Join

Declared arguments

  • Condition for columns to select on the left: All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Condition for columns to select on the right: All columns in the right input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Join condition: Condition on which to join.
    Expression\
  • Left dataset: Left dataset to use in join.
    Table
  • Right dataset: Right dataset to use in join.
    Table
  • optional Prefix for columns from right: Prefix to add to all columns on the right hand side.
    Literal\

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
XB-123 foundry airline LHR

Example 2: Base case

Description: Simple complex join condition.

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [tail_number, home_airport, factor],
    )
  • Join condition:
    lessThan(
     left: factor,
     right: factor,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: right_

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

Output:

tail_number airline factor right_tail_number right_home_airport right_factor
XB-123 foundry air 2 KK-452 JFK 10
XB-123 foundry air 2 JR-201 IAD 4
MT-222 new airline 5 KK-452 JFK 10
XB-123 foundry airline 5 KK-452 JFK 10
MT-222 new air 4 KK-452 JFK 10
KK-452 new air 1 XB-123 LHR 2
KK-452 new air 1 KK-452 JFK 10
KK-452 new air 1 JR-201 IAD 4
PA-452 new air 2 KK-452 JFK 10
PA-452 new air 2 JR-201 IAD 4
XB-123 foundry airline 2 KK-452 JFK 10
XB-123 foundry airline 2 JR-201 IAD 4

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

Output:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
XB-123 foundry airline 2 LHR

Example 4: Base case

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
XB-123 foundry airline 1134 2 LHR


中文翻译


内连接(Inner join)

支持:批处理(Batch)、快速处理(Faster)

将两个数据集连接在一起,仅保留每个表中满足指定条件的行。

转换类别:连接(Join)

声明的参数

  • 左侧列选择条件(Condition for columns to select on the left): 将测试左侧输入模式中的所有列是否匹配此条件。若匹配,则该列将被选中并输出。
    ColumnPredicate
  • 右侧列选择条件(Condition for columns to select on the right): 将测试右侧输入模式中的所有列是否匹配此条件。若匹配,则该列将被选中并输出。
    ColumnPredicate
  • 连接条件(Join condition): 用于连接的条件。
    Expression\
  • 左侧数据集(Left dataset): 用于连接的左侧数据集。
    Table
  • 右侧数据集(Right dataset): 用于连接的右侧数据集。
    Table
  • 可选 右侧列前缀(Prefix for columns from right): 为右侧所有列添加的前缀。
    Literal\

示例

示例 1:基础情况

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
XB-123 foundry airline LHR

示例 2:基础情况

描述: 简单的复杂连接条件。

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, home_airport, factor],
    )
  • 连接条件:
    lessThan(
     left: factor,
     right: factor,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: right_

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

输出:

tail_number airline factor right_tail_number right_home_airport right_factor
XB-123 foundry air 2 KK-452 JFK 10
XB-123 foundry air 2 JR-201 IAD 4
MT-222 new airline 5 KK-452 JFK 10
XB-123 foundry airline 5 KK-452 JFK 10
MT-222 new air 4 KK-452 JFK 10
KK-452 new air 1 XB-123 LHR 2
KK-452 new air 1 KK-452 JFK 10
KK-452 new air 1 JR-201 IAD 4
PA-452 new air 2 KK-452 JFK 10
PA-452 new air 2 JR-201 IAD 4
XB-123 foundry airline 2 KK-452 JFK 10
XB-123 foundry airline 2 JR-201 IAD 4

示例 3:基础情况

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

输出:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
XB-123 foundry airline 2 LHR

示例 4:基础情况

参数值:

  • 左侧列选择条件:
    allColumns(

    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
XB-123 foundry airline 1134 2 LHR