跳转至

Outer join(外连接(Outer join))

Supported in: Batch, Faster

Outer joins the provided dataset inputs together, keeping all rows from both datasets. Columns have nulls when there is no row satisfying the provided condition.

Transform categories: Join

Declared arguments

  • Condition for columns to select on the left: All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Condition for columns to select on the right: All columns in the right input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Join condition: Condition on which to join.
    Expression\
  • Left dataset: Left dataset to use in join.
    Table
  • Right dataset: Right dataset to use in join.
    Table
  • optional Prefix for columns from right: Prefix to add to all columns on the right hand side.
    Literal\

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR
JR-201 null IAD

Example 2: Base case

Description: Simple complex join condition.

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [tail_number, home_airport, factor],
    )
  • Join condition:
    lessThan(
     left: factor,
     right: factor,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: right_

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

Output:

tail_number airline factor right_tail_number right_home_airport right_factor
XB-123 foundry air 2 KK-452 JFK 10
XB-123 foundry air 2 JR-201 IAD 4
MT-222 new airline 5 KK-452 JFK 10
XB-123 foundry airline 5 KK-452 JFK 10
MT-222 new air 4 KK-452 JFK 10
KK-452 new air 1 XB-123 LHR 2
KK-452 new air 1 KK-452 JFK 10
KK-452 new air 1 JR-201 IAD 4
PA-452 new air 2 KK-452 JFK 10
PA-452 new air 2 JR-201 IAD 4
XB-123 foundry airline 2 KK-452 JFK 10
XB-123 foundry airline 2 JR-201 IAD 4
null null null MT-222 CPH 1

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

Output:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
MT-222 new airline 5 null
XB-123 foundry airline 5 null
MT-222 new air 4 null
KK-452 new air 1 null
PA-452 new air 2 null
XB-123 foundry airline 2 LHR
JR-201 null 4 IAD
KK-452 null 10 JFK
MT-222 null 1 CPH

Example 4: Base case

Description: Note, values from the left and right of tail_number are coalesced.

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
PA-452 new air 212 2 null
XB-123 foundry airline 1134 2 LHR
JR-201 null null null IAD


中文翻译

外连接(Outer join)

支持:批处理(Batch)、快速处理(Faster)

将提供的多个数据集输入进行外连接,保留两个数据集中的所有行。当没有满足给定条件的行时,列将包含空值。

转换类别:连接(Join)

声明参数(Declared arguments)

  • 左侧列选择条件(Condition for columns to select on the left): 左侧输入模式中的所有列都将被测试是否匹配此条件。如果匹配,该列将被选中输出。
    ColumnPredicate
  • 右侧列选择条件(Condition for columns to select on the right): 右侧输入模式中的所有列都将被测试是否匹配此条件。如果匹配,该列将被选中输出。
    ColumnPredicate
  • 连接条件(Join condition): 用于连接的条件。
    Expression\
  • 左侧数据集(Left dataset): 用于连接的左侧数据集。
    Table
  • 右侧数据集(Right dataset): 用于连接的右侧数据集。
    Table
  • 可选 右侧列前缀(Prefix for columns from right): 添加到右侧所有列的前缀。
    Literal\

示例(Examples)

示例 1:基础案例

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR
JR-201 null IAD

示例 2:基础案例

描述: 简单的复杂连接条件。

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, home_airport, factor],
    )
  • 连接条件:
    lessThan(
     left: factor,
     right: factor,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: right_

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

输出:

tail_number airline factor right_tail_number right_home_airport right_factor
XB-123 foundry air 2 KK-452 JFK 10
XB-123 foundry air 2 JR-201 IAD 4
MT-222 new airline 5 KK-452 JFK 10
XB-123 foundry airline 5 KK-452 JFK 10
MT-222 new air 4 KK-452 JFK 10
KK-452 new air 1 XB-123 LHR 2
KK-452 new air 1 KK-452 JFK 10
KK-452 new air 1 JR-201 IAD 4
PA-452 new air 2 KK-452 JFK 10
PA-452 new air 2 JR-201 IAD 4
XB-123 foundry airline 2 KK-452 JFK 10
XB-123 foundry airline 2 JR-201 IAD 4
null null null MT-222 CPH 1

示例 3:基础案例

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    and(
     conditions: [
    equals(
     left: tail_number,
     right: tail_number,
    ),
    equals(
     left: factor,
     right: factor,
    )],
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

输出:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
MT-222 new airline 5 null
XB-123 foundry airline 5 null
MT-222 new air 4 null
KK-452 new air 1 null
PA-452 new air 2 null
XB-123 foundry airline 2 LHR
JR-201 null 4 IAD
KK-452 null 10 JFK
MT-222 null 1 CPH

示例 4:基础案例

描述: 注意,左侧和右侧的 tail_number 值已合并。

参数值:

  • 左侧列选择条件:
    allColumns(

    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • 左侧数据集: ri.foundry.main.dataset.left
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
PA-452 new air 212 2 null
XB-123 foundry airline 1134 2 LHR
JR-201 null null null IAD