跳转至

Left lookup join(左查找连接(Left lookup join))

Supported in: Streaming

Joins two datasets together, keeping all rows from the left table and only matching rows from the right dataset.

Transform categories: Join

Declared arguments

  • Condition for columns to select on the left: All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Condition for columns to select on the right: All columns in the right input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Join condition: A list of columns from left and right input to join on.
    List\, Column\\>>
  • Left dataset: Left dataset to use in join.
    Table
  • Max rows to join with a single row: Places a limit on the maximum number of rows from the right input that a single row from the left can be joined with. The highest this limit can go is ten. Setting too high of a limit can have negative performance implications.
    Literal\
  • Right dataset: Right dataset to use in join.
    Table
  • optional Prefix for columns from right: Prefix to add to all columns on the right hand side.
    Literal\

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number)]
  • Left dataset: ri.foundry.main.dataset.left
  • Max rows to join with a single row: 10
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR

Example 2: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number), (factor, factor)]
  • Left dataset: ri.foundry.main.dataset.left
  • Max rows to join with a single row: 10
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

Output:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
MT-222 new airline 5 null
XB-123 foundry airline 5 null
MT-222 new air 4 null
KK-452 new air 1 null
PA-452 new air 2 null
XB-123 foundry airline 2 LHR

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number)]
  • Left dataset: ri.foundry.main.dataset.left
  • Max rows to join with a single row: 10
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
XB-123 LGW
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
XB-123 foundry air 124 2 LGW
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
XB-123 foundry airline 335 5 LGW
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
PA-452 new air 212 2 null
XB-123 foundry airline 1134 2 LHR
XB-123 foundry airline 1134 2 LGW


中文翻译

左查找连接(Left lookup join)

支持:流式处理(Streaming)

将两个数据集连接在一起,保留左表中的所有行以及右数据集中匹配的行。

转换类别:连接(Join)

声明参数(Declared arguments)

  • 左侧列选择条件(Condition for columns to select on the left): 左侧输入模式中的所有列都将被测试是否匹配此条件。如果匹配,该列将被选中并输出。
    列谓词(ColumnPredicate)
  • 右侧列选择条件(Condition for columns to select on the right): 右侧输入模式中的所有列都将被测试是否匹配此条件。如果匹配,该列将被选中并输出。
    列谓词(ColumnPredicate)
  • 连接条件(Join condition): 用于连接的左右输入列列表。
    列表\<元组\<列\<布尔型(Boolean) | 字节型(Byte) | 日期型(Date) | 双精度浮点型(Double) | 单精度浮点型(Float) | 整型(Integer) | 长整型(Long) | 短整型(Short) | 字符串型(String) | 时间戳型(Timestamp)>, 列\<布尔型(Boolean) | 字节型(Byte) | 日期型(Date) | 双精度浮点型(Double) | 单精度浮点型(Float) | 整型(Integer) | 长整型(Long) | 短整型(Short) | 字符串型(String) | 时间戳型(Timestamp)>>>
  • 左侧数据集(Left dataset): 用于连接的左侧数据集。
    表(Table)
  • 单行最大连接行数(Max rows to join with a single row): 限制左侧单行最多可与右侧输入中的多少行进行连接。此限制最高可设为十。设置过高的限制可能会对性能产生负面影响。
    字面量\<整型(Literal\)>
  • 右侧数据集(Right dataset): 用于连接的右侧数据集。
    表(Table)
  • 可选 右侧列前缀(Prefix for columns from right): 为右侧所有列添加的前缀。
    字面量\<字符串(Literal\)>

示例(Examples)

示例 1:基本情况

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件: [(tail_number, tail_number)]
  • 左侧数据集: ri.foundry.main.dataset.left
  • 单行最大连接行数: 10
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR

示例 2:基本情况

参数值:

  • 左侧列选择条件:
    columnNameIsIn(
     columnNames: [tail_number, airline, factor],
    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件: [(tail_number, tail_number), (factor, factor)]
  • 左侧数据集: ri.foundry.main.dataset.left
  • 单行最大连接行数: 10
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport factor
XB-123 LHR 2
MT-222 CPH 1
KK-452 JFK 10
JR-201 IAD 4

输出:

tail_number airline factor home_airport
XB-123 foundry air 2 LHR
MT-222 new airline 5 null
XB-123 foundry airline 5 null
MT-222 new air 4 null
KK-452 new air 1 null
PA-452 new air 2 null
XB-123 foundry airline 2 LHR

示例 3:基本情况

参数值:

  • 左侧列选择条件:
    allColumns(

    )
  • 右侧列选择条件:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • 连接条件: [(tail_number, tail_number)]
  • 左侧数据集: ri.foundry.main.dataset.left
  • 单行最大连接行数: 10
  • 右侧数据集: ri.foundry.main.dataset.right
  • 右侧列前缀: null

输入:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
XB-123 LGW
MT-222 CPH
KK-452 JFK
JR-201 IAD

输出:

tail_number airline miles factor home_airport
XB-123 foundry air 124 2 LHR
XB-123 foundry air 124 2 LGW
MT-222 new airline 1123 5 CPH
XB-123 foundry airline 335 5 LHR
XB-123 foundry airline 335 5 LGW
MT-222 new air 565 4 CPH
KK-452 new air 222 1 JFK
PA-452 new air 212 2 null
XB-123 foundry airline 1134 2 LHR
XB-123 foundry airline 1134 2 LGW