GeoPoint-to-GeoPoint 3D distance inner join

Supported in: Batch

Inner joins the left and right datasets based on the distance between point geometries. The geometries must represent points and may optionally include a z-coordinate. Internally, geometries are converted into the given projected coordinate reference system before the join and converted back to WGS84 afterward. Non-point geometries are ignored. The entire right dataset must fit into driver and executor memory; a 3 GB executor should be able to handle up to 4 million points in the neighbors (right) dataset.
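The join semantics described above can be sketched in plain Python. This is only an illustration, not the transform's implementation: the names `distance` and `distance_inner_join` are hypothetical, the points are assumed to be already expressed in the projected coordinate system, the distance threshold is assumed to be inclusive, and points without a z-coordinate are assumed never to match when 3D distance is requested (which is consistent with Example 2 below).

```python
import math
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, Any]     # hypothetical row shape: {"point": (x, y[, z]), ...}
Point = Sequence[float]  # coordinates already in the projected CRS


def distance(a: Point, b: Point, use_z: bool) -> Optional[float]:
    """Euclidean distance; None when 3D is requested but a z value is missing."""
    if use_z:
        if len(a) < 3 or len(b) < 3:
            return None  # assumption: such pairs cannot match in 3D mode
        return math.dist(a[:3], b[:3])
    return math.dist(a[:2], b[:2])  # z-coordinates are ignored in 2D mode


def distance_inner_join(
    left: Iterable[Row],
    right: Iterable[Row],
    max_distance: float,
    use_z: bool = False,
) -> List[Tuple[Row, Row]]:
    """Return every (left_row, right_row) pair within max_distance."""
    right_rows = list(right)  # the whole right side is held in memory
    pairs = []
    for l_row in left:
        for r_row in right_rows:
            d = distance(l_row["point"], r_row["point"], use_z)
            if d is not None and d <= max_distance:  # assumption: inclusive
                pairs.append((l_row, r_row))
    return pairs
```

The nested loop compares every pair of rows, which is fine for illustration but would not scale; a real implementation would typically use a spatial index over the in-memory right side.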

Transform categories: Geospatial, Join

Declared arguments

  • Condition for columns to select on the left: All columns in the left input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Condition for columns to select on the right: All columns in the right input schema will be tested to see if they match this condition. If they match, the column will be selected in the output.
    ColumnPredicate
  • Distance: The maximum distance within which geometries are joined, in the units of the projected coordinate system.
    Literal
  • Join key: The GeoJSON columns from the left and right inputs on which to join.
    Tuple<Column, Column>
  • Left dataset: Left dataset to use in join.
    Table
  • Projected coordinate system: Input geometries will be converted to this coordinate system prior to the join, and distance will be measured in the units of the given coordinate system. Formatted as "authority:id"; for example, UTM zone 18N is identified by EPSG:32618.
    Literal
  • Right dataset: Right dataset to use in join.
    Table
  • Use z-coordinate: Whether to include z-coordinates and calculate the 3-dimensional distance. If false, z-coordinates are ignored and 2-dimensional distances are calculated.
    Literal
  • Prefix for columns from right (optional): Prefix to add to all columns from the right-hand side.
    Literal
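As a rough illustration of how the column conditions and the right-hand prefix shape the output schema, the sketch below builds one output row from a matched pair. The helper names `select_columns` and `build_output_row` are hypothetical, and a `columnNameIsIn` condition is modelled simply as a list of names; a missing (null) prefix is assumed to leave right-hand column names unchanged, as in Example 3.

```python
from typing import Any, Dict, Iterable, Optional


def select_columns(
    row: Dict[str, Any],
    column_names: Iterable[str],
    prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Keep only the named columns, optionally renaming them with a prefix."""
    wanted = set(column_names)
    return {
        (prefix or "") + name: value
        for name, value in row.items()
        if name in wanted
    }


def build_output_row(
    left_row: Dict[str, Any],
    right_row: Dict[str, Any],
    left_columns: Iterable[str],
    right_columns: Iterable[str],
    right_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Combine the selected left columns with the selected, prefixed right columns."""
    out = select_columns(left_row, left_columns)
    out.update(select_columns(right_row, right_columns, right_prefix))
    return out
```

With the argument values of Example 1, a matched pair would yield the columns geometryColLhs, lhs-1, rhs_geometryCol, and rhs_arrayCol; col1 is dropped because it is not selected.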

Examples

Example 1: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 2.5
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 |
| {"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| --- | --- | --- |
| {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 1.0], "type":"Point"} | rhsVal2 | [ 0.0, 1.0 ] |

Output:

| geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol |
| --- | --- | --- | --- |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0], "type":"Point"} | 44.0 | {"coordinates": [0.0, 1.0], "type":"Point"} | [ 0.0, 1.0 ] |

Example 2: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 2.5
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: true
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 |
| {"coordinates": [0.0, 5.0], "type":"Point"} | 44.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| --- | --- | --- |
| {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
| {"coordinates": [1.0, 1.0, 6.0], "type":"Point"} | rhsVal2 | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 3.0], "type":"Point"} | rhsVal3 | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0], "type":"Point"} | rhsVal4 | [ 0.0, 1.0 ] |

Output:

| geometryColLhs | lhs-1 | rhs_geometryCol | rhs_arrayCol |
| --- | --- | --- | --- |
| {"coordinates": [0.0, 0.0, 0.0], "type":"Point"} | 42.0 | {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [0.0, 0.0, 3.0], "type":"Point"} | [ 0.0, 1.0 ] |
| {"coordinates": [0.0, 0.0, 5.0], "type":"Point"} | 43.0 | {"coordinates": [1.0, 1.0, 6.0], "type":"Point"} | [ 0.0, 1.0 ] |
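To see why only three pairs survive in Example 2, the 3D distances can be recomputed by hand. The sketch below does this for the example's coordinates, treating them directly as planar values (the example uses EPSG:4326, so no reprojection is modelled). It assumes that a pair is skipped when either point lacks a z-coordinate and that the threshold is inclusive; both are inferences from the example output, and `find_matches` is a hypothetical name.

```python
import math

# Points from Example 2, keyed by their lhs-1 / col1 values.
LEFT = {
    42.0: (0.0, 0.0, 0.0),
    43.0: (0.0, 0.0, 5.0),
    44.0: (0.0, 5.0),          # no z-coordinate
}
RIGHT = {
    "rhsVal1": (0.0, 0.0, 2.0),
    "rhsVal2": (1.0, 1.0, 6.0),
    "rhsVal3": (0.0, 0.0, 3.0),
    "rhsVal4": (0.0, 0.0),     # no z-coordinate
}


def find_matches(left, right, limit):
    """Return (left_key, right_key) pairs whose 3D distance is within limit."""
    matches = []
    for l_key, l_point in left.items():
        for r_key, r_point in right.items():
            if len(l_point) < 3 or len(r_point) < 3:
                continue  # assumption: 2D points never match in 3D mode
            if math.dist(l_point, r_point) <= limit:
                matches.append((l_key, r_key))
    return matches


matched_pairs = find_matches(LEFT, RIGHT, 2.5)
```

Under these assumptions, row 42.0 matches rhsVal1 (distance 2.0), and row 43.0 matches rhsVal2 (distance about 1.73) and rhsVal3 (distance 2.0), which corresponds to the three output rows; the order of rows in the real output is not modelled here.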

Example 3: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryColRhs, rhs-1],
    )
  • Distance: 1641
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: epsg:2868
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 |
| null | 43.0 |

ri.foundry.main.dataset.right

| geometryColRhs | rhs-1 |
| --- | --- |
| {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
| {"coordinates": [-112.11796760559083,33.440895931474124], "type":"Point"} | rhsVal2 |

Output:

| geometryColLhs | lhs-1 | geometryColRhs | rhs-1 |
| --- | --- | --- | --- |
| {"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} | 42.0 | {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} | rhsVal1 |
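Example 3 measures distance in the units of epsg:2868 rather than in degrees. As a rough sanity check, the sketch below approximates the ground distance between the points with the haversine formula and converts it to feet. This is not how the transform computes distance (it projects the geometries first), and it assumes that epsg:2868 uses the international foot as its unit; `haversine_feet` and the constants are illustrative names.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius, spherical approximation
FEET_PER_METER = 1 / 0.3048    # international foot (assumed CRS unit)


def haversine_feet(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Approximate great-circle distance in feet between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a)) * FEET_PER_METER


LEFT_POINT = (-112.14843750000001, 33.440609443703586)
RIGHT_1 = (-112.14560508728029, 33.44082430962016)    # rhsVal1
RIGHT_2 = (-112.11796760559083, 33.440895931474124)   # rhsVal2

distance_to_rhs1 = haversine_feet(*LEFT_POINT, *RIGHT_1)
distance_to_rhs2 = haversine_feet(*LEFT_POINT, *RIGHT_2)
```

Under this approximation the first right-hand point is several hundred feet away, well inside the 1641 threshold, while the second is several thousand feet away, which agrees with only rhsVal1 appearing in the output.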

Example 4: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | 42.0 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| --- | --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |

Output:

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | 42.0 |

Example 5: Base case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | 42.0 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| --- | --- | --- |
| {"coordinates": [55.0, 5.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |

Output:

| rhs_geometryCol | rhs_arrayCol |
| --- | --- |
| {"coordinates": [55.0, 5.0], "type":"Point"} | [ 0.0, 1.0 ] |

Example 6: Null case

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col1, arrayCol],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

| geometryColLhs | lhs-1 |
| --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | 42.0 |
| {"coordinates": [55.0, 5.0], "type":"Point"} | 43.0 |
| {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | 44.0 |
| null | 45.0 |

ri.foundry.main.dataset.right

| geometryCol | col1 | arrayCol |
| --- | --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
| {"coordinates": [[[21.0, 21.0], [27.0, 21.0], [27.0, 27.0], [21.0, 27.0], [21.0, 21.0]]], "type": "Polygon"} | rhsVal2 | [ 0.0, 1.0 ] |
| {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} | rhsVal3 | [ 0.0, 1.0 ] |
| null | rhsVal4 | [ 0.0, 1.0 ] |

Output:

| geometryColLhs | lhs-1 | rhs_geometryCol | rhs_col1 | rhs_arrayCol |
| --- | --- | --- | --- | --- |
| {"coordinates": [15.0, 5.0], "type":"Point"} | 42.0 | {"coordinates": [15.0, 5.0], "type":"Point"} | rhsVal1 | [ 0.0, 1.0 ] |
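Example 6 shows null geometries and polygons dropping out of the join on both sides. A minimal sketch of that filtering step, assuming the geometry column holds GeoJSON strings, might look like the following. The function name `parse_point` is hypothetical, and treating malformed JSON the same way as a non-point geometry is an assumption rather than documented behavior.

```python
import json
from typing import Optional, Tuple


def parse_point(geojson: Optional[str]) -> Optional[Tuple[float, ...]]:
    """Return (x, y) or (x, y, z) for a GeoJSON Point, otherwise None."""
    if geojson is None:
        return None  # null geometries never participate in the join
    try:
        geometry = json.loads(geojson)
    except json.JSONDecodeError:
        return None  # assumption: unparseable values are skipped
    if not isinstance(geometry, dict) or geometry.get("type") != "Point":
        return None  # polygons and other geometry types are ignored
    coords = geometry.get("coordinates")
    if not isinstance(coords, list) or len(coords) not in (2, 3):
        return None
    if not all(isinstance(c, (int, float)) for c in coords):
        return None
    return tuple(float(c) for c in coords)
```

Rows for which this returns None would simply be excluded before distances are compared, which leaves only the two point rows on the left and the single point row on the right in Example 6.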

