
Functions index

Pipeline Builder provides expressions that operate at different levels. They generally fall into three categories: row-level functions, aggregations, and generators.

Row-level functions operate on values from a single row. Most expressions fall into this category, for example 'add'.

Aggregations combine values from multiple rows into one value, for example the 'sum' expression.

Generators produce multiple values from a single row, for example the 'explode_array' expression.

Transforms are functions that operate on a whole table or on multiple tables, for example the 'drop' transform. The rest of this document outlines the available expressions and transforms.

Row-level expressions


Absolute value

Supported in: Batch, Faster, Streaming

Returns the absolute value.

Expression categories: Numeric

Type variable bounds: T accepts Numeric

Output type: T

Example

Argument values:

  • Expression: numeric_column
numeric_column Output
0.0 0.0
1.1 1.1
-1.1 1.1

See details.


Add numbers

Supported in: Batch, Faster, Streaming

Calculates the sum of all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b]
col_a col_b Output
0 1 1
3 -2 1

See details.


Add or update map

Supported in: Batch, Streaming

Updates the value for a key in a map, or adds a new key-value pair if the key is not present.

Expression categories: Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Expression: 4
  • Key: k
  • Map: map_col
map_col Output
{ a -> 1, b -> 2, k -> 2 } { a -> 1, b -> 2, k -> 4 }
{ a -> 1, b -> 2 } { a -> 1, b -> 2, k -> 4 }

See details.
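A minimal plain-Python sketch of the add-or-update semantics in the example above, with a dict standing in for a map value. This is not Pipeline Builder's API: the function name is hypothetical, and returning null for a null input map is an assumption the description does not cover.

```python
from typing import Dict, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def add_or_update_map(m: Optional[Dict[K, V]], key: K, value: V) -> Optional[Dict[K, V]]:
    """Return a copy of `m` with `key` set to `value` (replaced if present, added if absent)."""
    if m is None:
        # Assumption: a null map stays null.
        return None
    updated = dict(m)  # copy so the original row value is not mutated
    updated[key] = value
    return updated


print(add_or_update_map({"a": 1, "b": 2, "k": 2}, "k", 4))  # existing key is replaced
print(add_or_update_map({"a": 1, "b": 2}, "k", 4))          # missing key is added
```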


Add or update struct field

Supported in: Batch, Faster, Streaming

Updates a field of a struct or adds a new field.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Expression: value
  • Locator: airline.id
  • Struct: struct
struct value Output
{ airline: { id: NA } } 1 { airline: { id: 1 } }
{ airline: { id: FE } } 2 { airline: { id: 2 } }

See details.


Add value to date

Supported in: Batch, Faster, Streaming

Returns the date that is 'value' days/weeks/months/quarters/years after 'start'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-02-01
  • Unit: DAYS
  • Value: 2

Output: 2022-02-03

See details.


All array elements satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for all elements in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles Output
[ 12300, null ] false
[ null, null ] true

See details.


And

Supported in: Batch, Faster, Streaming

Returns true if all of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean right_boolean Output
true true true
true false false
false true false
false false false

See details.


Any array element satisfy

Supported in: Batch, Streaming

Returns true if the expression is true for any element in the array.

Expression categories: Array

Output type: Boolean

Example

Argument values:

  • Array: miles
  • Boolean condition:
    isNull(
     expression: element,
    )
miles Output
[ 12300, null ] true
[ 12300, 12000 ] false

See details.


Arccos

Supported in: Batch, Faster, Streaming

Inverse cosine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 1.0

Output: 0.0

See details.


Arcsin

Supported in: Batch, Faster, Streaming

Inverse sine function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: radians
  • Value: 0.0

Output: 0.0

See details.


Arctan

Supported in: Batch, Faster, Streaming

Inverse tangent function.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Value: angle
angle Output
-1.0 -45.0
0.0 0.0
1.0 45.0

See details.


Arctan2

Supported in: Batch, Faster, Streaming

Returns the angle θ between the ray from the origin to the point (x, y) and the positive x-axis, confined to −π<θ<=π.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • X: x
  • Y: y
y x Output
0.0 0.0 0.0
1.0 0.0 90.0
0.0 -1.0 180.0
-1.0 0.0 -90.0

See details.
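The degree-valued example above can be reproduced with Python's standard `math.atan2`, which also takes the y coordinate first. This is only an illustration of the math under that assumption, not how Pipeline Builder evaluates the expression; the helper name is hypothetical.

```python
import math


def arctan2_degrees(y: float, x: float) -> float:
    """Angle between the positive x-axis and the ray from the origin to (x, y), in degrees."""
    # Note: Python returns -180.0 (not 180.0) when y is -0.0 and x < 0.
    return math.degrees(math.atan2(y, x))


for y, x in [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]:
    print(y, x, arctan2_degrees(y, x))
```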


Area

Supported in: Batch, Streaming

Calculates the area of a geometry in square meters, using a spherical approximation of the globe. For a line string or a point, the area is 0.

Expression categories: Geospatial

Output type: Double

See details.


Array add

Supported in: Batch, Faster, Streaming

Adds a value to the array at a specified index.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: numbers
  • Index: 1
  • Value: 1
numbers Output
[ 3, 5 ] [ 1, 3, 5 ]
[ 2 ] [ 1, 2 ]
[ ] [ 1 ]

See details.
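The example suggests that 'Index' is a 1-based position (index 1 prepends the value). A plain-Python sketch under that assumption follows; the function name is hypothetical, and because out-of-range behavior is not described above, the sketch simply rejects such indices.

```python
from typing import List, TypeVar

T = TypeVar("T")


def array_add(array: List[T], index: int, value: T) -> List[T]:
    """Insert `value` so that it ends up at 1-based position `index`."""
    if not 1 <= index <= len(array) + 1:
        # Behavior outside this range is not documented here, so the sketch refuses it.
        raise ValueError(f"index {index} out of range for array of length {len(array)}")
    result = list(array)              # leave the input untouched
    result.insert(index - 1, value)   # convert the 1-based position to a 0-based offset
    return result


print(array_add([3, 5], 1, 1))
```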


Array cartesian product

Supported in: Batch, Streaming

Computes the cartesian product of arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expression: [first, second]
first second Output
[ 1, 2 ] [ 3, 4 ] [ { first: 1, second: 3 }, { first: 1, second...

See details.
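A plain-Python sketch of the cartesian product, assuming (as the truncated example suggests) that each output struct is keyed by the input column names. Dicts stand in for structs, the function name is hypothetical, and the ordering follows `itertools.product`, which agrees with the first elements visible in the example.

```python
from itertools import product
from typing import Any, Dict, List


def array_cartesian_product(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one struct (dict) per combination of elements, keyed by input column name."""
    names = list(columns)
    return [dict(zip(names, combo)) for combo in product(*columns.values())]


print(array_cartesian_product({"first": [1, 2], "second": [3, 4]}))
```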


Array concat

Supported in: Batch, Faster, Streaming

Concatenates the provided arrays into a single array, without de-duplication.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 4, 5 ]]

Output: [ 1, 2, 3, 4, 5 ]

See details.


Array contains

Supported in: Batch, Faster, Streaming

Returns true if the array contains the value.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Array: part_ids
  • Value: BRR-123
part_ids Output
[ AWE-112, BRR-123 ] true
[ AWE-222, ABC-543 ] false

See details.


Array contains null

Supported in: Batch, Faster, Streaming

Returns true if the array contains null.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids Output
[ AWE-112, BRR-123, null ] true
[ AWE-222, ABC-543 ] false

See details.


Array difference

Supported in: Batch, Faster, Streaming

Returns all unique elements in the left array that are not in the right array.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Left array: [ 1, 2, 3 ]
  • Right array: [ 2, 3, 4 ]

Output: [ 1 ]

See details.
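A plain-Python sketch of "all unique elements in the left array that are not in the right array". Keeping the left array's first-occurrence order and requiring hashable elements are assumptions of this sketch, and the function name is hypothetical.

```python
from typing import Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def array_difference(left: List[T], right: List[T]) -> List[T]:
    """Unique elements of `left` that do not appear in `right`, in first-seen order."""
    exclude = set(right)
    seen = set()
    result = []
    for item in left:
        if item not in exclude and item not in seen:
            seen.add(item)       # drop later duplicates from `left`
            result.append(item)
    return result


print(array_difference([1, 2, 3], [2, 3, 4]))
```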


Array distinct

Supported in: Batch, Faster, Streaming

Removes duplicates and returns distinct values from the array.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 1, 2, 3 ]

Output: [ 1, 2, 3 ]

See details.


Array element

Supported in: Batch, Faster, Streaming

Returns the element at the given 1-based position of the input array. Positions outside the array return null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Position: 1

Output: 10

See details.
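A plain-Python sketch of the lookup, with positions counted from 1 as in the example and `None` standing in for null. How Pipeline Builder treats zero or negative positions is not stated above; this sketch treats them as outside the array, which is an assumption. The function name is hypothetical.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def array_element(array: Optional[List[T]], position: int) -> Optional[T]:
    """Return the element at 1-based `position`, or None if the position is outside the array."""
    if array is None or not 1 <= position <= len(array):
        return None
    return array[position - 1]


print(array_element([10, 11, 12], 1))
```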


Array elements are distinct

Supported in: Batch, Faster, Streaming

Returns true if the array's elements are distinct, false otherwise. If the array is null, the returned value is false.

Expression categories: Array, Boolean

Output type: Boolean

Example

Argument values:

  • Expression: part_ids
part_ids Output
[ ABC-123, DCE-123, EFG-123 ] true
[ ABC-123, ABC-123, EFG-123 ] false

See details.


Array flatten

Supported in: Batch, Faster, Streaming

Creates a single array from a nested input array by concatenating the arrays at the first level of nesting.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: array
array Output
[ [ 1, 2, 3 ], [ 4, 5, 6 ] ] [ 1, 2, 3, 4, 5, 6 ]

See details.


Array intersect

Supported in: Batch, Faster, Streaming

Removes duplicates and intersects a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 3 ]

See details.


Array maximum

Supported in: Batch, Faster, Streaming

Returns the maximum value of an array column.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 3

See details.


Array minimum

Supported in: Batch, Faster, Streaming

Returns the minimum value of an array column.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: 1

See details.


Array position

Supported in: Batch, Faster, Streaming

Returns the 1-based position of the first occurrence of 'value' in the given array. Returns null when the value is not found or when any of the arguments is null.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Long

Example

Argument values:

  • Array: [ 10, 11, 12 ]
  • Value: 10

Output: 1

See details.


Array remove

Supported in: Batch, Faster, Streaming

Returns the given array with all occurrences of 'value' removed.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2, 3 ]
  • Value: 1

Output: [ 2, 3 ]

See details.


Array repeat

Supported in: Batch, Faster, Streaming

Returns an array containing the contents of 'array' repeated 'value' times.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: [ 1, 2 ]
  • Value: 2

Output: [ 1, 2, 1, 2 ]

See details.


Array reverse

Supported in: Batch, Faster, Streaming

Reverses the order of elements in 'array'.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: [ 1, 2, 3 ]

Output: [ 3, 2, 1 ]

See details.


Array sort

Supported in: Batch, Faster, Streaming

Returns a sorted array of the given input array. All null values are placed at the end of a descending array and at the front of an ascending array.

Expression categories: Array

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Direction: ASCENDING
  • Expression: [ 5, 3, 6 ]

Output: [ 3, 5, 6 ]

See details.
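The null-placement rule above (nulls first when ascending, last when descending) can be sketched in plain Python, with `None` standing in for null. The function name and the boolean direction flag are hypothetical stand-ins for the 'Direction' argument.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def array_sort(array: List[Optional[T]], ascending: bool = True) -> List[Optional[T]]:
    """Sort non-null values; nulls go first when ascending and last when descending."""
    values = sorted((v for v in array if v is not None), reverse=not ascending)
    nulls: List[Optional[T]] = [None] * (len(array) - len(values))
    return nulls + values if ascending else values + nulls


print(array_sort([5, None, 3]))
print(array_sort([5, None, 3], ascending=False))
```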


Array sort by struct key

Supported in: Batch, Streaming

Sorts the given input array of structs by the values of the given struct keys.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Input array: [ {
    age: 20,
    }, {
    age: 10,
    }, {
    age: 30,
    } ]
  • Sort keys: [(age, ASCENDING)]

Output: [ {
age: 10,
}, {
age: 20,
}, {
age: 30,
} ]

See details.


Array union

Supported in: Batch, Faster, Streaming

Removes duplicates and unions a list of arrays.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: [ 1, 2, 3, 4 ]

See details.


Arrays have intersection

Supported in: Batch, Faster, Streaming

Checks if given arrays have at least one shared element.

Expression categories: Array, Boolean

Type variable bounds: T accepts AnyType

Output type: Boolean

Example

Argument values:

  • Expressions: [[ 1, 2, 3 ], [ 3, 4 ]]

Output: true

See details.


Arrays zip

Supported in: Batch, Faster, Streaming

Zips a list of given arrays into a merged array of structs in which the n-th struct contains all n-th values of input arrays.

Expression categories: Array

Output type: Array<Struct>

Example

Argument values:

  • Expressions: [first_array, second_array]
first_array second_array Output
[ 1, 2, 3 ] [ 4, 5, 6 ] [ { first_array: 1, second_array: 4 }, { first_array: 2,...

See details.


Base 64 decode to string

Supported in: Batch, Faster, Streaming

Decodes the given Base64 expression and interprets the resulting bytes as a UTF-8 string.

Expression categories: Binary, Cast, String

Output type: String

Example

Argument values:

  • Expression: encoded
encoded Output
Zm9v foo
YmFy bar

See details.


Base64 decode

Supported in: Batch, Faster, Streaming

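Decodes the given Base64 expression into binary. Binary values are displayed in their Base64 form, so the output in the example below looks identical to the input.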

Expression categories: Binary, Cast

Output type: Binary

Example

Argument values:

  • Expression: city_base64
city_base64 Output
TG9uZG9u TG9uZG9u
Q29wZW5oYWdlbg== Q29wZW5oYWdlbg==
TmV3IFlvcms= TmV3IFlvcms=

See details.


Base64 encode

Supported in: Batch, Faster, Streaming

Encodes the given expression as Base64.

Expression categories: Binary, Cast

Output type: String

Example

Argument values:

  • Expression: city
city Output
London TG9uZG9u
Copenhagen Q29wZW5oYWdlbg==
New York TmV3IFlvcms=

See details.
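Python's standard `base64` module reproduces the encode and decode-to-string examples above. Treating string input as UTF-8 bytes is an assumption for string columns; the helper names are hypothetical and this is not Pipeline Builder code.

```python
import base64


def base64_encode(text: str) -> str:
    # Assumption: strings are encoded to UTF-8 bytes before Base64 encoding.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def base64_decode_to_string(encoded: str) -> str:
    # Decode Base64 to raw bytes, then interpret those bytes as UTF-8 text.
    return base64.b64decode(encoded).decode("utf-8")


for city in ["London", "Copenhagen", "New York"]:
    print(city, base64_encode(city))
```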


Bit shift left

Supported in: Batch, Streaming

Shifts the given value left by a number of bits.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 2

See details.


Bit shift right

Supported in: Batch, Streaming

Shifts the given value right by a number of bits.

Expression categories: Binary

Type variable bounds: E accepts Byte | Integer | Long | Short

Output type: E

Example

Argument values:

  • Expression: 1
  • Number of bits: 1

Output: 0

See details.


Buffer H3 indices

Supported in: Batch, Faster, Streaming

Creates a buffer of distance k from an array of H3 indices.

Expression categories: Geospatial

Output type: Array<H3 index>

See details.


Calculate destination point

Supported in: Batch, Faster, Streaming

Calculates the destination point along a specified path given a starting point, course, and distance.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Course: course
  • Distance: distance
  • Starting point: point_a
  • Calculation method: GREAT_CIRCLE
point_a course distance Output
{ latitude: 48.8567, longitude: 2.3508 } 225.0 32000.0 { latitude: 48.65279552300661, longitude: 2.0427666779658806 }

See details.
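A sketch of the standard great-circle destination formula on a sphere, which reproduces the example above to within a few thousandths of a degree. The Earth radius (6,371,008.8 m), course in degrees clockwise from north, and distance in meters are assumptions inferred from the example; the exact radius and method Pipeline Builder uses are not stated here, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius


def destination_point(lat_deg: float, lon_deg: float, course_deg: float,
                      distance_m: float, radius_m: float = EARTH_RADIUS_M):
    """Return (lat, lon) in degrees after travelling `distance_m` along a great circle."""
    phi1 = math.radians(lat_deg)
    lam1 = math.radians(lon_deg)
    theta = math.radians(course_deg)
    delta = distance_m / radius_m  # angular distance in radians
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(sin_phi2)
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * sin_phi2)
    lon = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # normalize to [-180, 180)
    return math.degrees(phi2), lon


print(destination_point(48.8567, 2.3508, 225.0, 32000.0))
```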


Calculate haversine distance

Supported in: Batch, Faster, Streaming

Calculates the haversine distance, in meters, between two points given as latitude/longitude pairs.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Point a: point_a
  • Point b: point_b
point_a point_b Output
{ latitude: 41.507483, longitude: -99.436554 } { latitude: 38.504048, longitude: -98.315949 } 347328.82778977347
{ latitude: 22.308919, longitude: 113.914603 } { latitude: -33.946111, longitude: 151.177222 } 7393894.00134442

See details.
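A minimal sketch of the haversine formula. It assumes a spherical Earth with a mean radius of 6,371,008.8 m; the radius Pipeline Builder uses is not stated here, so results agree with the examples above only approximately. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


print(haversine_m(41.507483, -99.436554, 38.504048, -98.315949))
```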


Case

Supported in: Batch, Faster, Streaming

Chooses between different branches based on conditions.

Expression categories: Popular

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Default: Yes
  • Branches: [(
    lessThan(
     left: miles,
     right: 15000,
    ), No)]
miles Output
20053 Yes
10210 No
34120 Yes

See details.


Cast

Supported in: Batch, Faster, Streaming

Casts an expression to the given type.

Expression categories: Cast, Popular

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Casting long to string

Argument values:

  • Expression: 1234
  • Type: String

Output: 1234

See details.


Cast media schema

Supported in:

Casts a media reference to a specific media schema and format. This is useful when the input media has a generic schema (multimodal) but the actual content is known to be a specific type (such as a png image). The cast narrows the type metadata to allow downstream operations that require specific schema types.

Expression categories: Media

Output type: Media reference

See details.


Ceil

Supported in: Batch, Faster, Streaming

Returns the ceiling of the given fractional value, that is, the smallest integer greater than or equal to it.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 11

See details.


Change timestamp time zone

Supported in: Batch, Faster

Changes the time zone of a timestamp.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Output time zone: America/Chicago
  • Timestamp: 2020-04-28T05:09:00Z
  • Input time zone: US/Eastern

Output: 2020-04-28T04:09:00Z

See details.


Character-wise translate string

Supported in: Batch, Faster, Streaming

Replaces each character of the input that appears in the matching string with the character at the same position in the replacement string. If the matching string is longer than the replacement string, input characters that match the extra characters at the end of the matching string are removed.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: translate
  • Matching string: rnlt
  • Replacement string: 123

Output: 1a2s3ae

See details.
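Python's `str.translate` deletes characters that map to `None`, which makes it a convenient way to sketch the behavior above. The rule that the first occurrence wins when a character repeats in the matching string is an assumption, and the function name is hypothetical.

```python
def char_translate(text: str, matching: str, replacement: str) -> str:
    """Map matching[i] -> replacement[i]; matching characters without a counterpart are deleted."""
    table = {}
    for i, ch in enumerate(matching):
        # Assumption: if a character repeats in `matching`, its first mapping wins.
        table.setdefault(ord(ch), replacement[i] if i < len(replacement) else None)
    return text.translate(table)  # None values delete the character


print(char_translate("translate", "rnlt", "123"))
```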


Chunk string

Supported in: Batch, Streaming

Splits a string into chunks of a specified size, breaking on the specified separators.

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Chunk overlap: null
  • Chunk size: 10
  • Keep separator: null
  • Separators: null
string Output
hello [ hello ]
hello world. the quick brown fox jumps over the fence. [ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence.
[ hello, world., the quick, brown fox, jumps, over the, fence. ]
hello world.
the quick brown fox
jumps over the fence.
[ hello, world., the quick, brown fox, jumps, over the, fence. ]

See details.


Cipher decrypt

Supported in: Batch, Faster, Streaming

Decrypts the expression with Cipher.

Expression categories: Other

Output type: String

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-decrypt
  • Expression: string
string Output
CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER bar

See details.


Cipher encrypt

Supported in: Batch, Faster, Streaming

Encrypts the expression with Cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-encrypt
  • Expression: string
string Output
bar CIPHER::ri.bellaso.main.cipher-channel.1::OCRBIW3iHDltOGa6MEHwb7f/Dw==::CIPHER

See details.


Cipher hash

Supported in: Batch, Faster, Streaming

Hashes the expression with Cipher.

Expression categories: Other

Output type: Cipher Text

Example

Argument values:

  • Cipher license rid: ri.bellaso.main.cipher-license.1-hash
  • Expression: string
string Output
bar CIPHER::ri.bellaso.main.cipher-channel.1::c70a14f5cc57c940e3265045a5554d641bd549ee27a571a05cdbc75c77762eb86b1144c12f1bb7811a0bcec08b2f143989c44022e4664f615d6885ad640332cb::CIPHER

See details.


Clean string

Supported in: Batch, Faster, Streaming

Applies the selected set of clean actions to the expression.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Clean actions: {trim}
  • Expression: hello world

Output: hello world

See details.


Compact a set of H3 indices

Supported in: Batch, Faster, Streaming

Compacts H3 indices into a smaller set of mixed resolutions, where possible. Running the inverse operation (uncompact) is guaranteed to yield the original set of indices if all the input indices had the same resolution. If any of the input indices is invalid, this expression returns null. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array<H3 index>

Example

Argument values:

  • H3 indices: h3_set
h3_set Output
[ 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffffff, 87754a934ffff... [ 86754e64fffffff, 87754a914ffffff, 87754a916ffffff, 87754a930ffffff, 87754a932ffffff, 87754a933ffff...

See details.


Concatenate strings

Supported in: Batch, Faster, Streaming

Concatenates a list of strings with the specified separator.

Expression categories: String

Output type: String

Example

Argument values:

  • Expressions: [hello, world]
  • Null output if any input is null: null
  • Separator: _

Output: hello_world

See details.


Construct delegated media Gotham identifier (GID)

Supported in: Batch, Streaming

Constructs a valid delegated media Gotham identifier (GID) from its components. If the result is longer than 1024 characters, the output is null.

Expression categories: Other

Output type: Delegated media Gotham identifier (GID)

Example

Argument values:

  • Media locator: locator
  • Media type: mediaType
  • Producer instance: invalidUuid
mediaType locator Output
testaudiotype empty string null

See details.


Convert DMS to GeoPoint

Supported in: Batch, Streaming

Converts a geospatial coordinate string in degrees, minutes, seconds (DMS) format to a GeoPoint in accordance with user-provided formats. The default formats are DDD*°MM*'SS*"H and DDD*MMSSssH. The formats are tried in order, and the first matching format is used. See the formatting guide for how to write custom formats.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Coordinates: coordinates
  • Formats: null
coordinates Output
078261594N075220923E { latitude: 78.43776111111112, longitude: 75.36923055555555 }
046115095S069524119W { latitude: -46.19748611111111, longitude: -69.87810833333333 }
023°45'55"N 069°52'11"W { latitude: 23.76527777777777, longitude: -69.86972222222222 }
-123°55'55"N 069°53'00"W { latitude: -123.93194444444445, longitude: -69.88333333333334 }
123456789N23456789E { latitude: 123.76885833333333, longitude: 23.768858333333334 }

See details.
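The first two example rows can be decoded by hand: in the fixed-width DDDMMSSssH form, "ss" is hundredths of a second, so 078261594N is 78 + 26/60 + 15.94/3600 degrees north. The sketch below handles only that fixed-width form; the regular expression and function names are hypothetical, and the other default format and user-defined formats are not covered.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for DDDMMSSssH latitude followed by DDDMMSSssH longitude.
_DMS = re.compile(r"^(\d{3})(\d{2})(\d{2})(\d{2})([NS])(\d{3})(\d{2})(\d{2})(\d{2})([EW])$")


def _to_decimal(deg: str, minutes: str, sec: str, hundredths: str, hemi: str) -> float:
    value = int(deg) + int(minutes) / 60 + (int(sec) + int(hundredths) / 100) / 3600
    return -value if hemi in ("S", "W") else value  # southern/western hemispheres are negative


def parse_dms(coordinates: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in decimal degrees, or None if the string does not match."""
    m = _DMS.match(coordinates)
    if m is None:
        return None
    g = m.groups()
    return _to_decimal(*g[:5]), _to_decimal(*g[5:])


print(parse_dms("078261594N075220923E"))
```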


Convert GeoPoint to DMS

Supported in: Batch, Faster, Streaming

Converts a GeoPoint to a geospatial coordinate string in degrees, minutes, seconds (DMS) format in accordance with a user-chosen format. Possible formats are DDD°MM'SS"H and DDDMMSSssH.

Expression categories: Geospatial

Output type: String

See details.


Convert GeoPoint to Geohash

Supported in: Batch, Faster, Streaming

Converts a GeoPoint to a base32-encoded Geohash with specified precision that contains the GeoPoint. For more information on Geohash, see: https://en.wikipedia.org/wiki/Geohash.

Expression categories: Geospatial

Output type: Geohash

See details.


Convert GeoPoint to MGRS

Supported in: Batch, Faster, Streaming

Converts a GeoPoint following the WGS84 coordinate system (which is EPSG:4326) to an MGRS (military grid reference system) coordinate. The output MGRS will follow a space-delimited format with 5 digits of precision.

Expression categories: Geospatial

Output type: MGRS

Example

Argument values:

  • Expression: geoPoint
geoPoint Output
{ latitude -> 88.99999659707431, longitude -> 0.9996456505181999 } Z AF 01937 88990

See details.


Convert GeoPoint to geometry

Supported in: Batch, Faster, Streaming

Convert GeoPoint to a GeoJSON of type point.

Expression categories: Geospatial

Output type: Geometry

See details.


Convert H3 index to GeoPoint

Supported in: Batch, Faster, Streaming

Convert an H3 index into the GeoPoint representing the center of the corresponding H3 hexagon.

Expression categories: Geospatial

Output type: GeoPoint

See details.


Convert MGRS to GeoPoint

Supported in: Batch, Faster, Streaming

Converts an MGRS (military grid reference system) coordinate into a GeoPoint following the WGS84 coordinate system (which is EPSG:4326).

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: mgrs
mgrs Output
ZAF0193788990 { latitude: 88.99999659707431, longitude: 0.9996456505181999 }

See details.


Convert a string to date

Supported in: Batch, Faster, Streaming

Returns the date parsed from a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss.SSSXXX. The formats are tried in order, and the first matching format is used.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: Date formats are optional

Argument values:

  • String: 2020-04-28
  • Formats: null

Output: 2020-04-28

See details.


Convert a string to timestamp

Supported in: Batch, Faster, Streaming

Returns the timestamp parsed from a formatted string in accordance with the Java DateTimeFormatter. The default formats are yyyy-MM-dd'T'HH:mm:ss.SSSXXX and yyyy-MM-dd. The formats are tried in order, and the first matching format is used.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Argument values:

  • String: timestamp
  • Formats: [dd-yyyy-MM HH:mm:ss, yyyy-MM-dd]
  • Time zone: null
timestamp Output
28-2020-04 10:09:00 2020-04-28T10:09:00Z
2020-04-28 2020-04-28T00:00:00Z

See details.
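The "try each format in order, first match wins" rule can be sketched with Python's `datetime.strptime`. The Python directives below are hand-translated stand-ins for the two Java DateTimeFormatter patterns in the example, not Pipeline Builder's parser, and treating values without a time zone as UTC is an assumption based on the example outputs. The names `FORMATS` and `parse_timestamp` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence

# Python equivalents of the Java patterns "dd-yyyy-MM HH:mm:ss" and "yyyy-MM-dd".
FORMATS = ("%d-%Y-%m %H:%M:%S", "%Y-%m-%d")


def parse_timestamp(value: str, formats: Sequence[str] = FORMATS) -> Optional[datetime]:
    for fmt in formats:  # formats are tried in order; the first match wins
        try:
            # Assumption: values without a time zone are interpreted as UTC.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # no format matched


print(parse_timestamp("28-2020-04 10:09:00"))
print(parse_timestamp("2020-04-28"))
```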


Convert base

Supported in: Batch, Streaming

Converts a number (or its string representation) from one base to another.

Expression categories: Binary, Cast, Numeric

Output type: String

Example

Argument values:

  • Expression: 4A801
  • From base: 16
  • To base: 10

Output: 305153

See details.
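A plain-Python sketch of base conversion that reproduces the example above (0x4A801 = 305153). It relies on `int(value, base)`, which accepts bases 2 to 36, and emits upper-case digits; the handling of negative numbers with a leading minus sign is an assumption of this sketch, as are the names.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z, enough for bases 2..36


def convert_base(value: str, from_base: int, to_base: int) -> str:
    n = int(value, from_base)  # parse the input in the source base
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""  # assumption: negatives keep a leading minus sign
    n = abs(n)
    digits = []
    while n:
        n, rem = divmod(n, to_base)
        digits.append(DIGITS[rem])  # least significant digit first
    return sign + "".join(reversed(digits))


print(convert_base("4A801", 16, 10))
```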


Convert between angle units

Supported in: Batch, Faster, Streaming

Converts a value from one angle unit to another.

Expression categories: Geospatial, Numeric

Output type: Double

See details.


Convert between distance units

Supported in: Batch, Faster, Streaming

Converts a value from one distance unit to another.

Expression categories: Numeric

Output type: Double

See details.


Convert between time units

Supported in: Batch, Faster, Streaming

Converts a value from one time unit to another.

Expression categories: Datetime

Output type: Double

See details.


Convert between weight units

Supported in: Batch, Faster, Streaming

Converts a value from one weight unit to another.

Expression categories: Numeric

Output type: Double

See details.


Convert data to JSON

Supported in: Batch, Faster, Streaming

Converts the input into a JSON string.

Expression categories: File, String

Output type: String

Example

Argument values:

  • Input: struct
struct Output
{ airline: { id: NA } } {"airline":{"id":"NA"}}

See details.


Convert from Ontology GeoPoint

Supported in: Batch, Faster, Streaming

Convert an Ontology GeoPoint into a regular GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180. Regular GeoPoints are structures of the format {"longitude": {long},"latitude": {lat}}.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Expression: geopoint
geopoint Output
-20.0000000,80.0000000 { latitude: -20.0, longitude: 80.0 }
38.9031000,-77.0599000 { latitude: 38.9031, longitude: -77.0599 }
41.9876543,-99.1234568 { latitude: 41.9876543, longitude: -99.1234568 }

See details.


Convert from hexadecimal

Supported in: Batch, Faster

Inverse of hex: interprets each pair of characters as a hexadecimal number and converts it to the corresponding byte. In the example below, the binary output is displayed in Base64 form.

Expression categories: Numeric, String

Output type: Binary

Example

Argument values:

  • Expression: string_hex
string_hex Output
68656C6C6F aGVsbG8=
3039 MDk=
FFFFFFFFFFFFCFC7 ////////z8c=
4C6F6E646F6E TG9uZG9u

See details.
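The example outputs above can be checked with Python's `bytes.fromhex`, which turns each pair of hex characters into one byte; Base64-encoding the resulting bytes gives exactly the values shown in the Output column (for example, 68656C6C6F is "hello", whose Base64 form is aGVsbG8=). The helper name is hypothetical.

```python
import base64


def from_hex(string_hex: str) -> bytes:
    # Each pair of hexadecimal characters becomes one byte.
    return bytes.fromhex(string_hex)


for s in ["68656C6C6F", "3039", "FFFFFFFFFFFFCFC7", "4C6F6E646F6E"]:
    raw = from_hex(s)
    print(s, raw, base64.b64encode(raw).decode("ascii"))
```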


Convert from hexadecimal to string

Supported in: Batch, Faster, Streaming

Inverse of hex: interprets each pair of characters as a hexadecimal number and decodes the resulting bytes as a UTF-8 string.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string_hex
string_hex Output
68656C6C6F hello
4C6F6E646F6E London

See details.


Convert geocentric coordinates to WGS 84 geodesic coordinates

Supported in: Batch, Streaming

Converts geocentric cartesian coordinates (also known as Earth-centered, Earth-fixed or ECEF coordinates) to geodesic polar coordinates. Altitude is defined as height-above-ellipsoid. If any coordinates are null, the output will be null.

Expression categories: Geospatial

Output type: GeoPoint with altitude

Example

Argument values:

  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
  • Z coordinate: z_coordinate
x_coordinate y_coordinate z_coordinate Output
0.0 6378137.0 0.0 { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 90.0 } }
0.0 -6378137.0 0.0 { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -90.0 } }
-6378137.0 0.0 0.0 { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> 180.0 } }
-6378137.0 -0.0 0.0 { altitude -> 0.0, geoPoint -> { latitude -> 0.0, longitude -> -180.0 } }
0.0 0.0 6356752.314245179 { altitude -> 0.0, geoPoint -> { latitude -> 90.0, longitude -> 0.0 } }
0.0 0.0 -6356752.314245179 { altitude -> 0.0, geoPoint -> { latitude -> -90.0, longitude -> 0.0 } }

See details.
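A sketch of one common ECEF-to-geodetic method: a fixed-point iteration on latitude using the WGS 84 ellipsoid constants. It reproduces the axis-point examples above (6,356,752.314... m is the WGS 84 semi-minor axis), but it is a swapped-in textbook technique, not necessarily the algorithm Pipeline Builder uses, and the function name is hypothetical.

```python
import math
from typing import Tuple

# WGS 84 ellipsoid constants.
A = 6378137.0              # semi-major axis (m)
F = 1 / 298.257223563      # flattening
E2 = F * (2 - F)           # first eccentricity squared
B = A * (1 - F)            # semi-minor axis (m)


def ecef_to_wgs84(x: float, y: float, z: float, iterations: int = 10) -> Tuple[float, float, float]:
    """Return (latitude_deg, longitude_deg, altitude_m above the ellipsoid)."""
    lon = math.degrees(math.atan2(y, x))
    p = math.hypot(x, y)  # distance from the polar axis
    if p < 1e-9:
        # On the polar axis: latitude is +/-90 and altitude is measured from the pole.
        return math.copysign(90.0, z), lon, abs(z) - B
    lat = math.atan2(z, p * (1 - E2))  # initial guess
    for _ in range(iterations):
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime vertical radius of curvature
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - E2 * n / (n + alt)))
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    alt = p / math.cos(lat) - n
    return math.degrees(lat), lon, alt


print(ecef_to_wgs84(0.0, 6378137.0, 0.0))
```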


Convert legacy OffsetDateTime

Supported in: Batch

Converts a legacy OffsetDateTime column to a timestamp that can be used in all Foundry pipelines. The timestamp is returned in UTC.

Expression categories: Datetime

Output type: Timestamp

See details.


Convert linestring to polygon

Supported in: Batch, Faster, Streaming

Converts a closed linestring geometry to a polygon geometry. If the linestring is not closed, the expression returns null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: polygon_points
polygon_points Output
{"type":"LineString","coordinates":[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]} {"type":"Polygon","coordinates":[[[-77.49,38.01],[-77.47,38.15],[-77.19,38.14],[-77.49,38.01]]]}

See details.


Convert timestamp from UTC

Supported in: Batch, Faster, Streaming

Converts a timestamp from UTC to a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T05:09:00Z

See details.


Convert timestamp to UTC

Supported in: Batch, Faster, Streaming

Converts a timestamp to UTC based on a given time zone.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Time zone: EST
  • Timestamp: 2020-04-28T10:09:00Z

Output: 2020-04-28T15:09:00Z

See details.


Convert to Ontology GeoPoint

Supported in: Batch, Faster, Streaming

Convert a GeoPoint into a string that the Ontology will accept for a geo-indexed column (a geohash type column). Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Ontology GeoPoint

Example

Argument values:

  • Expression: point
point Output
{ latitude: -20.0, longitude: 80.0 } -20.0000000,80.0000000
{ latitude: 38.9031, longitude: -77.0599 } 38.9031000,-77.0599000
{ latitude: 41.987654321, longitude: -99.123456789 } 41.9876543,-99.1234568
null null

See details.
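The '{lat},{lon}' string format can be sketched in plain Python, along with the reverse parse described under "Convert from Ontology GeoPoint". Seven decimal places matches the examples above but is an assumption about the exact rounding; the function names are hypothetical, and `None` stands in for null.

```python
from typing import Optional, Tuple


def to_ontology_geopoint(lat: Optional[float], lon: Optional[float]) -> Optional[str]:
    if lat is None or lon is None:
        return None  # null in, null out, as in the last example row
    # Assumption: 7 decimal places, matching the examples.
    return f"{lat:.7f},{lon:.7f}"


def from_ontology_geopoint(value: Optional[str]) -> Optional[Tuple[float, float]]:
    if value is None:
        return None
    lat_s, lon_s = value.split(",")
    return float(lat_s), float(lon_s)


print(to_ontology_geopoint(41.987654321, -99.123456789))
```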


Convert to hexadecimal

Supported in: Batch, Faster, Streaming

Computes the hexadecimal representation of the given expression.

Expression categories: Numeric, String

Output type: String

Example

Argument values:

  • Expression: city_hex
city_hex Output
TG9uZG9u 4C6F6E646F6E

See details.


Convert to octal

Supported in: Batch, Faster, Streaming

Computes the octal representation of the given expression.

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: 12345

Output: 30071

See details.


Cosine

Supported in: Batch, Faster, Streaming

Takes the cosine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle Output
0.0 1.0
90.0 0.0
180.0 -1.0

See details.


Create GeoPoint

Supported in: Batch, Faster, Streaming

Creates a GeoPoint column from a latitude and longitude column. Validates that the latitude parameter is between -90 and 90, inclusive, and that the longitude parameter is between -180 and 180, inclusive; if not, returns a null value.

Expression categories: Geospatial

Output type: GeoPoint
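The validation rule above can be sketched in Python as follows. `create_geopoint` is hypothetical, and returning null when either input is null is an assumption, since null inputs are not covered by the description.

```python
from typing import Dict, Optional

def create_geopoint(latitude: Optional[float], longitude: Optional[float]) -> Optional[Dict[str, float]]:
    """Return a {latitude, longitude} struct, or None if a value is missing or out of range."""
    if latitude is None or longitude is None:
        return None  # assumption: null input yields null
    if -90 <= latitude <= 90 and -180 <= longitude <= 180:  # inclusive bounds
        return {"latitude": latitude, "longitude": longitude}
    return None

print(create_geopoint(38.9031, -77.0599))
print(create_geopoint(91.0, 0.0))  # None: latitude out of range
```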

See details.


Create GeoPoint from coordinate system

Supported in: Batch, Streaming

Takes a pair of coordinates from a source coordinate system and transforms them into WGS 84 latitude/longitude values. Coordinate systems (also known as coordinate reference systems or spatial reference systems) represent different systems for identifying the location of a point on the globe and are often identified by a key in standardized databases such as EPSG. If the given projection is not supported or either coordinate is null, returns null. This expression is for advanced users; if you do not need to deal with coordinate systems, use the "Create GeoPoint" expression instead.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Source coordinate system: EPSG:32618
  • X coordinate: x_coordinate
  • Y coordinate: y_coordinate
x_coordinate y_coordinate Output
322190.2233952965 4306505.703879281 {
 latitude -> 38.88944258,
 longitude -> -77.05014581,
}
323243.1361536059 4318298.06539618 {
 latitude -> 38.99585379643137,
 longitude -> -77.04105678275415,
}
407063.63465300016 4764873.719585404 {
 latitude -> 43.03086518778498,
 longitude -> -76.14077251822197,
}

See details.


Create an empty array

Supported in: Batch, Faster, Streaming

Returns an empty array of the given type.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Type: String

Output: [ ]

See details.


Create array

Supported in: Batch, Faster, Streaming

Creates an array from the columns provided.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expressions: [1, 2, 3]

Output: [ 1, 2, 3 ]

See details.


Create ellipse geometry

Supported in: Batch, Streaming

Approximates an ellipse as a polygon centered at the given geo coordinate. The distance between points is computed along the surface of the WGS84 ellipsoid approximating the surface of the earth.

Expression categories: Geospatial

Output type: Geometry

See details.


Create geodesic line string

Supported in: Batch, Streaming

Creates a geodesic line between two points.

Expression categories: Geospatial

Output type: Geometry

See details.


Create geotemporal series reference

Supported in: Batch, Streaming

Generates the values required for a geotemporal series reference object property type, which consists of a reference to a series of geotemporal observations and the RID of the geotemporal series integration that contains the series.

Expression categories: Geospatial, Other, String

Output type: Geotemporal series reference

See details.


Create linestring geometry

Supported in: Batch, Streaming

Creates a GeoJSON linestring geometry from the given points.

Expression categories: Geospatial

Type variable bounds: T accepts Struct\

Output type: Geometry

Example

Argument values:

  • Points: points
points Output
[ {
latitude: 10.0,
longitude: 0.0,
}, {
latitude: 10.0,
longitude: 10.0,
} ]
{"type":"LineString","coordinates":[[0.0,10.0],[10.0,10.0]]}
[ {
latitude: 10.0,
longitude: 10.0,
}, {
latitude: 20.0,<...
{"type":"LineString","coordinates":[[10.0,10.0],[20.0,20.0],[30.0,30.0]]}
[ {
latitude: 0.0,
longitude: 179.0,
}, {
latitude: 0.0,
longitude: 181.0,
} ]
{"type":"MultiLineString","coordinates":[[[179.0,0.0],[180.0,0.0]],[[-180.0,0.0],[-179.0,0.0]]]}
[ {
latitude: 0.0,
longitude: -179.0,
}, {
latitude: 0.0,
longitude: -181.0,
} ]
{"type":"MultiLineString","coordinates":[[[180.0,0.0],[179.0,0.0]],[[-179.0,0.0],[-180.0,0.0]]]}

See details.


Create map from arrays

Supported in: Batch, Faster, Streaming

Returns a map using key-value pairs from the zipped arrays. Null values are not allowed as keys and will cause a runtime error.

Expression categories: Array, Map

Type variable bounds: K accepts AnyType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Array of keys: [ 1, 2, 3 ]
  • Array of values: [ 4, 5, 6 ]

Output: {
 1 -> 4,
 2 -> 5,
 3 -> 6,
}
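Zipping a keys array with a values array is the same idea as Python's `dict(zip(...))`. A minimal sketch; `map_from_arrays` is hypothetical, and rejecting arrays of different lengths is an assumption, since this entry does not say what happens in that case. With duplicate keys this Python version keeps the last value, which may differ from Pipeline Builder.

```python
from typing import Any, Dict, List

def map_from_arrays(keys: List[Any], values: List[Any]) -> Dict[Any, Any]:
    """Pair keys[i] with values[i]; null (None) keys are rejected."""
    if len(keys) != len(values):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("keys and values must have the same length")
    if any(key is None for key in keys):
        raise ValueError("null values are not allowed as map keys")
    return dict(zip(keys, values))

print(map_from_arrays([1, 2, 3], [4, 5, 6]))  # {1: 4, 2: 5, 3: 6}
```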

See details.


Create null value

Supported in: Batch, Faster, Streaming

Returns a null value of the given type.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Type: String

Output: null

See details.


Create range fan geometry

Supported in: Batch, Streaming

Approximates a range fan as a polygon: the region of all points whose haversine distance to the origin point is between the minimum and maximum radii, and whose bearing from the origin falls within the angular range centered on the specified bearing parameter. The left and right sides of the range fan are drawn as geodesic lines computed along the surface of the WGS84 ellipsoid approximating the surface of the earth. Returns null if the range spans more than 180 degrees while also crossing the anti-meridian, or if the maximum radius spans more than half of the circumference of the earth.

Expression categories: Geospatial

Output type: Geometry

See details.


Create struct column

Supported in: Batch, Faster, Streaming

Combines multiple columns into a single structured column.

Expression categories: Struct

Output type: Struct

Example

Argument values:

  • Struct elements: [tail_number, id]
tail_number id Output
MT-112 1 {
id: 1,
tail_number: MT-112,
}
XB-123 2 {
id: 2,
tail_number: XB-123,
}
PA-654 3 {
id: 3,
tail_number: PA-654,
}

See details.


Create time series reference values

Supported in: Batch, Streaming

Creates time series reference values.

Expression categories: String

Output type: String

Example

Argument values:

  • Series identifier: seriesId
  • Time series sync RID: ri.time-series-catalog.main.sync.11111111
seriesId Output
seriesOne {"seriesId":"seriesOne","syncRid":"ri.time-series-catalog.main.sync.11111111"}

See details.


Current date

Supported in: Batch, Faster, Streaming

Returns the current date at the time computation started.

Expression categories: Datetime

Output type: Date

See details.


Current timestamp

Supported in: Batch, Faster, Streaming

Returns the current timestamp at the time computation started.

Expression categories: Datetime

Output type: Timestamp

See details.


Date sequence

Supported in: Batch, Faster

Creates an array of dates in the range from start to end.

Expression categories: Datetime

Output type: Array<Date>

Example

Argument values:

  • End date: last_planned_flight
  • Start date: first_planned_flight
  • Step unit: DAYS
  • Step size: null
first_planned_flight last_planned_flight Output
2023-01-01 2023-01-03 [ 2023-01-01, 2023-01-02, 2023-01-03 ]
2023-01-31 2023-02-02 [ 2023-01-31, 2023-02-01, 2023-02-02 ]
2023-02-28 2023-03-01 [ 2023-02-28, 2023-03-01 ]
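The table shows an inclusive range stepped one day at a time, including across month boundaries. A minimal Python sketch that supports only the `DAYS` step unit and treats a null step size as 1 (an assumption consistent with the example); `date_sequence` is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional

def date_sequence(start: date, end: date, step_size: Optional[int] = None) -> List[date]:
    """Dates from start up to and including end, stepping by `step_size` days (None -> 1)."""
    step = 1 if step_size is None else step_size
    if step <= 0:
        raise ValueError("step size must be positive in this sketch")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=step)
    return result

print(date_sequence(date(2023, 1, 31), date(2023, 2, 2)))
```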

See details.


Decode

Supported in: Batch, Faster, Streaming

Decodes the given expression using the specified charset.

Expression categories: Binary, Cast

Output type: String

Example

Argument values:

  • Charset: UTF_16
  • Expression: city
city Output
/v8ATABvAG4AZABvAG4= London
/v8AQwBvAHAAZQBuAGgAYQBnAGUAbg== Copenhagen
/v8ATgBlAHcAIABZAG8AcgBr New York
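The encoded values in this table appear to be Base64 renderings of UTF-16 bytes that begin with the big-endian byte-order mark `FE FF`. The Python sketch below decodes them under that assumption; `decode_base64` is a hypothetical helper, and mapping the charset name `UTF_16` to Python's `utf-16` codec is an illustrative choice.

```python
import base64

def decode_base64(encoded: str, charset: str) -> str:
    """Decode Base64 text to bytes, then decode those bytes with `charset`."""
    codec = charset.replace("_", "-").lower()  # e.g. "UTF_16" -> "utf-16"
    # Python's utf-16 codec reads the byte-order mark and strips it.
    return base64.b64decode(encoded).decode(codec)

print(decode_base64("/v8ATABvAG4AZABvAG4=", "UTF_16"))  # London
```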

See details.


Decode Geobuf as GeoJSON

Supported in: Batch, Streaming

Decodes Geobuf geometry as GeoJSON.

Expression categories: Geospatial

Output type: Geometry

See details.


Divide numbers

Supported in: Batch, Faster, Streaming

Divides one number by another.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a col_b Output
4 2 2.0
11 2 5.5

See details.


Edit distance

Supported in: Batch, Faster, Streaming

Computes the edit distance between two strings. Supports Levenshtein, indel, and Damerau-Levenshtein distances.

Expression categories: Distance measurement, String

Output type: Double | Integer

Example

Description: String edit distance calculated using Levenshtein distance

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false
left right Output
hello hello 0
hallo hello 1
hlelo hello 2
hello hEllO 2
hello hello, world! 8
hello farewell 6
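Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The dynamic-programming sketch below reproduces every row of the table. It covers only the Levenshtein option with an optional case-insensitive mode (not indel, Damerau-Levenshtein, or normalization), and `levenshtein` is a hypothetical name.

```python
def levenshtein(left: str, right: str, ignore_case: bool = False) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    if ignore_case:
        left, right = left.lower(), right.lower()
    # previous[j] = distance between the processed prefix of `left` and right[:j]
    previous = list(range(len(right) + 1))
    for i, lc in enumerate(left, start=1):
        current = [i]
        for j, rc in enumerate(right, start=1):
            current.append(min(
                previous[j] + 1,               # delete lc
                current[j - 1] + 1,            # insert rc
                previous[j - 1] + (lc != rc),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("hello", "farewell"))  # 6
```

Note that `hlelo` scores 2 here because Levenshtein counts a transposition as two substitutions; the Damerau-Levenshtein option would count it as 1.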

See details.


Encode GeoJSON as Geobuf

Supported in: Batch, Faster, Streaming

Encodes GeoJSON geometry as Geobuf.

Expression categories: Geospatial

Output type: Geobuf

See details.


Ends with

Supported in: Batch, Faster, Streaming

Returns true if the expression ends with the given value, optionally ignoring case.

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello World
  • Ignore case: true
  • Value: world

Output: true

See details.


Epoch milliseconds to date

Supported in: Batch, Faster, Streaming

Converts epoch milliseconds to a date in UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps in milliseconds to the date type

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17

See details.


Epoch milliseconds to timestamp

Supported in: Batch, Faster, Streaming

Converts epoch milliseconds to a timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps in milliseconds to the timestamp type

Argument values:

  • Expression: 1673964111000

Output: 2023-01-17T14:01:51Z
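The epoch conversions in this group add the given offset to 1970-01-01T00:00:00Z. The Python sketch below uses exact `timedelta` arithmetic rather than floating-point seconds; the helper names are hypothetical. Taking `.date()` of the result gives the date-typed variants.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis_to_timestamp(millis: int) -> datetime:
    """UTC timestamp for a count of milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=millis)

def epoch_seconds_to_timestamp(seconds: int) -> datetime:
    """UTC timestamp for a count of seconds since the Unix epoch."""
    return EPOCH + timedelta(seconds=seconds)

ts = epoch_millis_to_timestamp(1673964111000)
print(ts.isoformat())  # 2023-01-17T14:01:51+00:00
print(ts.date())       # 2023-01-17
```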

See details.


Epoch seconds to date

Supported in: Batch, Faster, Streaming

Converts epoch seconds to a date in UTC.

Expression categories: Cast, Datetime

Output type: Date

Example

Description: You can convert epoch timestamps to the date type

Argument values:

  • Expression: 1673964111

Output: 2023-01-17

See details.


Epoch seconds to timestamp

Supported in: Batch, Faster, Streaming

Converts epoch seconds to a timestamp in UTC.

Expression categories: Cast, Datetime

Output type: Timestamp

Example

Description: You can convert epoch timestamps to the timestamp type

Argument values:

  • Expression: 1673964111

Output: 2023-01-17T14:01:51Z

See details.


Equals

Supported in: Batch, Faster, Streaming

Returns true if left and right are equal.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a b Output
1 1 true
1 0 false

See details.


Exponential

Supported in: Batch, Faster, Streaming

Calculates the exponential, e^x, of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 2.0

Output: 7.38905609893

See details.


Extract all regex matches

Supported in: Batch, Faster, Streaming

Extracts all instances of a regex match into an array.

Expression categories: Regex, String

Output type: Array<String>

Example

Description: Extract the first two initials from each code.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: [ MT, XB ]
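Group 1 selects the first capturing group, `(\w\w)`, from each match of the pattern. The same extraction in Python with `re.finditer` is shown below; Python's regex dialect is a stand-in here and may differ from the engine Pipeline Builder uses for more advanced patterns. `extract_all` is hypothetical.

```python
import re
from typing import List

def extract_all(text: str, pattern: str, group: int) -> List[str]:
    """Return capture group `group` from every non-overlapping match of `pattern`."""
    return [match.group(group) for match in re.finditer(pattern, text)]

print(extract_all("MT-112, XB-967", r"(\w\w)(-)", 1))  # ['MT', 'XB']
```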

See details.


Extract audio metadata

Supported in: Batch

Extracts metadata fields from an audio file.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Format, Specification, Bytes]
Media Reference Output
{"mimeType":"audio","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} {
bytes: 156700,
format: audio,
specification: {
 **b...

See details.


Extract content from spreadsheets in JSON

Supported in: Batch

Extracts content from all sheets of a spreadsheet in JSON format.

Expression categories: Media

Output type: Map\

See details.


Extract date part

Supported in: Batch, Faster, Streaming

Extracts a part of a date, such as the year or the day of the week.

Expression categories: Datetime

Output type: Integer

See details.


Extract document metadata

Supported in: Batch, Faster

Extracts metadata fields from a document.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count, Document Title]
Media Reference Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} {
author: Jane Doe,
page_count: 23,
title: Document Title,
}

See details.


Extract email body

Supported in: Batch

Extracts the email body from an email media item as either plain text or HTML.

Expression categories: Media

Output type: String

See details.


Extract imagery metadata

Supported in: Batch, Streaming

Extracts metadata fields from an image.

Expression categories: Media

Output type: Struct

Example

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Attributes, Bands, Bytes, Dimensions, Format, Geographic Metadata, ICC Profile, EXIF Image Location]
Media Reference Output
{"mimeType":"image/tiff","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} {
attributes: {
 outer_key1 -> {
 inner_key1 -> inner_value1,
},
...

See details.


Extract layout-aware content from PDF

Supported in: Batch, Faster

Extracts content from the specified document, while preserving the document's layout.

Expression categories: Media

Output type: Array\, confidence:Double>>> | Array\

See details.


Extract layout-aware content from images

Supported in: Batch, Faster

Extracts content from images, while preserving the original layout.

Expression categories: Media

Output type: Array\, confidence:Double>> | String

See details.


Extract map keys

Supported in: Batch, Faster, Streaming

Returns the map keys as an array. Note that the order of the array elements is not deterministic.

Expression categories: Map

Type variable bounds: K accepts AnyType

Output type: Array<K>

Example

Argument values:

  • Map: flight_number
flight_number Output
{
 MT-111 -> 2,
 XB-134 -> 1,
}
[ XB-134, MT-111 ]

See details.


Extract map values

Supported in: Batch, Faster, Streaming

Returns the map values as an array. Note that the order of the array elements is not deterministic.

Expression categories: Map

Type variable bounds: V accepts AnyType

Output type: Array<V>

Example

Argument values:

  • Map: flight_number
flight_number Output
{
 MT-111 -> 2,
 XB-134 -> 1,
}
[ 1, 2 ]

See details.


Extract offset from legacy OffsetDateTime

Supported in: Batch

Extracts the offset from a legacy OffsetDateTime column: the offset, in seconds, of the timestamp's original time zone from UTC.

Expression categories: Datetime

Output type: Integer

Example

Argument values:

  • Expression: col_a
col_a Output
{
offset: 0,
timestamp: 2024-09-09T09:00:00.001Z,
}
0
{
offset: 19800,
timestamp: 2024-09-09T09:00:00.001Z,
}
19800
{
offset: -3600,
timestamp: 2024-09-09T09:00:00.001Z,
}
-3600

See details.


Extract table of contents from PDF

Supported in: Batch, Faster

Produces a table of contents from a PDF based on the headings used within the document.

Expression categories: Media

Output type: Array\>

Example

Argument values:

  • Media reference: Media Reference
Media Reference Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} [ {
level: 0,
page: 2,
title: Chapter 1,
}, {
 **l...

See details.


Extract text from PDF

Supported in: Batch, Faster

Extracts raw text from the pages in a PDF.

Expression categories: Media

Output type: Array<String>

Example

Argument values:

  • Media reference: Media Reference
  • End page: End Page
  • Error handling: null
  • Start page: Start Page
Media Reference Start Page End Page Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} 1 2 [ first page, second page ]

See details.


Extract text from PDF (using OCR)

Supported in: Batch, Faster

Extracts text from the pages in a PDF file using optical character recognition (OCR).

Expression categories: Media

Output type: Array\

See details.


Extract text from images (using OCR)

Supported in: Batch, Faster

Extracts text from an image using optical character recognition (OCR).

Expression categories: Media

Output type: String

See details.


Extract timestamp part

Supported in: Batch, Faster, Streaming

Extracts a part of a timestamp, such as the year or the day of the week.

Expression categories: Datetime

Output type: Integer

See details.


Filter array elements

Supported in: Batch, Streaming

Filters an array based on the filter expression. Note that array indices start at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Array: array
  • Expression to filter:
    isNotNull(
     expression: element,
    )
array Output
[ 2, 5, null, 11 ] [ 2, 5, 11 ]

See details.


Filter by geometry type

Supported in: Batch, Faster, Streaming

Nulls any values in the geometry column that are not of the provided geometry types.

Expression categories: Geospatial

Output type: Geometry

See details.


First non null value (coalesce)

Supported in: Batch, Faster, Streaming

Returns the first non-null value among the inputs. Known as coalesce in SQL.

Expression categories: Data preparation

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expressions: [tail_number, airline]
  • Treat empty strings as null: null
tail_number airline Output
XB-123 null XB-123
null MT MT
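Coalesce scans its inputs from left to right and returns the first one that is not null. A Python sketch with `None` standing in for null; `coalesce` is hypothetical, and the keyword flag mirrors the "Treat empty strings as null" argument under the assumption that it simply skips `""`.

```python
from typing import Any, Optional

def coalesce(*values: Any, treat_empty_strings_as_null: bool = False) -> Optional[Any]:
    """Return the first argument that is not None (and, optionally, not "")."""
    for value in values:
        if value is None:
            continue
        if treat_empty_strings_as_null and value == "":
            continue
        return value
    return None  # every input was null

print(coalesce("XB-123", None))  # XB-123
print(coalesce(None, "MT"))      # MT
```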

See details.


Floor

Supported in: Batch, Faster, Streaming

Returns the floor of the given fractional value.

Expression categories: Numeric

Output type: Decimal | Long

Example

Argument values:

  • Expression: 10.123

Output: 10

See details.


Format date as string

Supported in: Batch, Faster, Streaming

Returns the date as a string formatted according to a Java DateTimeFormatter pattern. The default format is ISO 8601.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Date: 2022-12-20
  • Format: yy-MM-dd

Output: 22-12-20

See details.


Format number

Supported in: Batch, Faster, Streaming

Formats a number to a specific number of decimal places.

Expression categories: Cast, Numeric, String

Output type: String

Example

Description: Formats a number to 2 decimal places.

Argument values:

  • Decimal places: 2
  • Number: 1234.5678

Output: 1,234.57
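The output rounds to the requested number of decimal places and uses a comma as the thousands separator. Python's format mini-language gives the same result for this example; `format_number` is hypothetical, and locale-specific separators or rounding-mode differences are not modeled.

```python
def format_number(number: float, decimal_places: int) -> str:
    """Round to `decimal_places` and group thousands with commas."""
    return f"{number:,.{decimal_places}f}"

print(format_number(1234.5678, 2))  # 1,234.57
```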

See details.


Format string

Supported in: Batch, Streaming

Formats a string printf-style.

Expression categories: String

Output type: String

Example

Argument values:

  • Format arguments: [argument1, argument2]
  • Format string: Hello %s, my name is %s
argument1 argument2 Output
Alice Bob Hello Alice, my name is Bob
Jane John Hello Jane, my name is John

See details.


Format timestamp as string

Supported in: Batch, Faster, Streaming

Returns the timestamp as a formatted string (ISO8601 by default).

Expression categories: Cast, Datetime, String

Output type: String

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z
  • Format: yyyy-MM-dd
  • Time zone: null

Output: 2022-10-01

See details.


Geometries have intersection

Supported in: Batch, Faster, Streaming

Determines if two geometries intersect.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... {"coordinates":[[[-103.78627755867336,33.162750522563925],[-103.78627755867336,28.29724741894266],[-... true
{"coordinates":[[[0.3651446504365481,15.159518507965103],[0.3651446504365481,13.427462911044273],[3.... {"coordinates":[[[5.656394524666183,13.405417496831944],[5.656394524666183,11.29869961209053],[8.551... false

See details.


Geometry 3d affine transformation

Supported in: Batch, Faster, Streaming

Applies a three dimensional affine transformation to the input geometry. This transformation occurs in the user-provided projected coordinate system, and the result is projected back to WGS84. Two dimensional geometries will have their z-coordinates set to 0 before the affine transformation is applied. The returned geometry is three dimensional and for each coordinate [x,y,z] represents the matrix multiplication [[x0, x1, x2, x-offset], [y0, y1, y2, y-offset], [z0, z1, z2, z-offset], [0, 0, 0, 1]] * [x, y, z, 1], where the first three ordinates of the result are returned.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 0.0
  • X0: 0.0
  • X1: -1.0
  • X2: 0.0
  • Y offset: 0.0
  • Y0: 1.0
  • Y1: 0.0
  • Y2: 0.0
  • Z offset: 0.0
  • Z0: 0.0
  • Z1: 0.0
  • Z2: 0.0
geometry Output
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]]]} {"type":"Polygon","coordinates":[[[0.0, 0.0, 0.0],[0.0, 1.0, 0.0],[-1.0, 1.0, 0.0],[-1.0, 0.0, 0.0],[0.0, 0.0, 0.0]]]}
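The matrix product in the description can be checked against this example. The Python sketch below applies the 3x4 affine part of the matrix to each coordinate, padding 2D points with z = 0. It skips the reprojection into and out of the projected coordinate system, so it assumes coordinates are already in that system, and `affine_3d` is a hypothetical helper.

```python
from typing import List, Sequence

def affine_3d(coords: Sequence[Sequence[float]], matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply rows [a0, a1, a2, offset] to each [x, y(, z)]; 2D points get z = 0."""
    result = []
    for point in coords:
        x, y = point[0], point[1]
        z = point[2] if len(point) > 2 else 0.0
        result.append([a0 * x + a1 * y + a2 * z + off for a0, a1, a2, off in matrix])
    return result

# Parameters from the example: X1 = -1, Y0 = 1, everything else 0.
matrix = [
    [0.0, -1.0, 0.0, 0.0],  # x' = X0*x + X1*y + X2*z + X offset
    [1.0, 0.0, 0.0, 0.0],   # y' = Y0*x + Y1*y + Y2*z + Y offset
    [0.0, 0.0, 0.0, 0.0],   # z' = Z0*x + Z1*y + Z2*z + Z offset
]
ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
print(affine_3d(ring, matrix))
```

These parameters rotate the square 90 degrees counterclockwise, matching the output polygon in the table.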

See details.


Geometry array (unary) union

Supported in: Batch, Faster, Streaming

Given an array of geometries, combine these into a single geometry, merging without overlap.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometries Output
[ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} ] {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
[ ] null
null null

See details.


Geometry array line dissolve

Supported in: Batch, Faster, Streaming

Given an array of geometries, combine these into a linear geometry. Dissolve simplifies an input set of line-strings by removing unnecessary nodes and concatenating line-strings that can be combined. Z-coordinates will be ignored for the purpose of the dissolve operation, but the vertices in the resultant geometry will have the same z-coordinate as the corresponding points in the input.

Expression categories: Geospatial

Type variable bounds: T accepts Geometry

Output type: T

Example

Argument values:

  • Expression: geometries
geometries Output
[ {"type":"LineString","coordinates":[[0,0],[0,1],[1,1]]}, {"type":"LineString","coordinates":[[1,1]... {"type":"MultiLineString","coordinates":[[[5.0, 5.0],[4.0, 4.0],[3.0, 3.0],[2.0, 2.0],[1.0, 1.0],[0.0, 1.0],[0.0, 0.0]],[[7.0, 7.0], [6.0, 7.0], [6.0, 6.0]]]}
[ {"type":"LineString","coordinates":[[0,0,1],[0,1,1],[1,1,1]]}, {"type":"LineString","coordinates":[[1,1,1],[2,2,2]]}, {"type":"LineString","coordinates":[[1,1,2],[2,2,2],[3,3,3]]} ] {"type":"LineString","coordinates":[[0.0, 0.0, 1.0],[0.0, 1.0, 1.0],[1.0, 1.0, 1.0],[2.0, 2.0, 2.0],[3.0, 3.0, 3.0]]}

See details.


Geometry buffer

Supported in: Batch, Streaming

Computes the buffer of a geometry for both positive and negative buffer distances. Returns an approximate representation of all points within a given distance of this geometric object (or, for negative buffers, all points minus those within the buffer distance of the boundary). Buffer drops any z-coordinates, and zero or negative distance buffers of lines and points return null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Buffer distance: distance
  • Geometry column: geometry
  • Projected coordinate system: EPSG:32618
  • Buffer cap style: ROUND
  • Buffer join style: ROUND
  • Line segments per quadrant: 8
  • Single or double sided: DOUBLE_SIDED
geometry distance Output
{"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]} 10.0 {"type":"Polygon","coordinates":[[[-77.07356558299462, 38.83041048767274],[-77.07356728534256, 38.83...
{"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83042888342659, 1]]} 10.0 {"type":"Polygon","coordinates":[[[-77.07253198637027, 38.83051894052714],[-77.07250947453703, 38.83...
{"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318, 1],[-77.0725293738795,38.83... 10.0 {"type":"Polygon","coordinates":[[[-77.07379585155829, 38.83040639848026],[-77.07382199292853, 38.83...

See details.


Geometry centroid

Supported in: Batch, Streaming

Return the centroid, or "center of mass", of the geometry using a spherical approximation of the globe. If the geometry is a collection of mixed dimensions, only the elements of the highest dimension will contribute to the centroid (e.g. in a collection of points, lines and polygons, points and lines are ignored). This operation will round to 32-bit floating point precision for coordinates in the geometry.

Expression categories: Geospatial

Output type: GeoPoint

See details.


Geometry contains

Supported in: Batch, Faster, Streaming

Determines if geometry a contains geometry b. Points or lines lying on the boundary of a polygon are not considered to be contained within it.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... {"type":"Point","coordinates":[-100.0,32.0]} true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... {"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} false
{"type":"LineString","coordinates":[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323]]} {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} false
{"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} {"type":"Point","coordinates":[-112.94377956164206,34.81725414459382]} true
{"coordinates":[[[-112.94377956164206,34.81725414459382],[-112.94377956164206,30.006795384733323], [... {"coordinates":[[[-111.94377956164206,33.81725414459382],[-111.94377956164206,31.006795384733323], [... true

See details.


Geometry difference

Supported in: Batch, Faster, Streaming

Calculates the portion of geometry a that is not intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"LineString","coordinates":[[0.0,0.0],[0.0,1.0]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See details.


Geometry explode to array

Supported in: Batch, Faster, Streaming

Converts a geometry to an array of its constituent simple geometries.

Expression categories: Geospatial

Output type: Array<Geometry>

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} ]
{"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]} [ {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}, {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} ]

See details.


Geometry intersection

Supported in: Batch, Faster, Streaming

Calculates the portion of geometry a that is intersecting geometry b.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} {"type":"Polygon","coordinates":[[]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} {"type":"LineString","coordinates":[[1.0,1.0],[1.0,0.0]]}
{"type":"Point","coordinates":[0.0,0.0]} {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} {"type":"Point","coordinates":[0.0,0.0]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} {"type":"Polygon","coordinates":[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]} {"type":"LineString","coordinates":[]}

See details.


Geometry length

Supported in: Batch, Streaming

Gets the length, in meters, of the line strings and multi-line strings in the geometry, using a spherical approximation of the globe. Non-linear geometries (polygons and points) count as 0.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"LineString","coordinates":[[-73.778128,40.641195],[-118.408535,33.941563]]} 3974344.7433354934
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0],[1.0,1.0],[1.0,2.0]]} 333585.2407005987
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0],[1.0,1.0]], [[1.0,2.0],[2.0,2.0]]]} 333517.50194413937
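On a sphere, the length of a line string is the sum of the great-circle distances between consecutive vertices. The Python sketch below uses the haversine formula with a mean Earth radius of 6,371,008.8 m. Both the formula and the radius are assumptions, not documented details of this expression, although with them the sketch closely matches the second and third rows of the table above. GeoJSON orders coordinates as [longitude, latitude]. `geometry_length` is hypothetical.

```python
import math
from typing import Any, Dict, Sequence

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters

def haversine_m(p: Sequence[float], q: Sequence[float]) -> float:
    """Great-circle distance in meters between two [lon, lat] positions."""
    lon1, lat1 = map(math.radians, p[:2])
    lon2, lat2 = map(math.radians, q[:2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def geometry_length(geometry: Dict[str, Any]) -> float:
    """Sum segment lengths of a LineString or MultiLineString; other types count as 0."""
    if geometry["type"] == "LineString":
        lines = [geometry["coordinates"]]
    elif geometry["type"] == "MultiLineString":
        lines = geometry["coordinates"]
    else:
        return 0.0
    return sum(haversine_m(a, b) for line in lines for a, b in zip(line, line[1:]))

line = {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]}
print(round(geometry_length(line), 2))
```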

See details.


Geometry rotate 2d

Supported in: Streaming

Applies a two dimensional clockwise rotation centered at the provided GeoPoint to the supplied geometry. This rotation occurs in the provided coordinate reference system and is then projected back to WGS84.

Expression categories: Geospatial

Output type: Geometry

See details.


Geometry set z-coordinate

Supported in: Batch, Faster, Streaming

Sets the z-coordinate of a geometry. If the geometry has an existing z-coordinate it will be overwritten.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: geometry
  • Z coordinate: zCoordinate
geometry zCoordinate Output
{"type":"Point","coordinates":[1.0, 2.0]} 1.0 {"type":"Point","coordinates":[1.0, 2.0, 1.0]}
{"type":"Point","coordinates":[1.0, 2.0, 3.0]} 1.0 {"type":"Point","coordinates":[1.0, 2.0, 1.0]}

See details.


Geometry shortest distance

Supported in: Batch, Streaming

Given two valid geometries, calculates the shortest (great circle) distance in meters between them. Uses a spherical approximation of the globe. Overlapping geometries have a distance of zero.

Expression categories: Geospatial

Output type: Double

See details.


Geometry standardize

Supported in: Batch, Streaming

Given a valid geometry, standardizes it by enforcing the right-hand rule on the input, which is the convention for GeoJSON. This enables equality comparisons between equivalent geometries. This expression may reverse linestrings.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"Polygon","coordinates":[[[32.26868,-26.53253],[32.26465,-26.45873],[32.25262,-26.38563],[32.26868,-26.53253]]]} {"type":"Polygon","coordinates":[[[32.25262, -26.38563],[32.26868, -26.53253],[32.26465, -26.45873],[32.25262, -26.38563]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.25,0.5]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]], [[0.25,0.25],[0.25,0.5],[0.5,0.25],[0.25,0.25]]]}
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"}
{"coordinates": [[[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]]], "type":"MultiPolygon"} {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"}
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} {"coordinates": [[5.0, 5.0],[-1.0, -1.0]], "type":"LineString"}
{"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"} {"coordinates": [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0], [10.0, 10.0, 10.0], [0.0, 10.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]], "type": "Polygon"}

See details.


Geometry symmetric difference

Supported in: Batch, Faster, Streaming

Calculates the portion that is in either geometry, but not in their intersection.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[2.0,1.0],[2.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[3.0,1.0],[3.0,0.0],[1.0,0.0]]]} {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[2.0,0.0],[2.0,1.0],[3.0,1.0],[3.0,0.0],[2.0,0.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.5,0.0],[0.5,1.0],[0.0,1.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.5,1.0],[1.0,1.0],[1.0,0.0],[0.5,0.0],[0.5,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}
{"type":"Point","coordinates":[0.0,0.0]} {"type":"Point","coordinates":[1.0,1.0]} {"type":"MultiPoint","coordinates":[[0.0,0.0],[1.0,1.0]]}
{"type":"LineString","coordinates":[[0.0,0.0],[2.0,0.0]]} {"type":"LineString","coordinates":[[1.0,0.0],[3.0,0.0]]} {"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0]],[[2.0,0.0],[3.0,0.0]]]}

See details.


Geometry translate expression

Supported in: Batch, Faster, Streaming

Applies a translation to a geometry. Two-dimensional geometries are converted to three-dimensional geometries only if a Z offset is supplied.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry column: geometry
  • Projected coordinate system: EPSG:4326
  • X offset: 1.0
  • Y offset: -1.0
  • Z offset: null
geometry Output
{"type":"Point","coordinates":[0.0, 0.0]} {"type":"Point","coordinates":[1.0, -1.0]}
{"type":"LineString","coordinates":[[0.0, 0.0], [1.0, 1.0]]} {"type":"LineString","coordinates":[[1.0, -1.0], [2.0, 0.0]]}
{"type":"Polygon","coordinates":[[[0.0, 0.0],[1.0, 0.0],[1.0, 1.0],[0.0, 1.0], [0.0, 0.0]]]} {"type":"Polygon","coordinates":[[[1.0, -1.0],[2.0, -1.0],[2.0, 0.0],[1.0, 0.0],[1.0, -1.0]]]}

See details.


Geometry union

Supported in: Batch, Faster, Streaming

Combines the two geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry a: geometry_a
  • Geometry b: geometry_b
geometry_a geometry_b Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]} {"type":"MultiPolygon","coordinates":[[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],[[[5.0,5.0],[5.0,6.0],[6.0,6.0],[6.0,5.0],[5.0,5.0]]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]],[[0.25,0.25],[0.5,0.25],[0.5,0.5],[0.25,0.5],[0.25,0.25]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} {"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]} {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]},{"type":"Polygon","coordinates":[[[1.0,0.0],[1.0,1.0],[2.0,1.0],[2.0,0.0],[1.0,0.0]]]}]}

See details.


Get H3 index

Supported in: Batch, Faster, Streaming

Convert a GeoPoint to an H3 index at the given resolution. Returns null if the resolution is less than 0 or greater than 15.

Expression categories: Geospatial

Output type: H3 Index

See details.


Get H3 indices covering a geometry

Supported in: Batch, Faster, Streaming

Convert a geometry to H3 indices at a given resolution. The resolution must be between 0 and 15, inclusive. For a polygon, three conversions are supported: a) H3 indices that fully cover the polygon, b) H3 indices that are fully contained by the polygon, and c) H3 indices whose centroids are contained in the polygon. Returns null when the expected number of H3 indices exceeds 7 million.

Expression categories: Geospatial

Output type: Array\

See details.


Get MIME type

Supported in:

Returns the IANA MIME type of a media reference.

Expression categories: Media

Output type: String

See details.


Get PDF page dimensions

Supported in: Batch, Faster

Get the dimensions in points of each page of the PDF.

Expression categories: Media

Output type: Array\>

See details.


Get XZ curve index of an envelope

Supported in: Batch, Streaming

Encodes the envelope in an XZ curve.

Expression categories: Geospatial

Output type: Long

Example

Argument values:

  • Curve preset: LON_LAT_10KM
  • Envelope: envelope
envelope Output
{
 maxLat -> 2.0,
 maxLon -> 3.0,
 minLat -> 0.0,
 minLon -> 1.0,
}
16777222
{
 maxLat -> 2.0,
 maxLon -> 3.0,
 minLat -> null,
 minLon -> 1.0,
}
null

See details.


Get bearing from start point to end point

Supported in: Batch, Faster, Streaming

Calculates the absolute true bearing (the clockwise angle relative to geographic north), in degrees, from the first point to the second point, using a spherical approximation of the Earth.

Expression categories: Geospatial

Output type: Double

Example

Argument values:

  • Ending point: end_point
  • Starting point: start_point
start_point end_point Output
{
latitude: 40.69325025929194,
longitude: -74.00522662934995,
}
{
latitude: 51.4988509390695,
longitude: -0.1238396067697046,
}
51.20964213763489

See details.
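The standard spherical initial-bearing formula can be sketched in Python as follows. This sketch assumes the expression uses that formula, which agrees with the example above. The function name and the (latitude, longitude) tuple inputs are hypothetical.

```python
import math


def initial_bearing_deg(start: tuple, end: tuple) -> float:
    """Initial great-circle bearing from start to end, clockwise from north, in [0, 360).

    `start` and `end` are hypothetical (latitude, longitude) tuples in degrees.
    """
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)
    d_lon = lon2 - lon1
    y = math.sin(d_lon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
    # atan2 returns (-180, 180]; the modulo maps it onto a compass bearing.
    return math.degrees(math.atan2(y, x)) % 360.0


new_york = (40.69325025929194, -74.00522662934995)
london = (51.4988509390695, -0.1238396067697046)
print(initial_bearing_deg(new_york, london))
```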


Get geometry envelope

Supported in: Batch, Streaming

Given a valid geometry or array of geometries, returns a geometry representing the envelope of the input. The envelope is the smallest axis-aligned rectangle containing the minimum and maximum x and y values of the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}

See details.


Get lat/long bounding box struct

Supported in: Batch, Faster, Streaming

Given a valid geometry or array of geometries, returns a struct containing the bounds of the geometry or geometries.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]} {
 maxLat -> 1.0,
 maxLon -> 1.0,
 minLat -> 0.0,
 minLon -> 0.0,
}

See details.
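A lat/long bounding box comes down to taking the minimum and maximum of every coordinate in the geometry. The Python sketch below makes several assumptions. It handles a single GeoJSON geometry, not a GeometryCollection or an array of geometries. Positions are [longitude, latitude]. Geometries that cross the antimeridian are not handled. The function names are hypothetical.

```python
import json
from typing import Iterator


def _positions(coords) -> Iterator[list]:
    """Yield every [lon, lat, ...] position from a nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords  # a single position
    else:
        for child in coords:
            yield from _positions(child)


def lat_lon_bounding_box(geojson: str) -> dict:
    """Hypothetical helper: min/max latitude and longitude over all positions."""
    geometry = json.loads(geojson)
    positions = list(_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {"maxLat": max(lats), "maxLon": max(lons), "minLat": min(lats), "minLon": min(lons)}


print(lat_lon_bounding_box('{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0]]]}'))
```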


Get neighbors of an H3 index

Supported in: Batch, Faster, Streaming

Get all neighbors of an H3 index.

Expression categories: Geospatial

Output type: Array\

See details.


Get struct field

Supported in: Batch, Faster, Streaming

Extracts a field from a struct.

Expression categories: Struct

Output type: AnyType

Example

Argument values:

  • Locator: airline.id
  • Struct: struct
struct Output
{
airline: {
id: NA,
},
}
NA
{
airline: {
id: FE,
},
}
FE

See details.


Get the convex hull of a geometry

Supported in: Batch, Faster, Streaming

Given a valid GeoJSON input string, returns a GeoJSON string that is the convex hull of the geometry. The convex hull is the smallest convex polygon containing the geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry
geometry Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[2.0,0.0],[2.0,1.0],[1.0,1.0],[1.0,2.0],[0.0,2.0],[0.0,0.0]]]} {"type":"Polygon", "coordinates":[[[0.0, 0.0], [0.0, 2.0], [1.0, 2.0], [2.0, 1.0], [2.0, 0.0], [0.0, 0.0]]]}
null null

See details.


Get timestamps for scene frames

Supported in: Batch

Get the timestamps and scene scores for detected scene frame transitions in the video.

Expression categories: Media

Output type: Array\>

See details.


Greater than

Supported in: Batch, Faster, Streaming

Returns true if left is greater than right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a b Output
1 0 true
1 1 false
0 1 false

See details.


Greater than or equals

Supported in: Batch, Faster, Streaming

Returns true if left is greater than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: a
  • Right: b
a b Output
1 0 true
1 1 true
0 1 false

See details.


Greatest

Supported in: Batch, Faster, Streaming

Computes the greatest value amongst all input columns, skipping null values.

Expression categories: Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a b c Output
1 2 3 3
1 3 2 3
3 2 1 3

See details.
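The null-skipping behavior can be illustrated with a short Python sketch. Returning null when every input is null is an assumption, because the description above does not cover that case. Swapping `max` for `min` gives the "Least" behavior. The function name is hypothetical.

```python
def greatest(*values):
    """Largest non-null value; None if every value is null (assumed behavior)."""
    present = [v for v in values if v is not None]  # skip nulls
    return max(present) if present else None


print(greatest(1, 3, 2))
print(greatest(None, 2, None))
```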


Gzip decompress

Supported in: Batch, Faster, Streaming

Decompresses gzip-compressed binary into a string.

Expression categories: File

Output type: String

Example

Argument values:

  • Expression: gzip
gzip Output
H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA Hello, world!

See details.
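A rough Python equivalent uses the standard `gzip` module. This sketch assumes two things. First, the binary value in the example is displayed base64-encoded. Second, the decompressed bytes are UTF-8 text. The helper name is hypothetical.

```python
import base64
import gzip


def gzip_decompress_to_str(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decompress gzip bytes and decode them as text."""
    return gzip.decompress(data).decode(encoding)


# The example value, assumed to be base64-encoded gzip bytes.
example = base64.b64decode("H4sIAAAAAAAA//NIzcnJ11Eozy/KSVEEAObG5usNAAAA")
print(gzip_decompress_to_str(example))  # Hello, world!
```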


H3 cell to children

Supported in: Batch, Faster, Streaming

Get the children of an H3 index at a given child resolution, which sets how fine the children are. Returns null if the resolution is less than 0 or greater than 15, or if the child resolution is lower than the resolution of the given H3 index.

Expression categories: Geospatial

Output type: Array\

See details.


H3 cell to parent

Supported in: Batch, Faster, Streaming

Get the parent of an H3 index at a given parent resolution, which sets how coarse the parent is. Returns null if the resolution is less than 0 or greater than 15, or if it is higher than the resolution of the given index.

Expression categories: Geospatial

Output type: H3 Index

See details.


H3 to geometry

Supported in: Batch, Faster, Streaming

Convert H3 index to polygon.

Expression categories: Geospatial

Output type: Geometry

See details.


Has media schema

Supported in: Batch

Checks if a media reference has a specific schema type and format. This expression can be used as a filter condition to filter media sets by media type and allow downstream schema-specific transformations.

Expression categories: Media

Output type: Boolean

See details.


Hash sha256

Supported in: Batch, Faster, Streaming

Hashes the input using the SHA-256 hashing algorithm.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World!

Output: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

See details.
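For comparison, Python's `hashlib` can compute the same digest. This assumes the string is UTF-8 encoded before hashing and the result is rendered as lowercase hexadecimal, as in the example above. The helper name is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hypothetical helper: SHA-256 of the UTF-8 bytes of `text`, as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(sha256_hex("Hello World!"))
# 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
```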


IPv6 to canonical format

Supported in: Batch

Converts an IPv6 address into its canonical text representation, as described in RFC 5952.

Expression categories: Cyber

Output type: IPv6

Example

Argument values:

  • Expression: ip
ip Output
001:0db8:85a3:0000:0000:8a2e:0370:7334 1:db8:85a3::8a2e:370:7334
::1 ::1
2001:db8:0:1:1:1:1:1 2001:db8:0:1:1:1:1:1
2001:1:0:0:10:0:10:10 2001:1::10:0:10:10
2001:0:0:1:0:0:0:1 2001:0:0:1::1
2001:db8:0:0:1:0:0:1 2001:db8::1:0:0:1
2001:DB8:AAAA:BBBB:CCCC:DDDD:EEEE:FFFF 2001:db8:aaaa:bbbb:cccc:dddd:eeee:ffff
0:0:0:0:0:0:0:0 ::
:: ::
0:0:0:1:2:3:4:5 ::1:2:3:4:5
1:2:3:4:5:0:0:0 1:2:3:4:5::
1:2:3:4:5:6:7:8 1:2:3:4:5:6:7:8
null null

See details.
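Python's standard `ipaddress` module produces the same RFC 5952 form for the example rows above, so it is a convenient way to check expected outputs. This is a sketch, not the expression's implementation. The function raises `ValueError` on invalid input; the expression's behavior for invalid input is not described here. IPv4-mapped addresses (`::ffff:a.b.c.d`) may also render differently depending on the Python version. The function name is hypothetical.

```python
import ipaddress
from typing import Optional


def ipv6_canonical(ip: Optional[str]) -> Optional[str]:
    """Hypothetical helper: RFC 5952 text form of an IPv6 address; None stays None."""
    if ip is None:
        return None
    # .compressed lowercases hex digits, strips leading zeros, and replaces the
    # first longest run of two or more zero groups with "::".
    return ipaddress.IPv6Address(ip).compressed


for raw in ["2001:0:0:1:0:0:0:1", "2001:DB8:AAAA:BBBB:CCCC:DDDD:EEEE:FFFF", "0:0:0:0:0:0:0:0"]:
    print(raw, "->", ipv6_canonical(raw))
```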


Image to embeddings

Supported in: Batch

Converts images into embeddings using the provided model.

Expression categories: Media

Output type: Embedded vector

Example

Description: Example embeddings for an image.

Argument values:

  • Media reference: mediaRef
  • Model:
    googleSiglip2Embedding(

    )
  • Output mode: null
mediaRef Output
{
"mimeType": "image/jpeg",
"reference": {
 "type": "mediaSetViewItem",
 "...
embeddings-result

See details.


Interpolate geo point along linestring

Supported in: Batch, Streaming

Returns a point interpolated along a line. The implementation interprets lines as shortest paths (great-circle arcs), using a spherical approximation of the globe.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Fraction: fraction
  • Linestring: linestring
linestring fraction Output
{"type":"LineString","coordinates":[[0.0,2.0],[30.0,0.0]]} 0.5 {
latitude: 1.0352686301676643,
longitude: 15.004677545504547,
}
{"type":"LineString","coordinates":[[30.0,2.0],[50.0,3.0]]} 0.8 {
latitude: 2.8256098405656185,
longitude: 45.99752305664789,
}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0]]} 0.2 {
latitude: 8.363732883448177,
longitude: 54.073497456494955,
}

See details.
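Interpolating along great-circle arcs is usually done with spherical linear interpolation (slerp) on unit vectors. The Python sketch below makes several assumptions. The fraction is measured along the total great-circle length of the linestring. Coordinates are [longitude, latitude]. Fractions outside 0 to 1 are not expected. The names are hypothetical. For the two-point line in the first example row, it returns the same midpoint as the table.

```python
import math


def _to_unit_vector(lon: float, lat: float) -> tuple:
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def _angle(a: tuple, b: tuple) -> float:
    """Central angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def _slerp(a: tuple, b: tuple, t: float) -> tuple:
    """Point a fraction t of the way from a to b along the great circle."""
    omega = _angle(a, b)
    if omega < 1e-12:
        return a  # coincident points
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return tuple(wa * x + wb * y for x, y in zip(a, b))


def interpolate_along_linestring(coords: list, fraction: float) -> dict:
    """Hypothetical helper: GeoPoint at `fraction` of the linestring's great-circle length."""
    points = [_to_unit_vector(lon, lat) for lon, lat, *_ in coords]
    segments = list(zip(points, points[1:]))
    arcs = [_angle(a, b) for a, b in segments]
    remaining = fraction * sum(arcs)
    for (a, b), arc in zip(segments, arcs):
        if remaining <= arc:
            x, y, z = _slerp(a, b, remaining / arc if arc > 0 else 0.0)
            break
        remaining -= arc
    else:
        x, y, z = points[-1]  # rounding past the end: use the last vertex
    return {"latitude": math.degrees(math.atan2(z, math.hypot(x, y))),
            "longitude": math.degrees(math.atan2(y, x))}


print(interpolate_along_linestring([[0.0, 2.0], [30.0, 0.0]], 0.5))
```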


Is NaN

Supported in: Batch, Faster, Streaming

Returns true if the input is NaN, false otherwise.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: NaN

Output: true

See details.


Is empty struct

Supported in: Batch, Streaming

Returns true if the input is an empty struct, with recursive checking of inner arrays and structs.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: struct
struct Output
{
airline: {
id: null,
name: null,
},
tail_no: null,
}
true
{
airline: {
id: NA,
name: null,
},
tail_no: null,
}
false

See details.


Is in

Supported in: Batch, Faster, Streaming

Returns true if the list contains the value.

Expression categories: Boolean

Type variable bounds: T accepts ComparableType

Output type: Boolean

Example

Description: You can check if the list contains the value.

Argument values:

  • Contains: [AWE-112, BRR-123]
  • Value: value
value Output
BRR-123 true
ABC-543 false

See details.


Is not null

Supported in: Batch, Faster, Streaming

Returns true if the input is not null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: hello
  • Treat empty strings as null: null

Output: true

See details.


Is null

Supported in: Batch, Faster, Streaming

Returns true if the input is null. Can optionally treat empty strings as null.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: null
  • Treat empty strings as null: null

Output: true

See details.


Is valid GeoJSON

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid GeoJSON string. Not all GeoJSON strings are indexable by the Ontology; use the "prepare geometry" expression to prepare a geometry before using it in the Ontology.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geoJson
geoJson Output
{"type":"Point","coordinates":[3.0, 5.0, 2.0]} true
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]} true
{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]} true
not a GeoJSON string false

See details.


Is valid Geohash

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid Geohash input string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geohash
geohash Output
sk4d true
dt9zy9cg36j7 true
not a Geohash string false
null false

See details.
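A Geohash is written in a 32-character base-32 alphabet that omits the letters a, i, l and o. A minimal Python validity check along those lines is sketched below. It assumes lowercase input and does not enforce a maximum length, so it may differ from the expression's actual rules. The function name is hypothetical.

```python
from typing import Optional

# Standard Geohash base-32 alphabet: digits plus lowercase letters except a, i, l, o.
GEOHASH_ALPHABET = frozenset("0123456789bcdefghjkmnpqrstuvwxyz")


def is_valid_geohash(value: Optional[str]) -> bool:
    """Hypothetical helper: True for a non-empty string made only of Geohash characters."""
    return bool(value) and all(ch in GEOHASH_ALPHABET for ch in value)


print(is_valid_geohash("sk4d"), is_valid_geohash("not a Geohash string"))
```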


Is valid H3 index

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid H3 index string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: h3
h3 Output
862a1072fffffff true
not an h3 value false

See details.


Is valid IPv4

Supported in: Batch

Returns true if the input is a valid IPv4 address.

Expression categories: Cyber

Output type: Boolean

Example

Argument values:

  • Expression: ip
ip Output
192.168.1.1 true
10.0.0.1 true
172.16.0.1 true
255.255.255.255 true
0.0.0.0 true
127.0.0.1 true
1.2.3.4 true
256.1.1.1 false
192.168.1.256 false
192.168.1 false
192.168.1.1.1 false
abc.def.ghi.jkl false
192.168.1.a false
-1.2.3.4 false
empty string false
false
192.168.1.0/24 false
10.0.0.0/8 false
192 false
a.b.c.d/255.0.0.0 false
::1 false
2001:db8::1 false
null false

See details.


Is valid IPv6

Supported in: Batch

Returns true if the input is a valid IPv6 address.

Expression categories: Cyber

Output type: Boolean

Example

Argument values:

  • Expression: ip
ip Output
001:0db8:85a3:0000:0000:8a2e:0370:7334 true
2001:db8:85a3:0:0:8A2E:0370:7334 true
2001:db8:85a3::8a2e:370:7334 true
::1 true
fe80:: true
:: true
0:0:0:0:0:0:0:1 true
2001:db8:: true
::ffff:192.0.2.128 true
2001:db8:0:0:1:0:0:1 true
1234:5678:9abc:def0:1234:5678:9abc:def0 true
abcd:ef01:2345:6789:abcd:ef01:2345:6789 true
2001:db8:1234:0000:0000:0000:0000:0001 true
2001:db8:1234::1 true
2001:db8:85a3::8a2e:37023:7334 false
2001:db8:85a3::8a2e::7334 false
2001:db8:85a3:0:0:8A2E:0370:7334:1234 false
2001:db8:85a3 false
2001:db8:85a3::8a2e:370g:7334 false
::ffff:192.0.2.999 false
2001:db8:85a3:0:0:8A2E:0370:7334: false
:2001:db8:85a3:0:0:8A2E:0370:7334 false
2001:db8:85a3:0:0:8A2E:0370:7334:: false
GGGG:db8:85a3:0:0:8A2E:0370:7334 false
2001-db8-85a3-0-0-8A2E-0370-7334 false
2001:db8:85a3:0:0:8A2E:0370:7334/64 false
2001:db8::/32 false
empty string false
false
192.168.1.1 false
null false

See details.
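Python's standard `ipaddress` module gives a comparable strict check for both address families. It accepts plain addresses and rejects CIDR suffixes, out-of-range octets and malformed groups, which matches the example rows in the IPv4 and IPv6 tables above. This is a sketch, not the expression's implementation. Python 3.9.5 and later reject IPv4 octets with leading zeros, and Python 3.9+ accepts IPv6 zone IDs such as `fe80::1%eth0`; the tables do not cover either case. The function names are hypothetical.

```python
import ipaddress
from typing import Optional


def _parses_as(address_type, value: Optional[str]) -> bool:
    """True if `value` parses as a single address of `address_type` (no prefix length)."""
    if value is None:
        return False
    try:
        address_type(value)
    except ValueError:  # ipaddress.AddressValueError is a subclass of ValueError
        return False
    return True


def is_valid_ipv4(value: Optional[str]) -> bool:
    return _parses_as(ipaddress.IPv4Address, value)


def is_valid_ipv6(value: Optional[str]) -> bool:
    return _parses_as(ipaddress.IPv6Address, value)


print(is_valid_ipv4("192.168.1.1"), is_valid_ipv4("192.168.1.0/24"))
print(is_valid_ipv6("2001:db8::1"), is_valid_ipv6("2001:db8::/32"))
```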


Is valid MGRS

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid MGRS (Military Grid Reference System) string.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: mgrs
mgrs Output
4Q FJ 1 6 true
4Q FJ 12345 67890 true

See details.


Is valid MIME type

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid MIME type.

Expression categories: Boolean, Other

Output type: Boolean

See details.


Is valid Ontology GeoPoint

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid Ontology GeoPoint. Ontology GeoPoints are strings of the format '{lat},{lon}', where -90 <= lat <= 90 and -180 <= lon <= 180.

Expression categories: Geospatial

Output type: Boolean

Example

Argument values:

  • Expression: geopoint
geopoint Output
-35.307428203,149.122686883 true
149.122686883,-35.307428203 false
10.0, 20.0 true
10.0, 20.0 true
not a GeoPoint false
null false
(10.0,20.0) false

See details.
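The '{lat},{lon}' rule can be expressed as a small Python check. The regular expression is an assumption based on the examples above: optional whitespace around each number is accepted, while parentheses, exponents and other text are not. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumed format: "<lat>,<lon>" with optional whitespace around each decimal number.
_GEOPOINT = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*")


def is_valid_ontology_geopoint(value: Optional[str]) -> bool:
    """Hypothetical helper: True for '{lat},{lon}' with -90 <= lat <= 90 and -180 <= lon <= 180."""
    if value is None:
        return False
    match = _GEOPOINT.fullmatch(value)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_ontology_geopoint("-35.307428203,149.122686883"))
```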


Is valid delegated media gid

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid Gotham delegated media GID. See the Gotham delegated media documentation for more details.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: ri.gotham-delegated-media.12345678-1234-1234-1234-123456789012.testaudiotype.testlocator

Output: true

See details.


Is valid media reference

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid Foundry media reference.

Expression categories: Boolean

Output type: Boolean

See details.


Is valid rid

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid Foundry resource identifier.

Expression categories: Boolean

Output type: Boolean

See details.


Is valid uuid

Supported in: Batch, Faster, Streaming

Returns true if the input is a valid UUID.

Expression categories: Boolean

Output type: Boolean

See details.


Join array

Supported in: Batch, Faster, Streaming

Joins array with specified separator.

Expression categories: Array

Output type: String

Example

Argument values:

  • Array to join: [ hello, world ]
  • Separator: -

Output: hello-world

See details.


Last day of the week/month/quarter/year

Supported in: Batch, Faster

Returns the last day of the week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See details.


Least

Supported in: Batch, Faster, Streaming

Computes the least value amongst all input columns, skipping null values.

Expression categories: Boolean, Numeric

Type variable bounds: T accepts ComparableType

Output type: T

Example

Argument values:

  • Expressions: [a, b, c]
a b c Output
1 2 3 1
1 3 2 1
3 2 1 1

See details.


Left of string

Supported in: Batch, Faster, Streaming

Extract the leftmost characters of a string, up to the given length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 5

Output: Hello

See details.


Left pad string

Supported in: Batch, Faster, Streaming

Left-pad the string column to the given length using the pad string.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: ***Hello world!

See details.
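The padding rule can be sketched in Python as follows. Two behaviors here are assumptions, because the description above does not specify them. Inputs that are already at least `Length` characters long are returned unchanged. Multi-character pad strings are repeated and cut to fit. The function name is hypothetical.

```python
def left_pad(value: str, length: int, pad: str) -> str:
    """Hypothetical helper: prepend copies of `pad` until `value` is `length` characters long."""
    if len(value) >= length or not pad:
        return value  # assumption: nothing to pad (or no pad string given)
    needed = length - len(value)
    # Repeat the pad enough times, then trim so the result is exactly `length`.
    return (pad * needed)[:needed] + value


print(left_pad("Hello world!", 15, "*"))  # ***Hello world!
```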


Length

Supported in: Batch, Faster, Streaming

Returns the length of each value in a string column or an array column.

Expression categories: Array, Numeric

Output type: Integer

Example

Argument values:

  • Expression: string
string Output
hello 5
bye 3

See details.


Less than

Supported in: Batch, Faster, Streaming

Returns true if left is less than right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: left
  • Right: right
left right Output
1.0 10 true
10.0 1 false

See details.


Less than or equals

Supported in: Batch, Faster, Streaming

Returns true if left is less than or equal to right.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Left: left
  • Right: right
left right Output
1.0 10 true
10.0 1 false

See details.


Logarithm

Supported in: Batch, Faster, Streaming

Calculates the natural logarithm, ln(x), of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 10.123

Output: 2.3148100626166146

See details.


Logarithm with base

Supported in: Batch, Faster, Streaming

Calculates logarithm with a given base.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Base: 2.0
  • Expression: 8

Output: 3.0

See details.
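The two logarithm expressions are related by the change-of-base identity log_b(x) = ln(x) / ln(b). A short Python sketch of that identity follows. It is illustrative only, and it does not cover how the expressions handle non-positive inputs or a base of 1. The function name is hypothetical.

```python
import math


def log_with_base(x: float, base: float) -> float:
    """Hypothetical helper: logarithm of x in the given base, via ln(x) / ln(base)."""
    return math.log(x) / math.log(base)


print(math.log(10.123))       # natural logarithm, as in "Logarithm"
print(log_with_base(8, 2.0))  # base-2 logarithm of 8, as in "Logarithm with base"
```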


Logical type cast

Supported in: Batch, Faster, Streaming

Cast an expression to the given logical type. Unlike the regular cast expression, this expression does not change the underlying base representation of the data. Instead, it enforces the constraints associated with the specified logical type, so that the output can be used as the input to downstream expressions that specifically require an instance of that logical type.

Expression categories: Cast

Type variable bounds: C accepts AnyType

Output type: C

Example

Description: Successful cast to natural number

Argument values:

  • Expression: 1234
  • Logical type: Natural number
  • Default value: null

Output: 1234

See details.


Lowercase

Supported in: Batch, Faster, Streaming

Converts all characters in a string to lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello World

Output: hello world

See details.


Map values

Supported in: Batch, Faster, Streaming

Changes the values of the input column to new values based on a map of key-value pairs. If the input value is not found in the map, the default value is used.

Expression categories: Data preparation

Type variable bounds: T1 accepts ComparableType**T2 accepts AnyType

Output type: T2

Example

Argument values:

  • Column to replace values in: country
  • Default value:
    cast(
     expression: null,
     type: String,
    )
  • Values map: {
     Denmark -> DNK,
     United Kingdom -> UK,
    }
country Output
United Kingdom UK
Denmark DNK
United States of America null

See details.


Modulo

Supported in: Batch, Faster, Streaming

Returns the remainder after dividing the numerator by the denominator.

Expression categories: Numeric

Output type: DefiniteNumeric

Example

Argument values:

  • Denominator: 4
  • Numerator: 10.123

Output: 2.123

See details.


Multiply numbers

Supported in: Batch, Faster, Streaming

Calculates the product of all input columns.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions: [col_a, col_b, col_c]
col_a col_b col_c Output
10 2 3 60

See details.


Natural random number

Supported in: Batch, Faster, Streaming

Returns a random natural number. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Long

Example

Description: The only natural number between 10 (inclusive) and 11 (exclusive) is 10.

Argument values:

  • Max value: 11
  • Min value: 10
  • Seed: null

Output: 10

See details.


Negate

Supported in: Batch, Faster, Streaming

Expression categories: Numeric

Output type: Numeric

See details.


Next day

Supported in: Batch

Returns the first date that is later than the value of the date column and falls on the given day of the week.

Expression categories: Datetime

Output type: Date

Example

Description: Next Monday after Wednesday January 10, 2024

Argument values:

  • Date: 2024-01-10
  • Day of the week: MONDAY

Output: 2024-01-15

See details.
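The rule "first date strictly later than the input that falls on a given weekday" reduces to modular arithmetic on weekday numbers. Below is a Python sketch using the standard `datetime` module, where Monday is 0 as in `date.weekday()`. The function name is hypothetical.

```python
import calendar
from datetime import date, timedelta


def next_day(d: date, weekday: int) -> date:
    """Hypothetical helper: first date strictly after `d` on `weekday` (Monday=0 ... Sunday=6)."""
    # Always 1..7 days ahead, so a date already on `weekday` moves a full week forward.
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


print(next_day(date(2024, 1, 10), calendar.MONDAY))  # 2024-01-15
```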


Normal random number

Supported in: Batch, Faster, Streaming

Returns a column of normally distributed random numbers with zero mean and unit variance. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See details.


Not

Supported in: Batch, Faster, Streaming

Returns the negated boolean value of a boolean expression.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Expression: boolean
boolean Output
true false
false true

See details.


Not any

Supported in: Batch, Streaming

Returns true only if all of the specified conditions are false. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean right_boolean Output
true true false
true false false
false true false
false false true

See details.


Nth chain in polygon

Supported in: Batch, Faster, Streaming

Returns the nth ring in a single polygon in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. An index equal to 1 returns an external ring. An index greater than 1 returns an internal ring. Returns null for any of the following conditions: geometry isn't a single polygon, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • N: n
  • Polygon: polygon
polygon n Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 1 {"coordinates": [[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0], [0.0, 0.0]], "type": "LineString"}
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 2 null
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} 1 {"coordinates": [[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]], "type": "LineString"}
{"coordinates":[[[60.0,60.0],[50.0,60.0],[50.0,50.0],[60.0,50.0],[60.0,60.0]],[[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]]],"type":"Polygon"} 2 {"coordinates": [[57.0,57.0],[55.0,52.0],[52.0,52.0],[50.0,57.0],[57.0,57.0]], "type": "LineString"}

See details.


Nth point in linestring

Supported in: Batch, Faster, Streaming

Returns the nth point in a single linestring in the geometry. Indexing is 1-based, and an index of 0 is out-of-bounds. A negative index is counted backwards from the end of the linestring, so that -1 is the last point. Returns null for any of the following conditions: geometry isn't a single linestring, a feature collection or geometry collection is provided, index is out-of-bounds, or at least one argument is null.

Expression categories: Geospatial

Output type: GeoPoint

Example

Argument values:

  • Linestring: linestring
  • N: n
linestring n Output
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} 1 {
latitude: 2.0,
longitude: 30.0,
}
{"type":"LineString","coordinates":[[30.0,2.0],[35.0,0.0],[50.0,3.0]]} 3 {
latitude: 3.0,
longitude: 50.0,
}
{"type":"LineString","coordinates":[[45.0,9.0],[90.0,4.0],[40.0,0.0]]} -1 {
latitude: 0.0,
longitude: 40.0,
}

See details.
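The indexing here is 1-based, and negative indices count back from the end. The Python sketch below illustrates those rules on a parsed GeoJSON dict. Following the rules above, it returns None (null) for non-LineString input, a null argument, or an out-of-bounds index. The function name is hypothetical.

```python
from typing import Optional


def nth_point(linestring: Optional[dict], n: Optional[int]) -> Optional[dict]:
    """Hypothetical helper: nth vertex (1-based; -1 is the last) as a GeoPoint-like dict."""
    if linestring is None or n is None or linestring.get("type") != "LineString":
        return None
    coords = linestring["coordinates"]
    if n == 0 or abs(n) > len(coords):
        return None  # index 0 and indexes past either end are out of bounds
    lon, lat = coords[n - 1 if n > 0 else n][:2]  # GeoJSON stores [lon, lat]
    return {"latitude": lat, "longitude": lon}


print(nth_point({"type": "LineString", "coordinates": [[30.0, 2.0], [35.0, 0.0], [50.0, 3.0]]}, -1))
```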


Nullify empty string

Supported in: Batch, Faster, Streaming

Convert empty strings to null.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: empty string

Output: null

See details.


Or

Supported in: Batch, Faster, Streaming

Returns true if any of the specified conditions are true. Nulls are considered false.

Expression categories: Boolean

Output type: Boolean

Example

Argument values:

  • Conditions: [left_boolean, right_boolean]
left_boolean right_boolean Output
true true true
true false true
false true true
false false false

See details.


Parse GeoJSON from a non-WGS 84 coordinate system

Supported in: Batch, Faster, Streaming

Convert GeoJSON string from a non-WGS 84 coordinate system to WGS 84 geometry. For GeoJSON already in WGS 84 (longitude, latitude), the "logical type cast" expression can convert directly with less overhead. Returns null for strings that fail during parsing or conversion.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoJSON string: geojson_string
  • Source coordinate system: EPSG:32618
geojson_string Output
{"type":"Point","coordinates":[320000.0,4300000.0]} {"type":"Point","coordinates":[-77.07368071728229,38.83040844313318]}
{"type":"LineString","coordinates":[[320000.0,4300000.0],[320100.0,4300000.0]]} {"type":"LineString","coordinates":[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659]]}
{"type":"Polygon","coordinates":[[[320000.0,4300000.0],[320100.0,4300000.0],[320000.0,4300100.0],[320000.0,4300000.0]]]} {"type":"Polygon","coordinates":[[[-77.07368071728229,38.83040844313318],[-77.0725293738795,38.83042888342659],[-77.07370685720375,38.83130901341597],[-77.07368071728229,38.83040844313318]]]}

See details.


Parse JSON string

Supported in: Batch, Faster, Streaming

Parses a JSON string according to the given schema definition, ignoring any fields not in the schema.

Expression categories: Data preparation, Popular, String, Struct

Output type: Array\ | Map\ | Struct

Example

Argument values:

  • JSON: json
  • Schema: Struct\>
  • Output mode: null
json Output
{
 "airline": "XB-112",
 "airport": {
  "id": "JFK",
  "miles": 2000
 }
}
{
airline: XB-112,
airport: {
id: JFK,
miles: 2000,
},
}

See details.


Parse KML string as geometry

Supported in: Batch, Streaming

Parses KML geometry definitions into GeoJSON, ignoring all attributes. This expression operates on text that has already been extracted; extract files to text before using it.

Expression categories: Geospatial

Output type: String | Struct\

Example

Description: Basic polygons.

Argument values:

  • KML string to parse: col
  • Output mode: null
  • Prepare geometry after parse: null
col Output
\
\
-71.1663,42.2614
-71.1667,42.2616
\

\
{"type":"LineString","coordinates":[[-71.1663,42.2614],[-71.1667,42.2616]]}
\
\1\
\relativeToGround\
\<ou...
{"type":"Polygon","coordinates":[[[-122.0848938459612,37.42257124044786,17.0],[-122.0847882750515,37...
\
\1\
\relativeToGround\
\<ou...
{"type":"Polygon","coordinates":[[[-77.05465973756702,38.87291016281703,100.0],[-77.0531553685479,38...
\
\
-71.1663,42.2614
\

\
{"type":"Point","coordinates":[-71.1663,42.2614]}
\
\
\
\ -71.1663,42.2614
-71.1...
{"type":"MultiPolygon","coordinates":[[[[-81.1679,32.2614],[-81.1679,32.28],[-81.1663,32.28],[-81.16...

See details.


Parse KML string as geometry list

Supported in: Batch, Streaming

Parses a KML string into a list of GeoJSON geometries, ignoring all KML attributes.

Expression categories: Geospatial

Output type: Array\ | Struct\>, error:String>

Example

Argument values:

  • KML string to parse: col
  • Output mode: simple
  • Prepare geometry after parse: true
col Output
\
\
\<Do...
[ {"coordinates":[[-122.43193945401, 37.801983684521], [-122.431564131101, 37.8020327731402], [-122.43... ]

See details.


Parse XML as schema

Supported in: Batch, Streaming

Parses XML strings according to the given schema definition, ignoring any fields not in the schema.

Expression categories: File, Struct

Output type: Struct

Example

Argument values:

  • Input schema: Struct\>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Output mode: SIMPLE
  • Value tag: null
xml Output
\
 \XB-112\
 \
  \JFK\
  \2000\
 \

\
{
airport: {
id: JFK,
miles: 2000,
},
id: XB-112,
}

See details.


Parse classification string

Supported in: Batch, Streaming

Returns the markings parsed from a given classification string. The output is a struct whose first field is an array of the classification markings that represent the input; this array is null if the classification string is invalid or if any other error occurs while parsing the markings. The second field, errors, is a string containing any error messages, and is null if there are no errors. This expression is called asynchronously for performance.

Expression categories: Other

Output type: Struct\, errors:String>

See details.


Parse duration

Supported in: Batch

Parses an ISO 8601 duration string, measured from the given start time, into its length in the specified time unit.

Expression categories: Datetime, String

Output type: Long

Example

Argument values:

  • Duration: PT1M30.5S
  • Start time: 2022-10-01T09:00:00Z
  • Unit: SECONDS

Output: 90

See details.


Parse phone number

Supported in: Batch, Streaming

Normalizes phone numbers from various regions and formats into a common format. Numbers that start with the + sign followed by the country calling code are parsed correctly even if no region is set; all other formats are parsed correctly only if a region is selected from the provided options. Phone numbers that cannot be parsed return null.

Expression categories: String

Output type: Phone Number

Example

Description: Return formatted US phone number

Argument values:

  • Expression: phoneNumber
  • Format: E164
  • Region: US
phoneNumber Output
(234) 235-5678 +12342355678
+1 415 5552671 +14155552671
(415) 5552671 +14155552671
Whatsapp@14155552671 +14155552671

See details.


Parse semantic version

Supported in: Batch, Streaming

Parses a semantic version string into a logical type. Supports both release versions (e.g., "0.987.0") and versions with prerelease metadata (e.g., "0.987.0-16-gb3fb285"). Returns null for strings that do not match the expected format.

Expression categories: String

Output type: Semantic Version

Example

Argument values:

  • Version string: version
version Output
0.987.0-16-gb3fb285 {
 major -> 0,
 minor -> 987,
 patch -> 0,
 prerelease -> [ 16-gb3fb285 ],
}
1.0.0-0-g0000000 {
 major -> 1,
 minor -> 0,
 patch -> 0,
 prerelease -> [ 0-g0000000 ],
}
2.5.3-42-gabc1234 {
 major -> 2,
 minor -> 5,
 patch -> 3,
 prerelease -> [ 42-gabc1234 ],
}
0.987.0-SNAPSHOT {
 major -> 0,
 minor -> 987,
 patch -> 0,
 prerelease -> [ SNAPSHOT ],
}
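A rough Python sketch of the parsing shown in the examples, assuming the accepted form is MAJOR.MINOR.PATCH with an optional "-prerelease" suffix. Splitting the prerelease on "." into list elements follows the SemVer convention and is an assumption here, as is returning an empty list for a plain release version; parse_semver is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical pattern: MAJOR.MINOR.PATCH plus an optional "-prerelease" suffix
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?")


def parse_semver(version: str) -> Optional[dict]:
    match = _SEMVER.fullmatch(version)
    if match is None:
        return None                      # strings that do not match -> null
    major, minor, patch, pre = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "patch": int(patch),
        # Dot-separated prerelease identifiers become list elements (assumption)
        "prerelease": pre.split(".") if pre else [],
    }
```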

See details.


Parse well known binary as geometry

Supported in: Batch, Faster, Streaming

Converts well-known binary (WKB) to the geometry logical type. Invalid WKB input returns null. Optionally, supply a source coordinate system identifier to convert the geometry to WGS 84 if the WKB is not already in WGS 84.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkb
  • Source coordinate system: null
wkb Output
AAAAAAFACAAAAAAAAEAUAAAAAAAA {"type":"Point","coordinates":[3.0, 5.0]}
AIAAAAFACAAAAAAAAEAUAAAAAAAAQAAAAAAAAAA= {"type":"Point","coordinates":[3.0, 5.0, 2.0]}
AAAAAAMAAAABAAAABAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
AAAAAAIAAAACAAAAAAAAAAAAAAAAAAAAAD/wAAAAAAAAAAAAAAAAAAA= {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See details.


Parse well known text as geometry

Supported in: Batch, Faster, Streaming

Converts a well-known text (WKT) string to the geometry logical type. Invalid WKT input returns null. Optionally, supply a source coordinate system identifier to convert the geometry to WGS 84 if the WKT is not already in WGS 84.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: wkt
  • Source coordinate system: null
wkt Output
POINT (3.0 5.0 2.0) {"type":"Point","coordinates":[3.0, 5.0, 2.0]}
POLYGON ((0.0 0.0, 1.0 0.0, 0.0 1.0, 0.0 0.0)) {"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]]]}
LINESTRING (0.0 0.0, 1.0 0.0) {"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}

See details.


Perimeter

Supported in: Batch, Streaming

Calculates the perimeter of a geometry in meters using a spherical approximation of the globe. For a line string or a point, the perimeter is 0.

Expression categories: Geospatial

Output type: Double

See details.


Positive modulo

Supported in: Batch, Faster

Returns the positive modulus of the numerator divided by the denominator.

Expression categories: Numeric

Type variable bounds: T1 accepts Byte | Integer | Long | Short; T2 accepts Byte | Integer | Long | Short

Output type: T1

Example

Argument values:

  • Denominator: 3
  • Numerator: 10

Output: 1
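A minimal Python sketch of a positive modulus, assuming a positive denominator. The formula ((a % n) + n) % n is the usual portable form for languages such as C or Java, whose % can return a negative remainder; Python's % already takes the sign of the divisor, so here it simply gives the same result. The function name is hypothetical.

```python
def positive_mod(numerator: int, denominator: int) -> int:
    # Portable "positive modulo": never negative when denominator > 0.
    # In Python this equals numerator % denominator; the extra step matters in C/Java.
    return ((numerator % denominator) + denominator) % denominator
```

For example, a plain Java remainder of -10 % 3 is -1, whereas the positive modulus is 2.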

See details.


Power of

Supported in: Batch, Faster, Streaming

Raises the expression to the power of the exponent. If either value is null, returns null.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Exponent: 3
  • Expression: 10

Output: 1000.0

See details.


Prepare geometry

Supported in: Batch, Streaming

Prepares a geometry for downstream use, for example indexing to the ontology, by converting a geometry string into valid GeoJSON. Polygons will be closed and deduplicated. Geometries which cross the anti-meridian (as indicated by width > 180 degrees) will be split into multiple features on each side of the anti-meridian. By default, this operation will return the converted geometry, or null if the string cannot be converted. Alternatively, in the "show errors" output mode, this operation will instead output a struct containing either the successfully parsed output or a descriptive error message.

Expression categories: Geospatial

Output type: Geometry | Struct\

Example

Argument values:

  • Geometry string: geometry
  • Output mode: null
geometry Output
{"type":"Polygon","coordinates":[[[0.0,0.0],[10.0,0.0],[10.0,10.0],[0.0,10.0],[0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[1.0,0.0,1.0],[0.0,1.0,1.0],[0.0,0.0,1.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0,1.0],[0.0,1.0,1.0],[1.0,0.0,1.0],[0.0,0.0,1.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [0.0,1.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.0,0.0],[1.0,0.0], [1.0,0.0], [0.0,1.0], [0.0,0.0]]]} {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[179.0,-30.0],[-179.0,-30.0],[-179.0,30.0],[179.0,30.0],[179.0,-30]]]} {"type":"MultiPolygon","coordinates":[[[[-180.0,-30.0],[-180.0,30.0],[-179.0,30.0],[-179.0,-30.0],[-180.0,-30.0]]],[[[180.0,30.0],[180.0,-30.0],[179.0,-30.0],[179.0,30.0],[180.0,30.0]]]]}
{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]} {"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]... {"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[40.0,10.0],[0.0,1.0]...
{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"LineString","coordinates":[[179.0,30.0],[-179.0,30.0]]}]} {"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[1.0,0.0]},{"type":"MultiLineString","coordinates":[[[179.0,30.0],[180.0,30.0]],[[-180.0,30.0],[-179.0,30.0]]]}]}
{"type":"GeometryCollection","geometries":[{"type":"LineString","coordinates":[[0.0,0.0],[1.0,0.0]]}... {"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0]],[[1.0,1.0],[2.0,1.0]]]}
{"type":"GeometryCollection","geometries":[{"type":"MultiLineString","coordinates":[[[0.0,0.0],[1.0,0.0]],[],[[1.0,1.0],[2.0,1.0]]]},{"type":"MultiPoint","coordinates":[[0.0,0.0],[1.0,1.0]]}]} {"geometries":[{"coordinates":[[[0.0,0.0],[1.0,0.0]],[[1.0,1.0],[2.0,1.0]]],"type":"MultiLineString"},{"coordinates":[[0.0,0.0],[1.0,1.0]],"type":"MultiPoint"}],"type":"GeometryCollection"}
{"type":"MultiPolygon","coordinates":[[[[1.0,1.0],[2.0,1.0],[2.0,2.0],[1.0,2.0],[1.0,1.0]]],[[]],[[[10.0,10.0],[20.0,10.0],[20.0,20.0],[10.0,20.0],[10.0,10.0]]]]} {"type":"MultiPolygon","coordinates":[[[[1.0,2.0],[2.0,2.0],[2.0,1.0],[1.0,1.0],[1.0,2.0]]],[[[10.0,20.0],[20.0,20.0],[20.0,10.0],[10.0,10.0],[10.0,20.0]]]]}

See details.


Reduce array elements

Supported in: Batch, Streaming

Reduces array elements using an expression.

Expression categories: Array

Type variable bounds: T accepts Array\ | Short | String | Timestamp> | Boolean | Byte | Date | Double | Float | Integer | Long | Map\ | Short | String | Timestamp

Output type: T

Example

Argument values:

  • Array: miles
  • Expression to reduce:
    add(
     expressions: [accumulator, element],
    )
  • Initial value: 0
miles Output
[ 12300, 12342 ] 24642
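The same accumulator/element pattern maps directly onto Python's functools.reduce; in this sketch the lambda plays the role of add(expressions: [accumulator, element]) and 0 is the initial value. The helper name reduce_array is hypothetical.

```python
from functools import reduce


def reduce_array(array, initial, fn):
    # fn(accumulator, element) is applied left to right, starting from initial
    return reduce(fn, array, initial)


total = reduce_array([12300, 12342], 0, lambda accumulator, element: accumulator + element)
```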

See details.


Regex extract

Supported in: Batch, Faster, Streaming

Extracts the specified capture group from the first match of a regular expression. Returns an empty string when no match is found.

Expression categories: Regex, String

Output type: String

Example

Description: Extract the first two initials from the first match.

Argument values:

  • Expression: MT-112, XB-967
  • Group: 1
  • Pattern: (\w\w)(-)

Output: MT
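A Python sketch of the behavior described above: only the first match is used, and an empty string is returned when nothing matches. Python's re dialect may differ in edge cases from the regex engine Pipeline Builder uses; regex_extract is a hypothetical name.

```python
import re


def regex_extract(text: str, pattern: str, group: int) -> str:
    match = re.search(pattern, text)       # first match anywhere in the string
    if match is None:
        return ""                          # no match -> empty string
    return match.group(group) or ""        # a group that did not participate -> ""
```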

See details.


Regex find

Supported in: Batch, Faster, Streaming

Matches an expression against a regular expression. The regular expression can match any part of the string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can find regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d

Output: true

See details.


Regex index

Supported in: Batch, Faster, Streaming

Returns an array of indices (counted as Unicode code points) at which the regular expression pattern is found in the given expression.

Expression categories: Regex, String

Output type: Array\

Example

Description: You can find regex patterns and their indices.

Argument values:

  • Expression: ababab
  • Regex: ab

Output: [ 0, 2, 4 ]
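In Python, strings are indexed by code point, so collecting the start offset of every non-overlapping match gives indices counted the same way as described above. This is an illustrative sketch with a hypothetical function name.

```python
import re
from typing import List


def regex_index(text: str, pattern: str) -> List[int]:
    # 0-based start offsets (in code points) of each non-overlapping match
    return [m.start() for m in re.finditer(pattern, text)]
```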

See details.


Regex match

Supported in: Batch, Faster, Streaming

Matches an expression against a regular expression. The regular expression must match the whole string.

Expression categories: Regex, String

Output type: Boolean

Example

Description: You can match regex patterns.

Argument values:

  • Expression: abcdefg
  • Regex: abc?d.+

Output: true
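The difference between Regex find and Regex match mirrors Python's re.search (a match anywhere) versus re.fullmatch (the whole string must match). The sketch below uses the same inputs as the two examples; the function names are hypothetical, and Python's regex dialect may differ in edge cases.

```python
import re


def regex_find(text: str, pattern: str) -> bool:
    return re.search(pattern, text) is not None      # any part of the string


def regex_match(text: str, pattern: str) -> bool:
    return re.fullmatch(pattern, text) is not None   # the whole string
```

Note that "abc?d" is found in "abcdefg" but does not match it, because "efg" is left over.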

See details.


Regex replace

Supported in: Batch, Faster, Streaming

Replaces every match of a regex pattern in a string.

Expression categories: Regex, String

Output type: String

Example

Argument values:

  • Expression: tail_number
  • Pattern: (\w\w)(-)
  • Replace: **-
tail_number Output
MT-123 **-123
XB-434 **-434
MT-123, XB-434 **-123, **-434
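As the third row shows, every match is replaced, not just the first. Python's re.sub reproduces the example table; this is an illustration only, not the expression's implementation.

```python
import re

PATTERN = r"(\w\w)(-)"
REPLACEMENT = "**-"

rows = ["MT-123", "XB-434", "MT-123, XB-434"]
outputs = [re.sub(PATTERN, REPLACEMENT, row) for row in rows]   # replaces all matches
```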

See details.


Remove map entry by key

Supported in: Batch, Streaming

Removes a map entry by the given key.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Key: k
  • Map: map_col
map_col Output
{
 a -> 1,
 k -> 2,
}
{
 a -> 1,
}

See details.


Rename struct field

Supported in: Batch, Faster, Streaming

Rename fields within a struct.

Expression categories: Data preparation, Struct

Output type: Struct

Example

Argument values:

  • Expression: struct
  • Renames: [(airline.id, identifier)]
struct Output
{
airline: {
id: NA,
},
}
{
airline: {
identifier: NA,
},
}
{
airline: {
id: FE,
},
}
{
airline: {
identifier: FE,
},
}

See details.


Right of string

Supported in: Batch, Faster, Streaming

Extracts the rightmost characters of a string, up to the given length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 6

Output: world!

See details.


Right pad string

Supported in: Batch, Faster, Streaming

Right-pads the string to the given length using the pad string. If the string is longer than the given length, it is trimmed to that length.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: Hello world!
  • Length: 15
  • Pad: *

Output: Hello world!***
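A Python sketch of the padding and trimming rules above. It assumes a multi-character pad is repeated and then cut to the exact missing width; right_pad is a hypothetical name.

```python
def right_pad(text: str, length: int, pad: str) -> str:
    if len(text) >= length:
        return text[:length]              # longer strings are trimmed
    needed = length - len(text)
    # repeat the pad, then cut it to exactly the missing width
    return text + (pad * needed)[:needed]
```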

See details.


Round number

Supported in: Batch, Faster, Streaming

Rounds a number to 'scale' decimal places.

Expression categories: Numeric

Output type: Decimal | Double | Float

Example

Argument values:

  • Column: 10.123
  • Scale: 2

Output: 10.12

See details.


Secant

Supported in: Batch, Faster, Streaming

Takes the secant of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle Output
0.0 1.0
90.0 1.633123935319537E16
180.0 -1.0
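The large value for 90 degrees is a floating-point effect, not a bug in the example: cos(pi/2) evaluates to about 6.12e-17 rather than exactly 0, so its reciprocal is finite. A short Python sketch (hypothetical function name) reproduces the table.

```python
import math


def secant_degrees(angle: float) -> float:
    # sec(x) = 1 / cos(x); the angle is converted from degrees to radians first
    return 1.0 / math.cos(math.radians(angle))
```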

See details.


Sentence case

Supported in: Batch, Faster, Streaming

Converts the first character of the first word to uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello world

See details.


Sequence

Supported in: Batch, Faster, Streaming

Creates an array of numbers ranging from start to end; end is included when the sequence lands on it.

Expression categories: Array

Type variable bounds: T accepts Byte | Integer | Long | Short

Output type: Array<T>

Example

Description: Sequences increase by 1 unless otherwise specified.

Argument values:

  • End: 10
  • Start: 0
  • Step size: null

Output: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
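A Python sketch of an end-inclusive integer sequence. The default step of 1 follows the description above; stopping early when the step overshoots the end is an assumption, and sequence is a hypothetical name.

```python
from typing import List, Optional


def sequence(start: int, end: int, step: Optional[int] = None) -> List[int]:
    if step is None:
        step = 1                                  # default step, per the description
    if step == 0:
        raise ValueError("step must be non-zero")
    # range() excludes its stop value, so extend it by one in the step's direction
    return list(range(start, end + (1 if step > 0 else -1), step))
```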

See details.


Similarity score

Supported in: Batch

Returns the similarity score of two embedding vectors.

Expression categories: Distance measurement, Numeric

Type variable bounds: T accepts Array\

Output type: Double

See details.


Simplify geometry

Supported in: Batch, Faster, Streaming

This expression simplifies GeoJSON geometry by removing points within the given tolerance distance using a spherical model of the globe. Loops smaller than the tolerance may be removed entirely.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Geometry: Geometry
  • Tolerance: Tolerance
  • Coordinate precision: null
Geometry Tolerance Output
{"type":"LineString","coordinates":[[30.0,0.0],[35.0,0.0],[40.0,0.0]]} 1000 {"type":"LineString","coordinates":[[30.0,0.0],[40.0,0.0]]}
{"type":"Polygon","coordinates":[[[-1.0,-1.0],[1.0,-1.0],[1.0,1.0],[0.0,1.0],[-1.0,1.0],[-1.0,-1.0]]]} 12000 {"type":"Polygon","coordinates":[[[-1.0,1.0],[1.0,1.0],[1.0,-1.0],[-1.0,-1.0],[-1.0,1.0]]]}
{"type":"MultiLineString","coordinates":[[[0.0,0.0],[5.0,0.1],[10.0,0.0]], [[0.0,-5.0],[5.0,0.1],[10.0,5.0]]]} 12000 {"type":"MultiLineString","coordinates":[[[0.0,0.0],[10.0,0.0]],[[0.0,-5.0],[10.0,5.0]]]}
{"type":"MultiPolygon","coordinates":[[[[-2.0,-2.0],[2.0,-2.0],[2.0,2.0],[0.0,2.1],[-2.0,2.0],[-2.0,... 12000 {"type":"MultiPolygon","coordinates":[[[[-2.0,2.0],[2.0,2.0],[2.0,-2.0],[-2.0,-2.0],[-2.0,2.0]], [[1...

See details.


Sine

Supported in: Batch, Faster, Streaming

Takes the sine of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle Output
0.0 0.0
90.0 1.0
180.0 0.0

See details.


Skip bytes

Supported in: Batch, Faster, Streaming

Skips the given number of bytes at the start of a binary column.

Expression categories: Binary

Output type: Binary

Example

Argument values:

  • Bytes: aGk=
  • Number of bytes to skip: 1

Output: aQ==
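The example values are base64-encoded: "aGk=" is the two bytes of "hi", and dropping the first byte leaves "i", which encodes as "aQ==". A Python sketch with a hypothetical helper name:

```python
import base64


def skip_bytes(data: bytes, n: int) -> bytes:
    return data[n:]                      # drop the first n bytes


raw = base64.b64decode("aGk=")           # b"hi"
result = base64.b64encode(skip_bytes(raw, 1)).decode("ascii")
```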

See details.


Slice array

Supported in: Batch, Faster, Streaming

Returns the array sliced from the first position to the second position. The first position must be 1 or higher. If the second position is beyond the end of the array, the rest of the array is returned.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array<T>

See details.


Soundex

Supported in: Batch, Faster

Computes the Soundex encoding (a phonetic representation) of a word.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: input_string
input_string Output
cat C300
caat C300
two T000
too T000
to T000
four F600
for F600
fore F600
fur F600
meow M000
me ow M000
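The table matches the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, drop vowels, collapse adjacent equal codes (h and w do not separate them), and pad to four characters. The Python sketch below implements those rules; ignoring non-letters such as the space in "me ow" is an assumption consistent with the last row.

```python
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]   # ignore spaces etc.
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":          # vowels reset the run; h and w do not
            prev = code
    return result.ljust(4, "0")
```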

See details.


Split string

Supported in: Batch, Faster, Streaming

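Splits a string on the specified regex pattern. The limit caps the number of elements in the output array; the last element holds the remainder of the string.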

Expression categories: String

Output type: Array<String>

Example

Argument values:

  • Expression: string
  • Pattern:
  • Limit: 2
string Output
hello [ hello ]
hello world [ hello, world ]
hello there world [ hello, there world ]
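A limit of N caps the output at N elements, with the last element holding the rest of the string. Python's re.split expresses this as maxsplit = N - 1. In the sketch below, the single-space pattern and treating a missing or non-positive limit as "no limit" are assumptions; split_string is a hypothetical name.

```python
import re
from typing import List, Optional


def split_string(text: str, pattern: str, limit: Optional[int] = None) -> List[str]:
    if limit is None or limit <= 0:
        return re.split(pattern, text)                 # no limit (assumption)
    return re.split(pattern, text, maxsplit=limit - 1) # at most `limit` elements
```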

See details.


Square root

Supported in: Batch, Faster, Streaming

Calculates the square root of a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: 9.0

Output: 3.0

See details.


Starts with

Supported in: Batch, Faster, Streaming

Returns true if the expression starts with the given value, optionally ignoring case.

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: Hello world
  • Ignore case: true
  • Value: hello

Output: true

See details.


String after delimiter

Supported in: Batch, Faster, Streaming

Extracts the part of the string after the first occurrence of the delimiter. Returns the full string if the delimiter is not found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: world

See details.


String before delimiter

Supported in: Batch, Faster, Streaming

Extracts the part of the string before the first occurrence of the delimiter. Returns the full string if the delimiter is not found.

Expression categories: String

Output type: String

Example

Argument values:

  • Delimiter: hello
  • Expression: ... Hello world
  • Ignore case: true

Output: ...

See details.


String contains

Supported in: Batch, Faster, Streaming

Returns true if the expression contains the given value, optionally ignoring case.

Expression categories: Boolean, String

Output type: Boolean

Example

Argument values:

  • Expression: ... Hello world
  • Ignore case: true
  • Value: hello

Output: true

See details.


Substring

Supported in: Batch, Faster, Streaming

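Extracts a substring of the given length, starting at the given 1-based position. A negative start position counts back from the end of the string.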

Expression categories: Numeric

Output type: String

Example

Argument values:

  • Expression: string
  • Start: start
  • Length: length
string start length Output
hello, world 1 5 hello
hello, world 8 5 world
hello, world -5 5 world
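The three rows follow 1-based indexing, with a negative start counted back from the end of the string. The Python sketch below reproduces them; treating a start of 0 like 1 is an assumption, and substring is a hypothetical name.

```python
def substring(text: str, start: int, length: int) -> str:
    if start > 0:
        begin = start - 1                    # 1-based -> 0-based
    elif start < 0:
        begin = max(len(text) + start, 0)    # count back from the end
    else:
        begin = 0                            # assumption: 0 behaves like 1
    return text[begin:begin + length]
```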

See details.


Subtract multiple expressions

Supported in: Batch, Faster, Streaming

Subtracts each column in the expressions list, in turn, from the value to be subtracted (in the example, col_a - col_b - col_c).

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Expressions list: [col_b, col_c]
  • Value to be subtracted: col_a
col_a col_b col_c Output
5 3 2 0
2 4 0 -2
-2 -4 -2 4

See details.


Subtract numbers

Supported in: Batch, Faster, Streaming

Subtracts the right number from the left number.

Expression categories: Numeric

Output type: Numeric

Example

Argument values:

  • Left: col_a
  • Right: col_b
col_a col_b Output
32 4 28
-5 -3 -2

See details.


Subtract value from date

Supported in: Batch, Faster, Streaming

Returns the date that is 'value' days, weeks, months, quarters, or years before 'date'.

Expression categories: Datetime

Output type: Date

Example

Argument values:

  • Date: 2022-04-05
  • Unit: DAYS
  • Value: 2

Output: 2022-04-03

See details.


Sum of array elements

Supported in: Batch, Faster, Streaming

Sums the elements contained within the array.

Expression categories: Array

Type variable bounds: T accepts DefiniteNumeric

Output type: T

Example

Argument values:

  • Expression: [ 1, 2, 3 ]
  • Treat null as zero: true

Output: 6

See details.


Tangent

Supported in: Batch, Faster, Streaming

Takes the tangent of an angle.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Angle unit: degrees
  • Angle value: angle
angle Output
0.0 0.0
90.0 1.633123935319537E16
180.0 0.0

See details.


Text segmentation

Supported in: Batch, Faster, Streaming

Extracts a series of text segments using sliding window segmentation.

Expression categories: String

Output type: Array<String>

See details.


Text to embeddings

Supported in: Batch

Converts text into embeddings.

Expression categories: String

Output type: Embedded vector

Example

Description: Example embeddings for the word 'palantir'.

Argument values:

  • Model:
    ada002Embedding()
  • Text column: text
  • Output mode: null
text Output
palantir [ -0.019182289, -0.02127992, 0.009529043, -0.008066221, -0.0014429842, 0.019154688, -0.023556953, -0...

See details.


Timestamp add

Supported in: Batch, Faster, Streaming

Adds a value, in the specified unit, to a timestamp.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-01T00:00:00Z
  • Unit: MILLISECONDS
  • Value to add: 2

Output: 2022-02-01T00:00:00.002Z

See details.


Timestamp difference

Supported in: Batch, Faster, Streaming

Returns the difference between two timestamps in the given time unit.

Expression categories: Datetime

Output type: Long

Example

Argument values:

  • End: 2022-10-01T10:00:00Z
  • Start: 2022-10-01T09:00:00Z
  • Unit: HOURS

Output: 1

See details.


Timestamp sequence

Supported in: Batch, Faster

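Creates an array of timestamps from the start time to the end time, advancing by the given step size and unit; the end time is included only when a step lands on it.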

Expression categories: Datetime

Output type: Array<Timestamp>

Example

Argument values:

  • End time: end_time
  • Start time: start_time
  • Step unit: DAYS
  • Step size: 1.0
start_time end_time Output
2023-01-01T00:00:00Z 2023-01-03T00:00:00Z [ 2023-01-01T00:00:00Z, 2023-01-02T00:00:00Z, 2023-01-03T00:00:00Z ]
2023-01-01T01:50:00Z 2023-01-03T00:00:00Z [ 2023-01-01T01:50:00Z, 2023-01-02T01:50:00Z ]
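The second row shows that the end time is an upper bound, not a guaranteed element: stepping from 01:50 by one day overshoots 2023-01-03T00:00:00Z, so the sequence stops after two entries. A Python sketch for fixed-length steps (timedelta cannot express calendar months, so month steps are out of scope here); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def timestamp_sequence(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    if step <= timedelta(0):
        raise ValueError("this sketch only supports positive steps")
    result = []
    current = start
    while current <= end:        # end is included only if a step lands on it
        result.append(current)
        current += step
    return result
```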

See details.


Timestamp subtract

Supported in: Batch, Faster, Streaming

Subtracts a value, in the specified unit, from a timestamp.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Timestamp: 2022-02-02T00:00:00Z
  • Unit: MILLISECONDS
  • Value to subtract: 2

Output: 2022-02-01T23:59:59.998Z

See details.


Timestamp to epoch millis

Supported in: Batch, Faster, Streaming

Converts a UTC timestamp to milliseconds since the Unix epoch (1970-01-01T00:00:00Z).

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:00:00Z

Output: 1664614800000

See details.


Timestamp to epoch seconds

Supported in: Batch, Faster, Streaming

Converts a UTC timestamp to seconds since the Unix epoch; fractional seconds are dropped.

Expression categories: Cast, Datetime

Output type: Long

Example

Argument values:

  • Timestamp: 2022-10-01T09:01:13.47Z

Output: 1664614873
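Both epoch conversions can be checked in Python by subtracting the Unix epoch and floor-dividing by a timedelta, which keeps the arithmetic in integers. This sketch drops fractional seconds as in the example above; behavior for timestamps before 1970 (floor versus truncation) may differ from the expression. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: datetime) -> int:
    # timedelta // timedelta returns an int (floor division)
    return (ts - EPOCH) // timedelta(milliseconds=1)


def to_epoch_seconds(ts: datetime) -> int:
    return (ts - EPOCH) // timedelta(seconds=1)
```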

See details.


Title case

Supported in: Batch, Faster, Streaming

Converts the first character of each word to uppercase and the rest to lowercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: Hello World

See details.


Token set ratio

Supported in: Batch, Streaming

Compute the token set ratio between two strings. Token set ratio is a metric describing how similar two strings are, and will return a value between 0 and 1, where 0 means that there are no similarities between the two strings and 1 means that they are the same (or one is a substring of the other).

Expression categories: Distance measurement, String

Output type: Double

Example

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right
left right Output
hello world world hello 1.0
Hello hello world 0.5
hello hello WorlD hello world 0.8181818181818181
hello farewell 0.46153846153846156
empty string empty string 1.0

See details.


Transcribe audio into JSON using CPU

Supported in: Batch

Transcribe audio files into JSON using CPU.

Expression categories: Media

Output type: String

See details.


Transcribe audio into JSON using GPU

Supported in: Batch

Transcribe audio files into JSON using GPU.

Expression categories: Media

Output type: String

See details.


Transcribe audio into text

Supported in: Batch, Faster

Transcribes an audio file into text.

Expression categories: Media

Output type: String | Struct\

See details.


Transform array element

Supported in: Batch, Streaming

Maps each element of an array using an expression. Note that array indices start at 1.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Array\

Example

Argument values:

  • Array: flight_number
  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: element,
     ignoreCase: false,
    )
flight_number Output
[ XB-134, MT-111 ] [ XB, MT ]

See details.


Transform map keys

Supported in: Batch, Streaming

Transforms keys of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map\

Example

Argument values:

  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: key,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number Output
{
 MT-111 -> 2,
 XB-134 -> 1,
}
{
 MT -> 2,
 XB -> 1,
}

See details.


Transform map values

Supported in: Batch

Transforms values of a map by applying an expression to every key-value pair.

Expression categories: Map

Type variable bounds: K accepts AnyType; V accepts AnyType

Output type: Map\

Example

Argument values:

  • Expression to apply:
    stringBeforeDelimiter(
     delimiter: -,
     expression: value,
     ignoreCase: false,
    )
  • Map: flight_number
flight_number Output
{
 1 -> XB-134,
 2 -> MT-111,
}
{
 1 -> XB,
 2 -> MT,
}

See details.


Trim whitespace

Supported in: Batch, Faster, Streaming

Trims whitespace at beginning and end of string. Whitespace is defined as characters in any of: 1) Unicode's \p{whitespace} set, 2) Java's String#trim() method, or 3) Java's Character#isWhitespace() method.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: hello world

Output: hello world

See details.


Truncate date

Supported in: Batch, Faster

Returns the date rounded down to the nearest day/week/month/quarter/year.

Expression categories: Datetime

Output type: Date

See details.


Truncate timestamp

Supported in: Batch, Faster

Returns the timestamp truncated to the specified unit.

Expression categories: Datetime

Output type: Timestamp

Example

Argument values:

  • Start: 2022-02-01T10:10:10.0022Z
  • Unit: MILLISECONDS

Output: 2022-02-01T10:10:10.002Z

See details.


UUID V5

Supported in: Batch, Streaming

Generates a deterministic UUID v5 from a namespace UUID and a name string using SHA-1 hashing (RFC 4122). The same namespace and name will always produce the same UUID. Returns null if the namespace is not a valid UUID.

Expression categories: String

Output type: String

Example

Description: Generate a deterministic UUID v5 from a namespace UUID and name string.

Argument values:

  • Name: name
  • Namespace UUID: namespace
namespace name Output
6ba7b810-9dad-11d1-80b4-00c04fd430c8 hello 9342d47a-1bab-5709-9869-c840b2eac501
6ba7b811-9dad-11d1-80b4-00c04fd430c8 https://example.com 4fd35a71-71ef-5a55-a9d9-aa75c889a6d0
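The two namespaces in the example are the predefined RFC 4122 namespaces for DNS names (6ba7b810-…) and URLs (6ba7b811-…), which Python exposes as uuid.NAMESPACE_DNS and uuid.NAMESPACE_URL. The sketch below uses the standard library's uuid5 and returns None for an invalid namespace, as described above; the wrapper name uuid_v5 is hypothetical.

```python
import uuid
from typing import Optional


def uuid_v5(namespace: str, name: str) -> Optional[str]:
    try:
        ns = uuid.UUID(namespace)
    except (ValueError, TypeError):
        return None                      # invalid namespace -> null
    return str(uuid.uuid5(ns, name))     # SHA-1 based, deterministic
```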

See details.


Uncompact a set of H3 indices

Supported in: Batch, Faster, Streaming

Uncompacts H3 indices to the specified resolution. All input indices must be at a resolution less than or equal to the requested resolution; otherwise, this expression returns null. It also returns null if any input index is invalid. Output indices are sorted in ascending order.

Expression categories: Geospatial

Output type: Array\

See details.


Unicode normalize

Supported in: Batch, Faster, Streaming

Performs Unicode normalization as defined in Unicode Standard Annex #15.

Expression categories: Data preparation, String

Output type: String

Example

Argument values:

  • Expression: string
  • Normalization form: nfkc
string Output
123 123
イナゴ イナゴ
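Python's unicodedata module implements the same UAX #15 forms. The inputs below are illustrative choices, not taken from the table: full-width digits and half-width katakana (written as escapes) are folded by NFKC into ASCII digits and standard katakana, including composing the half-width voiced mark into a single character.

```python
import unicodedata

fullwidth_digits = "\uFF11\uFF12\uFF13"          # full-width "123"
halfwidth_katakana = "\uFF72\uFF85\uFF7A\uFF9E"   # half-width "イナゴ"


def normalize(text: str, form: str = "NFKC") -> str:
    return unicodedata.normalize(form, text)
```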

See details.


Uniform random number

Supported in: Batch, Faster, Streaming

Returns a column of uniform random numbers drawn between 0 and 1. This is not deterministic and will not produce the same result on repeated builds, even when using a seed.

Expression categories: Numeric

Output type: Double

See details.


Universally unique identifier (uuid) (unstable)

Supported in: Batch, Faster, Streaming

Returns a column of UUIDs. The output is not deterministic and will differ between builds, so this is not the preferred way to build an ID column; consider a deterministic alternative such as SHA-256 or UUID v5.

Expression categories: String

Output type: String

See details.


Uppercase

Supported in: Batch, Faster, Streaming

Converts all characters in string to uppercase.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: hello World

Output: HELLO WORLD

See details.


Url decode

Supported in: Batch, Faster, Streaming

Decodes a percent-encoded string to plain text.

Expression categories: Cast, String

Output type: String

Example

Argument values:

  • Expression: string
string Output
raw_string_with_no_special_characters raw_string_with_no_special_characters
test%2Fapi%3Fstring%3D3 test/api?string=3

See details.
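Python's `urllib.parse.unquote` reproduces both example rows and can serve as a quick offline check. One caveat: `unquote` leaves `+` unchanged, whereas `unquote_plus` turns it into a space. This entry does not say which convention Url decode follows for `+`.

```python
from urllib.parse import unquote

inputs = ["raw_string_with_no_special_characters", "test%2Fapi%3Fstring%3D3"]
# unquote() replaces each %XX escape with the byte it encodes (decoded as UTF-8).
decoded = [unquote(s) for s in inputs]
print(decoded)  # ['raw_string_with_no_special_characters', 'test/api?string=3']
```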


Url encode

Supported in: Batch, Faster, Streaming

Percent-encodes a string so that it can be sent in a URL.

Expression categories: String

Output type: String

Example

Argument values:

  • Expression: string
string Output
raw_string_with_no_special_characters raw_string_with_no_special_characters
test/api?string=3 test%2Fapi%3Fstring%3D3

See details.
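The example rows can be reproduced with Python's `urllib.parse.quote`. By default, `quote` treats `/` as safe and leaves it unencoded. Because the example encodes `/` as `%2F`, the sketch passes `safe=""`. This is an assumption about how the two implementations line up, not a statement of how Url encode works internally.

```python
from urllib.parse import quote

inputs = ["raw_string_with_no_special_characters", "test/api?string=3"]
# safe="" so that "/" is escaped too; quote() treats "/" as safe by default.
encoded = [quote(s, safe="") for s in inputs]
print(encoded)  # ['raw_string_with_no_special_characters', 'test%2Fapi%3Fstring%3D3']
```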


Use LLM

Supported in: Batch, Faster

Call an LLM with a configurable prompt.

Expression categories: String

Output type: Array\ | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Struct\ | Boolean | Date | Decimal | Double | Float | Integer | Long | Short | String | Struct | Timestamp, error:String> | Timestamp

Example

Argument values:

  • Model:
    gpt4ChatModel(
     temperature: 0.0,
    )
  • Prompt: prompt
  • System prompt: [In the context of a food delivery app, your job is to rate reviews given in the following user promp...]
  • Output mode: null
  • Output type: null
prompt Output
The food was great! 5

See details.


Value from map

Supported in: Batch, Faster, Streaming

Get a value from a map using a key.

Expression categories: Map

Type variable bounds: K accepts ComparableType, V accepts AnyType

Output type: V

Example

Argument values:

  • Key: Foo
  • Map: {
     Bar -> World,
     Foo -> Hello,
    }

Output: Hello

See details.


Aggregate expressions


All of

Supported in: Batch, Faster

Calculate the boolean 'and' of an aggregate, using SQL standard semantics for null values.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: false

See details.


Any of

Supported in: Batch, Faster

Calculate the boolean 'or' of an aggregate. Nulls are considered false.

Expression categories: Aggregate

Output type: Boolean

Example

Argument values:

  • Expression: values

Given input table:

values
true
false
true

Outputs: true

See details.
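All of and Any of handle nulls differently. The sketch below models both in plain Python. For Any of, it follows the stated rule that nulls count as false. For All of, the entry only says "SQL standard semantics", so the sketch assumes the usual SQL aggregate rule: nulls are skipped, and a group containing only nulls yields null. Treat that branch as an assumption. The helper names are hypothetical.

```python
from typing import Iterable, Optional


def all_of(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # Assumption: standard SQL aggregate null handling, i.e. nulls are skipped
    # and a group that contains only nulls yields null.
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return all(non_null)


def any_of(values: Iterable[Optional[bool]]) -> bool:
    # As documented for Any of: nulls count as false, so the result is never null.
    return any(v is True for v in values)


print(all_of([True, False, True]))  # False
print(any_of([True, False, True]))  # True
print(all_of([None, True]))         # True (null skipped, under the assumption above)
print(any_of([None, None]))         # False
```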


Approximate median

Supported in: Batch

Computes the approximate median of the values in a column.

Expression categories: Aggregate

Output type: Numeric

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


Approximate percentile

Supported in: Batch

Returns the approximate percentile of the expression: the smallest value in the ordered expression values (sorted from least to greatest) such that at least the given percentage of the values is less than or equal to that value.

Expression categories: Aggregate

Output type: Array\ | Byte | Decimal | Double | Float | Integer | Long | Short

Example

Argument values:

  • Expression: values
  • Percentiles: [0.5]
  • Accuracy: null

Given input table:

values
2
4
3

Outputs: 3

See details.
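Approximate median and Approximate percentile both estimate an order statistic. The sketch below computes the exact value they approximate. It reads the definition as "the smallest sorted value such that at least the requested fraction of values is less than or equal to it", which reproduces the example output of 3. The `Accuracy` argument has no counterpart here, and `exact_percentile` is a hypothetical helper name.

```python
import math
from fractions import Fraction
from typing import Sequence


def exact_percentile(values: Sequence[float], percentage: float) -> float:
    """Smallest sorted value v such that at least `percentage` of the values are <= v."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percentage <= 1:
        raise ValueError("percentage must be in [0, 1]")
    ordered = sorted(values)
    # Snap to a nearby exact fraction so binary rounding cannot push p * n
    # just above a whole number (which would select the next value).
    p = Fraction(percentage).limit_denominator(10**9)
    # Need (k + 1) / n >= p, so the smallest 0-based index is ceil(p * n) - 1.
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k]


print(exact_percentile([2, 4, 3], 0.5))  # 3 (also the median)
```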


Collect array

Supported in: Batch, Faster, Streaming

Collects an array of values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 2, 3 ]

See details.


Collect distinct array

Supported in: Batch, Faster, Streaming

Collects an array of deduplicated values within each group. Null values are ignored.

Expression categories: Aggregate

Type variable bounds: T accepts ComparableType

Output type: Array<T>

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
2
3

Outputs: [ 2, 3 ]

See details.


Covariance

Supported in: Batch, Streaming

Calculate the population covariance of values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left right
1 5
2 4
3 3
4 2
5 1

Outputs: -2.0

See details.
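Covariance and Sample covariance differ only in the divisor: n for the population version and n - 1 for the sample version. The plain-Python sketch below reproduces both example outputs from the same input table. It does not handle nulls, and `covariance` is a hypothetical helper. Python 3.10+ also provides `statistics.covariance`, which computes the sample version.

```python
from typing import Optional, Sequence


def covariance(xs: Sequence[float], ys: Sequence[float], sample: bool = False) -> Optional[float]:
    """Population covariance (divide by n) or sample covariance (divide by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    divisor = n - 1 if sample else n
    if divisor <= 0:
        return None  # not enough rows to define the statistic
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s / divisor


left = [1, 2, 3, 4, 5]
right = [5, 4, 3, 2, 1]
print(covariance(left, right))               # -2.0 (Covariance)
print(covariance(left, right, sample=True))  # -2.5 (Sample covariance)
```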


Create simple geometries from ordered rows of GeoPoints

Supported in: Batch

Given a column of GeoPoints and an ordering, returns either a polygon or a line string by connecting the GeoPoints in the specified order. This function assumes the data is tabular: each row represents an individual GeoPoint in a line string or in the shell of a polygon, and a separate column specifies the order of those points. For a polygon, the ordering should identify the points as you move counter-clockwise around the shell. Given this ordering and a partition (grouping), the function builds the geometry for each partition by joining the GeoPoints in ascending order of the order-by column.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • GeoPoint: geo_point
  • Order by (ascending): order
  • Output geometry type: LINE_STRING

Given input table:

geo_point order
{
 latitude -> 0.0,
 longitude -> 0.0,
}
0
{
 latitude -> 1.0,
 longitude -> 0.0,
}
1
{
 latitude -> 1.0,
 longitude -> 1.0,
}
2

Outputs: {"type":"LineString","coordinates": [[0.0,0.0],[0.0, 1.0],[1.0,1.0]]}

See details.
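The sketch below shows how one partition of ordered GeoPoints turns into GeoJSON. It sorts by the order column and swaps each point into GeoJSON's [longitude, latitude] order, which reproduces the example LineString. Closing the polygon ring by repeating the first point is an assumption based on GeoJSON's closed-ring rule; the entry does not say how Pipeline Builder handles this. `build_geometry` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple


def build_geometry(rows: List[Tuple[Dict[str, float], int]], geometry_type: str) -> dict:
    """Connect GeoPoints in ascending `order` into a GeoJSON LineString or Polygon."""
    ordered = sorted(rows, key=lambda row: row[1])
    # GeoJSON positions are [longitude, latitude], the reverse of the GeoPoint field order.
    coords = [[point["longitude"], point["latitude"]] for point, _ in ordered]
    if geometry_type == "LINE_STRING":
        return {"type": "LineString", "coordinates": coords}
    if geometry_type == "POLYGON":
        # Assumption: close the ring by repeating the first point, as GeoJSON requires.
        if coords[0] != coords[-1]:
            coords.append(coords[0])
        return {"type": "Polygon", "coordinates": [coords]}
    raise ValueError(f"unsupported geometry type: {geometry_type}")


rows = [
    ({"latitude": 1.0, "longitude": 1.0}, 2),
    ({"latitude": 0.0, "longitude": 0.0}, 0),
    ({"latitude": 1.0, "longitude": 0.0}, 1),
]
print(json.dumps(build_geometry(rows, "LINE_STRING")))
# {"type": "LineString", "coordinates": [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]}
```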


Dense rank

Supported in: Batch, Faster

Returns the rank of rows within a window partition, without any gaps. Tied rows get the same rank. The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See details.


Distinct count

Supported in: Batch, Faster, Streaming

Calculate the number of distinct values in a column.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


First

Supported in: Batch, Faster, Streaming

Returns the first item in the group. Note: if used within an aggregate or an unordered window, the selected row is non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
null
2
4
3

Outputs: null

See details.


Grouped geometry envelope

Supported in: Batch, Faster

Returns the envelope of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {"type":"Polygon","coordinates":[[[-6.0,-92.3],[-6.0,8.4],[125.6,8.4],[125.6,-92.3],[-6.0,-92.3]]]}

See details.


Grouped geometry union

Supported in: Batch

Combines the grouped geometries to create a single geometry.

Expression categories: Geospatial

Output type: Geometry

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
{"type":"Polygon","coordinates":[[[0.5,0.0],[1.5,0.0],[1.5,1.0],[0.5,1.0],[0.5,0.0]]]}

Outputs: {"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[0.5,1.0],[1.0,1.0],[1.5,1.0],[1.5,0.0],[1.0,0.0],[0.5,0.0],[0.0,0.0]]]}

See details.


Grouped latitude/longitude bounding box

Supported in: Batch

Returns a struct containing the entire bounding box of all valid geometries in the given column. Invalid geometries are treated as null and ignored.

Expression categories: Geospatial

Output type: LatLonBoundingBox

Example

Argument values:

  • Expression: geometry

Given input table:

geometry
{"type":"LineString","coordinates":[[1,0],[0,8.4]]}
{"type":"Point","coordinates":[125.6, -92.3]}
{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}

Outputs: {
 maxLat -> 8.4,
 maxLon -> 125.6,
 minLat -> -92.3,
 minLon -> -6.0,
}

See details.
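Grouped latitude/longitude bounding box and Grouped geometry envelope both reduce to the minimum and maximum longitude and latitude over every coordinate. The plain-Python sketch below reproduces both example outputs from the same three geometries. It does not drop invalid geometries, it does not handle GeometryCollection (which has no `coordinates` member), and its function names are hypothetical.

```python
import json
from typing import Iterator, List


def iter_positions(coords) -> Iterator[List[float]]:
    """Yield every [lon, lat, ...] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def lat_lon_bounding_box(geometries: List[dict]) -> dict:
    lons, lats = [], []
    for geometry in geometries:
        for lon, lat, *_ in iter_positions(geometry["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return {"maxLat": max(lats), "maxLon": max(lons), "minLat": min(lats), "minLon": min(lons)}


def envelope(box: dict) -> dict:
    """Rectangle polygon with the same corner order as the Grouped geometry envelope example."""
    w, s, e, n = box["minLon"], box["minLat"], box["maxLon"], box["maxLat"]
    return {"type": "Polygon", "coordinates": [[[w, s], [w, n], [e, n], [e, s], [w, s]]]}


geometries = [json.loads(g) for g in (
    '{"type":"LineString","coordinates":[[1,0],[0,8.4]]}',
    '{"type":"Point","coordinates":[125.6, -92.3]}',
    '{"type":"Polygon","coordinates":[[[0,0],[1,6.3],[-6,1],[0,0]]]}',
)]
box = lat_lon_bounding_box(geometries)
print(box)
print(envelope(box))
```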


Lag

Supported in: Batch, Faster

Returns the value of the input 'lag' rows before the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See details.


Last

Supported in: Batch, Faster, Streaming

Returns the last item in the group. Note: if used within an aggregate or an unordered window, the selected row is non-deterministic.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

Example

Argument values:

  • Expression: values
  • Ignore nulls: false

Given input table:

values
2
4
3
null

Outputs: null

See details.


Lead

Supported in: Batch, Faster

Returns the value of the input 'lead' rows after the current row in the window.

Expression categories: Aggregate

Type variable bounds: T accepts AnyType

Output type: T

See details.
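Lag and Lead look a fixed number of rows backward or forward within an ordered window. The sketch below treats a single partition as a list that is already sorted by the window's order-by column. It fills positions that have no such row with `None`, which is an assumption matching common SQL behavior; neither entry states this. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """Value `offset` rows before each row; `default` where no such row exists."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


def lead(values: Sequence[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """Value `offset` rows after each row; `default` where no such row exists."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


readings = [10, 20, 30, 40]  # one partition, already sorted by the order-by column
print(lag(readings))      # [None, 10, 20, 30]
print(lead(readings, 2))  # [30, 40, None, None]
```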


Linear regression gradient

Supported in: Batch

Returns the slope of the linear regression line for non-null pairs in a group. Returns null if there are insufficient non-null pairs or if the variance of the independent variable is zero.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left right
1 5
2 4
3 3
4 2
5 1

Outputs: -1.0

See details.
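The least-squares slope equals cov(x, y) / var(x), computed over the rows where both values are non-null. This entry does not say whether Left or Right is the independent variable. The example is symmetric, so either choice gives -1.0. The sketch below assumes the first argument is x and that at least two pairs are required; both are assumptions, and `regression_slope` is a hypothetical name.

```python
from typing import Optional, Sequence


def regression_slope(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> Optional[float]:
    """Least-squares slope of y on x over the rows where both values are non-null."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # assumption: two points are the minimum needed for a line
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None  # the independent variable has zero variance
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


print(regression_slope([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```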


Max

Supported in: Batch, Faster, Streaming

Calculate the maximum value in a column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 4

See details.


Max by

Supported in: Streaming

Finds the row with the maximum value of the expression among the rows that satisfy the filter condition, and returns the output projection expression evaluated on that row. If no row qualifies, null is returned.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    lessThan(
     left: salary,
     right: 5000,
    )

Given input table:

dep_name salary
develop 9900
develop 4000
develop 3000

Outputs: 4000

See details.


Mean

Supported in: Batch, Faster, Streaming

Calculate the mean of the values in a column.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3.0

See details.


Median

Supported in: Batch, Faster

Calculate the median of the values in a column.

Expression categories: Numeric

Output type: Decimal | Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3.0

See details.


Min

Supported in: Batch, Faster, Streaming

Calculate the minimum value in a column.

Expression categories: Numeric

Output type: ComparableType

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 2

See details.


Min by

Supported in: Streaming

Finds the row with the minimum value of the expression among the rows that satisfy the filter condition, and returns the output projection expression evaluated on that row. If no row qualifies, null is returned.

Expression categories: Aggregate

Output type: AnyType

Example

Argument values:

  • Expression: salary
  • Output projection expression: salary
  • Filter condition:
    greaterThan(
     left: salary,
     right: 0,
    )

Given input table:

dep_name salary
develop -999
develop 4000
develop 3000

Outputs: 3000

See details.
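Max by and Min by follow the same three steps: filter the rows, pick the row with the extreme ordering value, then project a value from that row. The plain-Python sketch below reproduces both examples (4000 and 3000). Skipping rows whose ordering value is null and returning the first row on ties are assumptions. `max_by` and `min_by` are hypothetical helper names.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def _filtered(rows: List[Row], key: Callable[[Row], Any], where: Callable[[Row], bool]) -> List[Row]:
    # Keep rows whose ordering key is non-null and which pass the filter condition.
    return [r for r in rows if key(r) is not None and where(r)]


def max_by(rows: List[Row], key: Callable[[Row], Any], project: Callable[[Row], Any],
           where: Callable[[Row], bool] = lambda r: True) -> Optional[Any]:
    candidates = _filtered(rows, key, where)
    return project(max(candidates, key=key)) if candidates else None


def min_by(rows: List[Row], key: Callable[[Row], Any], project: Callable[[Row], Any],
           where: Callable[[Row], bool] = lambda r: True) -> Optional[Any]:
    candidates = _filtered(rows, key, where)
    return project(min(candidates, key=key)) if candidates else None


def salary(row: Row) -> Any:
    return row["salary"]


max_rows = [{"dep_name": "develop", "salary": s} for s in (9900, 4000, 3000)]
min_rows = [{"dep_name": "develop", "salary": s} for s in (-999, 4000, 3000)]
print(max_by(max_rows, salary, salary, where=lambda r: r["salary"] < 5000))  # 4000
print(min_by(min_rows, salary, salary, where=lambda r: r["salary"] > 0))     # 3000
```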


Mode

Supported in: Batch, Faster

Calculate the mode of the values in a column.

Expression categories: Aggregate

Type variable bounds: Any accepts Binary | Boolean | Byte | Date | Decimal | Double | Float | Integer | Long | Short | String | Timestamp

Output type: Any

Example

Argument values:

  • Expression: values

Given input table:

values
a
b
b
b
c
c
d

Outputs: b

See details.


Percent rank

Supported in: Batch, Faster

Returns the relative rank (percentile) of rows within a window partition. Tied rows are assigned the same value.

Expression categories: Aggregate

Output type: Double

See details.


Pivot

Supported in: Streaming

Apply an aggregate expression in a pivot context. The aggregation will run as a set of separate aggregations scoped to each distinct value of the pivot expression. The output is a map from pivot value to aggregate expression value.

Expression categories: Aggregate

Type variable bounds: K accepts ComparableType, V accepts AnyType

Output type: Map<K, V>

Example

Argument values:

  • Aggregate expression:
    sum(
     expression: value,
    )
  • Pivot expression: pivot

Given input table:

pivot value
a 1
b 2
a 3

Outputs: {
 a -> 4,
 b -> 2,
}

See details.
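Pivot runs one independent aggregation per distinct pivot value and collects the results into a map. The sketch below uses plain dictionaries and shows the idea with `sum`, reproducing the example output. It is not Pipeline Builder's streaming implementation, and `pivot_aggregate` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List


def pivot_aggregate(rows: Iterable[Dict[str, Any]], pivot_col: str, value_col: str,
                    aggregate: Callable[[List[Any]], Any] = sum) -> Dict[Any, Any]:
    """Run `aggregate` separately for each distinct value of `pivot_col`."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        groups[row[pivot_col]].append(row[value_col])
    return {key: aggregate(values) for key, values in groups.items()}


rows = [{"pivot": "a", "value": 1}, {"pivot": "b", "value": 2}, {"pivot": "a", "value": 3}]
print(pivot_aggregate(rows, "pivot", "value"))  # {'a': 4, 'b': 2}
```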


Product

Supported in: Batch

Calculates the product of the values in a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: factor

Given input table:

factor
2
4
3

Outputs: 24.0

See details.


Rank

Supported in: Batch, Faster

Returns the rank of rows within a window partition. Tied rows get the same rank. The difference between rank and dense_rank is that rank leaves gaps in the ranking sequence when there are ties.

Expression categories: Aggregate

Output type: Integer

See details.
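Rank, Dense rank, Percent rank and Row number differ only in how they number tied rows. The sketch below computes all four for one partition that is already sorted by the order-by column. The percent-rank formula, (rank - 1) / (rows - 1), is an assumption borrowed from Spark SQL's `percent_rank`; the Percent rank entry does not state it. The names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Ranked(NamedTuple):
    value: int
    row_number: int
    rank: int
    dense_rank: int
    percent_rank: float


def rank_partition(sorted_values: Sequence[int]) -> List[Ranked]:
    """Ranking functions for one partition that is already sorted by the order-by column."""
    n = len(sorted_values)
    out: List[Ranked] = []
    rank = dense = 0
    for i, value in enumerate(sorted_values, start=1):
        if i == 1 or value != sorted_values[i - 2]:
            rank = i    # a new value's rank skips past earlier ties (gaps)
            dense += 1  # dense rank only ever advances by one (no gaps)
        # Assumption: Spark-style percent rank, (rank - 1) / (rows - 1).
        pct = (rank - 1) / (n - 1) if n > 1 else 0.0
        out.append(Ranked(value, i, rank, dense, pct))
    return out


for row in rank_partition([10, 20, 20, 30]):
    print(row)
```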


Row count

Supported in: Batch, Faster, Streaming

Counts the number of non-null rows in a group.

Expression categories: Aggregate

Output type: Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 3

See details.


Row number

Supported in: Batch, Faster, Streaming

Returns a sequential number starting at 1 inside each partition.

Expression categories: Aggregate

Output type: Integer

See details.


Sample covariance

Supported in: Batch, Streaming

Calculate the sample covariance of values in two columns.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Left: left
  • Right: right

Given input table:

left right
1 5
2 4
3 3
4 2
5 1

Outputs: -2.5

See details.


Sample variance

Supported in: Batch, Streaming

Calculate the sample variance of the values in a column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
2
3

Outputs: 0.33333333333

See details.


Standard deviation

Supported in: Batch, Faster

Calculate the population standard deviation of the values in a column.

Expression categories: Numeric

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.81649658092773

See details.


Sum

Supported in: Batch, Faster, Streaming

Sums the specified expression.

Expression categories: Numeric

Output type: Decimal | Double | Long

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 9

See details.


Variance

Supported in: Batch, Streaming

Calculate the population variance of the values in a column.

Expression categories: Aggregate

Output type: Double

Example

Argument values:

  • Expression: values

Given input table:

values
2
4
3

Outputs: 0.66666666667

See details.
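The population statistics divide by n and the sample statistics divide by n - 1. Python's `statistics` module reproduces the Variance, Sample variance and Standard deviation example outputs. The Standard deviation example value (about 0.8165) matches the population standard deviation of [2, 4, 3]; the sample standard deviation of the same values would be 1.0.

```python
import statistics

print(statistics.pvariance([2, 4, 3]))  # Variance (population), 2/3 ≈ 0.6667
print(statistics.variance([2, 2, 3]))   # Sample variance, 1/3 ≈ 0.3333
print(statistics.pstdev([2, 4, 3]))     # population standard deviation ≈ 0.8165
print(statistics.stdev([2, 4, 3]))      # sample standard deviation, 1.0
```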


Generator expressions


Explode array

Supported in: Batch, Faster, Streaming

Explode array into a row per value.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: T

See details.


Explode array with position

Supported in: Batch, Streaming

Explode array into a row per value as a struct containing the element's relative position in the array and the element itself.

Expression categories: Array

Type variable bounds: T accepts AnyType

Output type: Struct\

Example

Argument values:

  • Array: array
  • Keep empty / null arrays: null

Given input table:

array
[ one, two, three ]
[ four, five ]

Expected output table:

array
{
 element -> one,
 position -> 1,
}
{
 element -> two,
 position -> 2,
}
{
 element -> three,
 position -> 3,
}
{
 element -> four,
 position -> 1,
}
{
 element -> five,
 position -> 2,
}

See details.
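Explode array with position emits one row per element, paired with that element's 1-based position, as in the example above. The sketch below reproduces the example output. It does not model the "Keep empty / null arrays" option: null and empty arrays simply produce no rows. `explode_with_position` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Sequence


def explode_with_position(arrays: Sequence[Optional[Sequence[Any]]]) -> List[Dict[str, Any]]:
    """One output row per element, with its 1-based position in the source array."""
    rows = []
    for array in arrays:
        # Null and empty arrays produce no rows here.
        for position, element in enumerate(array or [], start=1):
            rows.append({"element": element, "position": position})
    return rows


for row in explode_with_position([["one", "two", "three"], ["four", "five"]]):
    print(row)
```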


Explode map

Supported in: Batch, Streaming

Explode map into a row per key, value pair.

Expression categories: Map

Type variable bounds: TKey accepts AnyType, TValue accepts AnyType

Output type: Struct\

Example

Argument values:

  • Expression: map

Given input table:

map
{
 1 -> val1,
 2 -> val2,
}
{
 3 -> val3,
 4 -> val4,
}

Expected output table:

map
{
 key -> 1,
 value -> val1,
}
{
 key -> 2,
 value -> val2,
}
{
 key -> 3,
 value -> val3,
}
{
 key -> 4,
 value -> val4,
}

See details.


Transforms


Aggregate

Supported in: Batch, Faster

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

Example

Argument values:

  • Aggregations: [
    alias(
     alias: factor,
     expression:
    sum(
     expression: factor,
    ),
    )]
  • Dataset: ri.foundry.main.dataset.aggregate
  • Group by columns: [tail_number]

Input:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
XB-123 foundry airline 1134 3

Output:

tail_number factor
XB-123 10
MT-222 9
KK-452 1

See details.
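The Aggregate example, a sum of factor grouped by tail_number, can be checked with a dictionary keyed by the group column. The plain-Python sketch below reproduces the output table. It is only an illustration of group-by semantics, not of how Pipeline Builder executes the transform.

```python
from collections import defaultdict

# (tail_number, airline, miles, factor) rows from the example input.
rows = [
    ("XB-123", "foundry air", 124, 2),
    ("MT-222", "new airline", 1123, 5),
    ("XB-123", "foundry airline", 335, 5),
    ("MT-222", "new air", 565, 4),
    ("KK-452", "new air", 222, 1),
    ("XB-123", "foundry airline", 1134, 3),
]

# Group by tail_number and sum factor; dicts keep first-seen key order.
factor_by_tail = defaultdict(int)
for tail_number, _airline, _miles, factor in rows:
    factor_by_tail[tail_number] += factor

print(dict(factor_by_tail))  # {'XB-123': 10, 'MT-222': 9, 'KK-452': 1}
```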


Aggregate on condition

Supported in: Batch, Faster

Aggregate expressions based on a condition statement.

Transform categories: Aggregate, Popular

See details.


Aggregate over window

Supported in: Streaming

Performs the specified aggregations on the data within a window, emitting outputs as specified by the provided trigger.

Transform categories: Aggregate

See details.


Anti join

Supported in: Batch, Faster

Anti joins the left and right dataset inputs, removing every row from the left dataset that matches at least one row of the right dataset under the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline
PA-452 new air

See details.
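An anti join keeps only the left rows for which no right row satisfies the join condition. The nested-loop sketch below reproduces the example output (only PA-452 survives). It costs O(left × right), so it is meant for illustration rather than for large data. `anti_join` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def anti_join(left: List[Row], right: List[Row], condition: Callable[[Row, Row], bool],
              left_columns: List[str]) -> List[Row]:
    """Keep left rows that match no right row, projected to `left_columns`."""
    return [
        {c: lrow[c] for c in left_columns}
        for lrow in left
        if not any(condition(lrow, rrow) for rrow in right)
    ]


left_cols = ("tail_number", "airline", "miles", "factor")
left = [dict(zip(left_cols, r)) for r in [
    ("XB-123", "foundry air", 124, 2), ("MT-222", "new airline", 1123, 5),
    ("XB-123", "foundry airline", 335, 5), ("MT-222", "new air", 565, 4),
    ("KK-452", "new air", 222, 1), ("PA-452", "new air", 212, 2),
    ("XB-123", "foundry airline", 1134, 2),
]]
right = [{"tail_number": t, "home_airport": a} for t, a in
         [("XB-123", "LHR"), ("MT-222", "CPH"), ("KK-452", "JFK"), ("JR-201", "IAD")]]

result = anti_join(left, right, lambda l, r: l["tail_number"] == r["tail_number"],
                   ["tail_number", "airline"])
print(result)  # [{'tail_number': 'PA-452', 'airline': 'new air'}]
```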


Apply expression

Supported in: Batch, Faster, Streaming

Transforms input dataset by applying a single expression.

Transform categories: Popular

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression:
    alias(
     alias: kilometers,
     expression:
    convertDistance(
     amount: miles,
     currentUnit: mile,
     targetUnit: kilometer,
    ),
    )

Input:

airline miles
foundry airways 2500
new air 3000

Output:

kilometers airline miles
4023.36 foundry airways 2500
4828.03 new air 3000

See details.


Apply multiple expressions

Supported in: Batch, Faster, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Popular

Example

Argument values:

  • Columns: [
    alias(
     alias: airline,
     expression: airline,
    )]
  • Dataset: ri.foundry.main.dataset.a
  • Keep remaining columns: false

Input:

airline miles
foundry airways 2500
new air 3000

Output:

airline
foundry airways
new air

See details.


Apply to multiple columns

Supported in: Batch, Faster, Streaming

Transforms input dataset either by selecting columns or applying functions to columns.

Transform categories: Popular

See details.


Array elements to columns

Supported in: Batch, Faster

Extracts elements from an array into columns.

Transform categories: Array

Example

Argument values:

  • Array: stats
  • Columns to extract: [miles, id]
  • Dataset: ri.foundry.main.dataset.a

Input:

stats
[ 1000, 2 ]

Output:

miles id stats
1000 2 [ 1000, 2 ]

See details.


Assign timestamps and watermarks

Supported in: Streaming

Assigns timestamps and watermarks to the input, filtering out records where the timestamp is null.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Timestamp expression: timestamp
  • Emit watermark on every record: null
  • Idleness timeout unit: null
  • Idleness timeout value: null

Input:

timestamp temperature sensor_id
1969-12-31T23:59:50Z 28 sensor_1
1969-12-31T23:59:40Z 30 sensor_2
1969-12-31T23:59:35Z 29 sensor_1

Output:

timestamp temperature sensor_id
1969-12-31T23:59:50Z 28 sensor_1
1969-12-31T23:59:40Z 30 sensor_2
1969-12-31T23:59:35Z 29 sensor_1

See details.


Coalesce data

Supported in: Batch, Faster

Reduces the number of partitions. For example, coalescing 1000 partitions to 100 does not trigger a shuffle; instead, each of the 100 new partitions claims 10 of the current partitions. If a larger number of partitions is requested, the current number of partitions is kept.

Transform categories: Other

See details.


Compute if expression absent

Supported in: Batch

Computes the expression for new rows only: the value for a given key is computed once and never recomputed, even across builds.

Transform categories: Other

See details.


Convert media set to table rows

Supported in: Batch

Produces a dataset containing media references and basic metadata for media items in a media set.

Transform categories: File, Media

See details.


Cross join

Supported in: Batch, Faster

Cross joins left and right dataset inputs together, matching all rows from each side against all rows from the other. The output is the cartesian product of the two datasets.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
PA-452 new air 212 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
XB-123 foundry air CPH
XB-123 foundry air JFK
XB-123 foundry air IAD
MT-222 new airline LHR
MT-222 new airline CPH
MT-222 new airline JFK
MT-222 new airline IAD
PA-452 new air LHR
PA-452 new air CPH
PA-452 new air JFK
PA-452 new air IAD

See details.


Date distribution

Supported in: Batch

Computes the distribution of dates/timestamps in a specified column.

Transform categories: Datetime

See details.


Decompress gzip files

Supported in: Batch

Decompresses each file in a dataset of gzipped files. Note that users must have editor permission to preview the unarchive file transform and all downstream nodes.

Transform categories: File

See details.


Decompress tar files

Supported in: Batch

Decompresses each file in a dataset of tar files. Note that users must have editor permission to preview the untar file transform and all downstream nodes.

Transform categories: File

See details.


Drop columns

Supported in: Batch, Faster, Streaming

Transforms input dataset by dropping the specified columns.

Transform categories: Popular

Example

Argument values:

  • Columns to drop: {miles}
  • Dataset: ri.foundry.main.dataset.a

Input:

airline miles airports
foundry airways 3000 [ JFK, SFO ]

Output:

airline airports
foundry airways [ JFK, SFO ]

See details.


Drop duplicates

Supported in: Batch, Faster

Drops duplicate rows from the input.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.aggregate
  • Column subset: {tail_number}

Input:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
XB-123 foundry airline 1134 3

Output:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
KK-452 new air 222 1

See details.


Empty file

Supported in: Batch

Creates an empty file.

Transform categories: Other

See details.


Empty media set file

Supported in: Batch, Streaming

Creates an empty media set file with the given schema and snapshot read mode.

Transform categories: Other

See details.


Empty table

Supported in: Batch, Faster, Streaming

Creates an empty table with the given schema and read mode.

Transform categories: Other

Example

Argument values:

  • Schema: Struct\
  • Read mode: null

Inputs:

Output:

flight_code flight_number airline

See details.


Extract file metadata from dataset as rows

Supported in: Batch

Reads file metadata as rows from a dataset of files.

Transform categories: File

See details.


Extract many struct fields

Supported in: Batch, Faster

Extracts many fields from a struct. The original struct is dropped.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Locators: [(airline.name, airline), (tail_no, tail_number)]
  • Struct: raw

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

airline tail_number
new air NA-123
foundry airways FA-123

See details.


Extract rows from a CSV file

Supported in: Batch

Reads a dataset of files and parses each CSV file into rows.

Transform categories: File

See details.


Extract rows from a GeoJSON file

Supported in: Batch

Reads a dataset of files and parses each GeoJSON file into rows. The output dataset has a geometry column and a column for each property listed by the user, in addition to the _error and _file columns. If the user provides no properties to extract, the entire properties struct is extracted into a properties column as a string. All GeoJSONs in the files must use one of two layouts:

  • Multiline FeatureCollection: an entire file containing one GeoJSON of type FeatureCollection.
  • Single-line Feature: a file where every line is a fully valid GeoJSON of type Feature.

Transform categories: File, Geospatial

See details.


Extract rows from a JSON file

Supported in: Batch

Reads a dataset of files and parses each JSON file into rows.

Transform categories: File, String, Struct

See details.


Extract rows from a dataset of email files

Supported in: Batch

Reads a dataset of email files and parses each file into a row. Supported file extensions: .eml, .emltpl, and .msg.

Transform categories: File, Media

See details.


Extract rows from a dataset of text files

Supported in: Batch

Reads a dataset of text files and parses each file into a row.

Transform categories: File, String

See details.


Extract rows from an Excel file

Supported in: Batch

Reads a dataset of Microsoft Excel files and parses each file into rows. Supported file formats: .xls, .xlt, .xltm, .xltx, .xlsx, .xlsm.

The processing of individual Excel files is not distributed across multiple Spark executors, so we recommend enabling the usage of local Spark in build settings if the input dataset is expected to have exactly one file.

Particularly large Excel files can require a lot of memory to process, so if you observe builds failing with out-of-memory errors, consider using custom build settings with increased executor memory (or increased driver memory in the case of local Spark). For such large files, it may not be possible to preview the output, but deployment can still succeed given appropriate build settings.

Transform categories: File

See details.


Extract rows from an XML file

Supported in: Batch

Reads a dataset of files and parses each XML file into rows.

Transform categories: File

See details.


Extract rows from shapefile

Supported in: Batch

Reads a dataset of files and parses each shapefile into rows. All files except .shp, .shx and .dbf files are ignored. This shapefile parser only supports the point, polyline, polygon and multipoint geometry types. The output dataset has a geometry column and a column for each property listed by the user, in addition to the _error and _file columns. If the user provides no properties to extract, the entire properties struct is extracted into a properties column as a string. UTF-8 is the only supported encoding for property names and values; a .cpg file that specifies an alternative encoding is ignored.

Transform categories: File, Geospatial

See details.


Filter

Supported in: Batch, Faster, Streaming

Filters the input dataset based on the specified filter condition.

Transform categories: Data preparation, Popular

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Filter condition: recently_serviced

Input:

recently_serviced tail_number
true KK-150
false XB-120
true MT-190

Output:

recently_serviced tail_number
true KK-150
true MT-190

See details.


Filter files

Supported in: Batch

Filters a dataset of files.

Transform categories: File

See details.


First union by name

Supported in: Batch, Faster

Unions a set of datasets together on columns from the first dataset, adding nulls when columns are missing. Columns that are not present in the first dataset are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT

ri.foundry.main.dataset.b

recently_serviced tail_number home_country
true AA-200 US
true BN-435 UK
true BN-111 UK

Output:

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT
true AA-200 null
true BN-435 null
true BN-111 null

See details.
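First union by name keeps the first dataset's column list, fills missing columns with null, and drops any extra columns. The sketch below represents each dataset as a hypothetical `{"columns": [...], "rows": [...]}` dictionary and uses one row per input from the example. This representation and the helper name are assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence

Dataset = Dict[str, Any]  # hypothetical shape: {"columns": [...], "rows": [{...}, ...]}


def first_union_by_name(datasets: Sequence[Dataset]) -> List[Dict[str, Any]]:
    """Union rows on the first dataset's columns, filling missing ones with None."""
    columns = datasets[0]["columns"]
    # Columns absent from the first dataset (e.g. home_country) are dropped,
    # because only `columns` are copied into each output row.
    return [{c: row.get(c) for c in columns} for ds in datasets for row in ds["rows"]]


a = {"columns": ["recently_serviced", "tail_number", "airline_code"],
     "rows": [{"recently_serviced": True, "tail_number": "KK-150", "airline_code": "KK"}]}
b = {"columns": ["recently_serviced", "tail_number", "home_country"],
     "rows": [{"recently_serviced": True, "tail_number": "AA-200", "home_country": "US"}]}
for row in first_union_by_name([a, b]):
    print(row)
```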


Flatten struct

Supported in: Batch, Faster, Streaming

Take all fields in a struct and turn them into columns in the output dataset.

Transform categories: Struct

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Expression: raw
  • Max depth: 2
  • Column prefix: new_
  • Separator: null

Input:

raw
{
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
{
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

Output:

new_airline_name new_airline_id new_tail_no raw
new air NA NA-123 {
airline: {
id: NA,
name: new air,
},
tail_no: NA-123,
}
foundry airways FA FA-123 {
airline: {
id: FA,
name: foundry airways,
},
tail_no: FA-123,
}

See details.


Frequent pattern growth

Supported in: Batch

Frequent pattern (FP) growth finds frequent patterns in your dataset: sets of items that occur together in at least the minimum support fraction of rows.

Transform categories: Aggregate, Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Items column: customer_attributes
  • Minimum support: 0.6

Input:

customer_attributes
[ age_group: 20-30, country: Germany, gender: Female ]
[ age_group: 20-30, country: Germany, gender: Male ]

Output:

pattern pattern_occurrence total_count
[ country: Germany, age_group: 20-30 ] 2 2
[ age_group: 20-30 ] 2 2
[ country: Germany ] 2 2

See details.
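FP-growth avoids enumerating candidate itemsets by building a compressed prefix tree. The sketch below is not FP-growth. It uses a brute-force, Apriori-style enumeration that finds the same frequent itemsets on small inputs and reproduces the three patterns in the example output. Encoding each attribute as a string such as "age_group: 20-30" is an assumption, and the cost grows exponentially with the number of distinct items.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def frequent_itemsets(transactions: Sequence[Sequence[str]],
                      min_support: float) -> Dict[FrozenSet[str], int]:
    """Brute-force (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    items = sorted(frozenset().union(*baskets))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            count = sum(1 for basket in baskets if basket.issuperset(combo))
            if count / n >= min_support:
                result[frozenset(combo)] = count
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none of any larger size
    return result


customers = [
    ["age_group: 20-30", "country: Germany", "gender: Female"],
    ["age_group: 20-30", "country: Germany", "gender: Male"],
]
patterns = frequent_itemsets(customers, 0.6)
for pattern, occurrence in patterns.items():
    print(sorted(pattern), occurrence, len(customers))  # pattern, occurrence, total_count
```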


Geo distance inner join

Supported in: Batch

Inner joins the left and right datasets, keeping pairs of rows whose input geometries are within the specified distance of each other. Internally, geometries are converted into the given projected coordinate reference system before the join and back to WGS84 afterwards.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 10.0
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryColLhs lhs-1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0
{"coordinates": [55.0, 5.0], "type":"Point"} 43.0
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} 44.0

ri.foundry.main.dataset.right

geometryCol col1 arrayCol
{"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} rhsVal1 [ 0.0, 1.0 ]
{"coordinates": [[[21.0, 21.0], [27.0, 21.0], [27.0, 27.0], [21.0, 27.0], [21.0, 21.0]]], "type": "Polygon"} rhsVal2 [ 0.0, 1.0 ]

Output:

geometryColLhs lhs-1 rhs_geometryCol rhs_arrayCol
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} [ 0.0, 1.0 ]
{"coordinates": [[25.0, 0.0], [0.0, 25.0]], "type":"LineString"} 44.0 {"coordinates": [[[20.0, 10.0], [27.0, 10.0], [27.0, 17.0], [20.0, 17.0], [20.0, 10.0]]], "type": "Polygon"} [ 0.0, 1.0 ]

See details.


Geo distance left join

Supported in: Batch

Left joins the datasets, matching rows when the distance between their geometries is less than or equal to the specified distance and keeping all rows from the left dataset. Internally, geometries are converted into the given projected coordinate reference system before the join and converted back to WGS84 afterwards.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryColRhs, rhs-1],
    )
  • Distance: 1640.42
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: epsg:2868
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs lhs-1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0
null 43.0

ri.foundry.main.dataset.right

geometryColRhs rhs-1
{"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} rhsVal1
{"coordinates": [-112.11796760559083,33.440895931474124], "type":"Point"} rhsVal2

Output:

geometryColLhs lhs-1 geometryColRhs rhs-1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0 {"coordinates": [-112.14560508728029,33.44082430962016], "type":"Point"} rhsVal1
null 43.0 null null

See details.


Geo intersection inner join

Supported in: Batch, Streaming

Inner joins the left and right datasets based on whether their input geometries overlap. Geometries that only touch are also included in the results.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs col1Lhs
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0

ri.foundry.main.dataset.right

geometryColRhs col1Rhs
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal10

Output:

geometryColLhs col1Lhs geometryColRhs col1Rhs
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9

See details.


Geo intersection left anti join

Supported in: Streaming

Anti joins input datasets based on whether input geometries overlap. Returns only rows from the left dataset where the geometry does not intersect with any geometry in the right dataset. Rows with null or invalid join keys are considered non-intersecting.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs col1Lhs
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0
{"coordinates": [55.0, 5.0], "type":"Point"} 43.0

ri.foundry.main.dataset.right

geometryColRhs col1Rhs
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} rhsVal4

Output:

geometryColLhs col1Lhs
{"coordinates": [55.0, 5.0], "type":"Point"} 43.0

See details.


Geo intersection left join

Supported in: Batch, Streaming

Left joins the input datasets based on whether their input geometries overlap. Geometries that only touch are also included in the results. Null or invalid geometries do not return matches.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns(

    )
  • Condition for columns to select on the right:
    allColumns(

    )
  • Join key: (geometryColLhs, geometryColRhs)
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

geometryColLhs col1Lhs
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0
{"coordinates": [55.0, 5.0], "type":"Point"} 43.0

ri.foundry.main.dataset.right

geometryColRhs col1Rhs
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal10

Output:

geometryColLhs col1Lhs geometryColRhs col1Rhs
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9
{"coordinates": [55.0, 5.0], "type":"Point"} 43.0 null null

See details.


GeoPoint-to-GeoPoint 3d distance inner join

Supported in: Batch

Inner joins the left and right datasets based on the distance between point geometries. The geometries must represent points and may optionally include a z-coordinate. Internally, geometries are converted into the given projected coordinate reference system before the join and converted back to WGS84 afterwards. Non-point geometries are ignored. The entire right dataset must fit into driver and executor memory; a 3 GB executor should be able to handle up to 4 million points in the right dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryColLhs, lhs-1],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, arrayCol],
    )
  • Distance: 2.5
  • Join key: (geometryColLhs, geometryCol)
  • Left dataset: ri.foundry.main.dataset.left
  • Projected coordinate system: EPSG:4326
  • Right dataset: ri.foundry.main.dataset.right
  • Use z-coordinate: false
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryColLhs lhs-1
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} 42.0
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} 43.0
{"coordinates": [0.0, 0.0], "type":"Point"} 44.0

ri.foundry.main.dataset.right

geometryCol col1 arrayCol
{"coordinates": [0.0, 0.0, 2.0], "type":"Point"} rhsVal1 [ 0.0, 1.0 ]
{"coordinates": [0.0, 1.0], "type":"Point"} rhsVal2 [ 0.0, 1.0 ]

Output:

geometryColLhs lhs-1 rhs_geometryCol rhs_arrayCol
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} 42.0 {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 0.0], "type":"Point"} 42.0 {"coordinates": [0.0, 1.0], "type":"Point"} [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} 43.0 {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0, 5.0], "type":"Point"} 43.0 {"coordinates": [0.0, 1.0], "type":"Point"} [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"} 44.0 {"coordinates": [0.0, 0.0, 2.0], "type":"Point"} [ 0.0, 1.0 ]
{"coordinates": [0.0, 0.0], "type":"Point"} 44.0 {"coordinates": [0.0, 1.0], "type":"Point"} [ 0.0, 1.0 ]
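The matching rule can be sketched for points that are already in projected (planar) coordinates. This is an illustrative nested-loop version, not the transform's implementation, and the function name is hypothetical. The CRS conversion is omitted. The inclusive threshold (`<=`, as stated for the geo distance left join) is an assumption, and so is falling back to 2D distance when either point lacks a z value.

```python
import math
from typing import Any, List, Sequence, Tuple


def point_distance_join(left: Sequence[Tuple[tuple, Any]],
                        right: Sequence[Tuple[tuple, Any]],
                        distance: float, use_z: bool = False) -> List[tuple]:
    """Nested-loop distance join for points given as (x, y) or (x, y, z)."""
    out = []
    for l_xyz, l_payload in left:
        for r_xyz, r_payload in right:
            if use_z and len(l_xyz) == 3 and len(r_xyz) == 3:
                d = math.dist(l_xyz, r_xyz)  # 3D Euclidean distance
            else:
                d = math.dist(l_xyz[:2], r_xyz[:2])  # z ignored (assumed fallback)
            if d <= distance:  # inclusive threshold (assumed)
                out.append((l_xyz, l_payload, r_xyz, r_payload))
    return out
```

With `use_z=False`, as in the example, every pair is within 1.0 in the x/y plane, so all six pairs are returned. With `use_z=True`, the pair (0, 0, 5) and (0, 0, 2) is 3.0 apart and falls outside the 2.5 threshold.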

See details.


Geometry intersection join

Supported in: Batch

Inner joins the left and right datasets based on whether input geometries overlap. For each pair of rows whose join key geometries intersect, returns a row containing all of the columns from both datasets. Joining on multiple join keys is not currently supported. Null join key geometries are silently filtered out, and invalid GeoJSON in join columns is silently converted to null. The left and right datasets must not share any column names.

Transform categories: Geospatial, Join

Example

Argument values:

  • Join key: [(geometryColLhs, geometryColRhs)]
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs:

ri.foundry.main.dataset.left

geometryColLhs lhs-1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0

ri.foundry.main.dataset.right

geometryColRhs rhs-1
{"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"coordinates": [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]], "type": "Polygon"} rhsVal2
{"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"coordinates": [15.0, 15.0], "type":"Point"} rhsVal4
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal6
{"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"coordinates": [[20.0, 20.0], [21.0, 23.0]], "type":"LineString"} rhsVal8
{"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9
{"coordinates": [[[[170.0, 170.0], [190.0, 170.0], [190.0, 190.0], [170.0, 190.0], [170.0, 170.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal10

Output:

geometryColLhs lhs-1 geometryColRhs rhs-1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], "type": "Polygon"} rhsVal1
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [0.0, 0.0], "type":"Point"} rhsVal3
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal5
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[-1.0, -1.0], [5.0, 5.0]], "type":"LineString"} rhsVal7
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,10.0],[10.0,10.0],[10.0,0.0],[0.0,0.0]]]} 42.0 {"coordinates": [[[[2.0, 2.0], [7.0, 2.0], [7.0, 7.0], [2.0, 7.0], [2.0, 2.0]]], [[[12.0, 12.0], [17.0, 12.0], [17.0, 17.0], [12.0, 17.0], [12.0, 12.0]]]], "type":"MultiPolygon"} rhsVal9

See details.


Geometry knn inner join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry in the base dataset. Internally, the input datasets are converted to the given coordinate reference system and back to WGS84. The entire neighbors dataset must fit into driver and executor memory; a 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryCol lhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0

ri.foundry.main.dataset.right

geometryCol col
{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2
{
latitude: 33.440895931474124,
longitude: -112.11796760559083,
}
rhsVal3

Output:

geometryCol lhsCol rhs_geometryCol rhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0 {
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0 {
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2
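Below is a minimal sketch of the k-nearest selection, assuming coordinates are already in a planar projected system. The example's longitude/latitude values are used directly as (x, y), which is only a rough stand-in for the projection step. `heapq.nsmallest` picks the k closest neighbors for each base point. The function name `geometry_knn_join` is hypothetical.

```python
import heapq
import math
from typing import Any, List, Sequence, Tuple

Point = Tuple[float, float]


def geometry_knn_join(base: Sequence[Tuple[Point, Any]],
                      neighbors: Sequence[Tuple[Point, Any]],
                      k: int) -> List[Tuple[Any, Any]]:
    """For each base point, pair its payload with the k nearest neighbor payloads."""
    result = []
    for base_xy, base_payload in base:
        # Planar Euclidean distance; assumes coordinates are already projected.
        nearest = heapq.nsmallest(k, neighbors, key=lambda n: math.dist(base_xy, n[0]))
        result.extend((base_payload, payload) for _, payload in nearest)
    return result
```

Holding every neighbor in one in-memory list mirrors the stated requirement that the whole neighbors dataset fit into executor memory.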

See details.


Geometry knn left join

Supported in: Batch

Selects the k closest points from the neighbors dataset for each valid input geometry in the base dataset. Internally, the input datasets are converted to the given coordinate reference system and back to WGS84. The entire neighbors dataset must fit into driver and executor memory; a 3 GB executor should be able to handle up to 1 million points in the neighbors dataset.

Transform categories: Geospatial, Join

Example

Argument values:

  • Base dataset: ri.foundry.main.dataset.left
  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [geometryCol, lhsCol],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [geometryCol, col],
    )
  • Join key: (geometryCol, geometryCol)
  • K: 2
  • Neighbors dataset: ri.foundry.main.dataset.right
  • Projected coordinate system: epsg:2868
  • Prefix for columns from right: rhs_

Inputs:

ri.foundry.main.dataset.left

geometryCol lhsCol
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0

ri.foundry.main.dataset.right

geometryCol col
{
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2
{
latitude: 33.440895931474124,
longitude: -112.11796760559083,
}
rhsVal3

Output:

geometryCol lhsCol rhs_geometryCol rhs_col
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0 {
latitude: 33.440609443703586,
longitude: -112.14843750000001,
}
rhsVal1
{"coordinates": [-112.14843750000001,33.440609443703586], "type":"Point"} 42.0 {
latitude: 33.44082430962016,
longitude: -112.14560508728029,
}
rhsVal2

See details.


Get media references (datasets)

Supported in: Batch

Produces a dataset containing media references and basic metadata for files in a dataset.

Transform categories: File

See details.


Heartbeat detection

Supported in: Streaming

Detects when a record hasn't been seen for a configurable amount of time for a set of keys.

Transform categories: Other

See details.


Inner join

Supported in: Batch, Faster

Joins two datasets together, keeping only pairs of rows (one from each table) that satisfy the provided condition.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
XB-123 foundry airline LHR
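For an equality condition like the one in the example, the join can be sketched as a classic hash join: index the right rows by key, then probe that index with each left row. This sketch over lists of dicts is illustrative only. The function name is hypothetical and the right-column prefix option is omitted.

```python
from collections import defaultdict
from typing import Dict, List


def inner_join(left: List[Dict], right: List[Dict], key: str,
               left_cols: List[str], right_cols: List[str]) -> List[Dict]:
    """Inner equi-join on `key` (hash-join sketch)."""
    index = defaultdict(list)
    for r_row in right:
        index[r_row[key]].append(r_row)  # build phase
    out = []
    for l_row in left:
        # Probe phase: left rows with no matching key produce no output.
        for r_row in index.get(l_row[key], []):
            row = {c: l_row[c] for c in left_cols}
            row.update({c: r_row[c] for c in right_cols})
            out.append(row)
    return out
```

PA-452 has no entry in the right dataset, and JR-201 has none in the left. Neither appears in the output.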

See details.


Join

Supported in: Batch, Faster, Streaming

Joins left and right dataset inputs together.

Transform categories: Join

See details.


K-means clustering

Supported in: Batch

K-means clustering is an unsupervised machine learning algorithm that groups the dataset's vectors into k clusters. The value of k is chosen as the one with the best silhouette score among the candidate values between minimum k and maximum k. Number of k values sets how many candidate k values are tried within this range, including both boundaries.

Transform categories: Other

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Maximum k: 12
  • Minimum k: 3
  • Number of k values: 4
  • Vector column: feature_column

Input:

feature_column
[ 0.05, 3.1, 2.3 ]
[ 1.0, 3.1, 2.3 ]
[ 1.0, 3.5, 2.3 ]
[ 19.0, 12.3, -1.4 ]

Output:

feature_column cluster_id
[ 1.0, 3.1, 2.3 ] 0
[ 1.0, 3.5, 2.3 ] 0
[ 19.0, 12.3, -1.4 ] 1
[ 0.05, 3.1, 2.3 ] 2
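The k-selection step can be sketched with two helpers. One picks the candidate k values, and the other scores a clustering by its mean silhouette. Spacing the candidates evenly between minimum k and maximum k (3, 6, 9, 12 for the example arguments) is an assumption. Scoring singleton clusters as 0 is also an assumption. The k-means fitting itself is omitted, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def candidate_ks(min_k: int, max_k: int, num_values: int) -> List[int]:
    """Evenly spaced candidate k values, including both boundaries (assumed)."""
    if num_values <= 1 or min_k == max_k:
        return [min_k]
    step = (max_k - min_k) / (num_values - 1)
    ks = [round(min_k + i * step) for i in range(num_values)]
    return list(dict.fromkeys(ks))  # drop duplicates created by rounding


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: scored as 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```

A selection loop would fit k-means once per candidate k and keep the labeling with the highest `silhouette_score`.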

See details.


KNN join

Supported in: Batch

Returns the k nearest rows from the right dataset for each row in the left dataset, based on the distance measure. Rows with equal distances share a rank, so more than k rows can be returned for a single left row.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [fuzzy_airline, home_airport],
    )
  • Distance measure expression:
    alias(
     alias: distance,
     expression:
    levenshteinDistance(
     ignoreCase: true,
     left: airline,
     right: fuzzy_airline,
    ),
    )
  • K nearest: 2
  • Left dataset: ri.foundry.main.dataset.left
  • Rank column name: rank
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
PA-452 new air 212 2

ri.foundry.main.dataset.right

fuzzy_airline home_airport
air LHR
new airline CPH
new plane JFK
old air IAD

Output:

rank distance tail_number airline fuzzy_airline home_airport
1 3 PA-452 new air old air IAD
2 4 PA-452 new air air LHR
2 4 PA-452 new air new airline CPH
2 4 PA-452 new air new plane JFK
1 0 MT-222 new airline new airline CPH
2 4 MT-222 new airline new plane JFK
1 5 XB-123 foundry air old air IAD
2 8 XB-123 foundry air air LHR
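The example can be reproduced with a plain Levenshtein distance and a rank filter. In this sketch, a row's rank is 1 plus the number of right rows with a strictly smaller distance. Tied rows therefore share a rank, and more than k rows can survive, as with the PA-452 rows above. The source does not say whether the transform uses this ranking or dense ranking, so treat that choice as an assumption. The function names are hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str, ignore_case: bool = True) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def knn_join(left: List[Dict], right: List[Dict], k: int,
             left_col: str, right_col: str) -> List[Dict]:
    out = []
    for l_row in left:
        scored = [(levenshtein(l_row[left_col], r_row[right_col]), r_row) for r_row in right]
        for dist, r_row in scored:
            # Rank = 1 + number of strictly closer rows, so ties share a rank.
            rank = 1 + sum(1 for other, _ in scored if other < dist)
            if rank <= k:
                out.append({"rank": rank, "distance": dist, **l_row, **r_row})
    return out
```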

See details.


Keeps duplicates

Supported in: Batch, Faster

Keeps only the duplicate rows from the input: rows whose values in the column subset occur more than once.

Transform categories: Other

Example

Argument values:

  • Column subset: {tail_number}
  • Dataset: ri.foundry.main.dataset.aggregate

Input:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
XB-123 foundry airline 1134 3

Output:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
XB-123 foundry airline 1134 3
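Put another way, a row is kept when its values in the column subset occur more than once. Below is a minimal sketch using `collections.Counter`. The function name is hypothetical, and row order is preserved as in the example.

```python
from collections import Counter
from typing import List


def keep_duplicates(rows: List[dict], subset: List[str]) -> List[dict]:
    def key(row: dict) -> tuple:
        return tuple(row[c] for c in subset)

    counts = Counter(key(row) for row in rows)
    # Keep rows whose subset values occur more than once, in input order.
    return [row for row in rows if counts[key(row)] > 1]
```

In the example, KK-452 occurs only once in `tail_number`, so its single row is the only one removed.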

See details.


Key by

Supported in: Streaming

Keys the input by the provided key-by columns. This does not re-sort the data; per-key ordering is only maintained from the point at which the keys are set. Re-keying can be unsafe: if logic on the newly keyed data depends on a specific ordering, that ordering cannot be guaranteed unless the previous keying already maintained it. If CDC (change data capture) mode is enabled, this transform also sets the primary key. The primary key defines the columns that indicate which rows are updates or deletes, and how rows are ordered when read as a current view.

Transform categories: Other

See details.


Left join

Supported in: Batch, Faster

Joins two datasets together, keeping all rows from the left table and only rows which satisfy the provided condition from the right table.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR

See details.


Left lookup join

Supported in: Streaming

Joins two datasets together, keeping all rows from the left table and only matching rows from the right dataset.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition: [(tail_number, tail_number)]
  • Left dataset: ri.foundry.main.dataset.left
  • Max rows to join with a single row: 10
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR
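Below is a minimal sketch of the per-row lookup that treats the right dataset as a static in-memory table keyed by the join column. In the streaming transform, the right side is state that changes over time. The source does not say which rows are kept when more than `max_rows` match; this sketch simply keeps the first `max_rows`, which is an assumption. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def left_lookup_join(left: List[Dict], right: List[Dict], key: str,
                     left_cols: List[str], right_cols: List[str],
                     max_rows: int) -> List[Dict]:
    index = defaultdict(list)
    for r_row in right:
        index[r_row[key]].append(r_row)
    out = []
    for l_row in left:
        # Cap the number of right rows joined to a single left row.
        matches = index.get(l_row[key], [])[:max_rows]
        # Left rows without a match are kept once, with null right columns.
        for r_row in matches or [None]:
            row = {c: l_row[c] for c in left_cols}
            row.update({c: None if r_row is None else r_row[c] for c in right_cols})
            out.append(row)
    return out
```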

See details.


Manually entered table

Supported in: Batch, Faster, Streaming

Uses manually entered table data to create an output.

Transform categories: Other

Example

Argument values:

  • Rows: [{
    airline: foundry airlines,
    flight_code: 112,
    flight_number: XB-123,
    }, {
    airline: foundry airlines,
    flight_code: 533,
    flight_number: MT-444,
    }, {
    airline: new air,
    flight_code: 934,
    flight_number: KK-123,
    }]
  • Schema: Struct\

Inputs:

Output:

flight_code flight_number airline
112 XB-123 foundry airlines
533 MT-444 foundry airlines
934 KK-123 new air

See details.


Mapping join

Supported in: Batch, Faster

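Replaces values in the target columns of the input dataset with the corresponding values from the mapping dataset. Values that have no entry in the mapping dataset are replaced with the default value.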

Transform categories: Join

Type variable bounds: T1 accepts AnyType, T2 accepts AnyType

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.input
  • Key column for mapping values: flight_code
  • Mapping dataset: ri.foundry.main.dataset.mapping
  • Target columns: [flight_no, next_flight]
  • Values to use for mapping: flight_number
  • Assume unique mappings: null
  • Default value: unknown

Inputs:

ri.foundry.main.dataset.input

flight_no next_flight departure_time
533 112 2022-01-20T10:45:00Z
934 533 2022-01-20T11:20:00Z
222 934 2022-01-20T11:20:00Z

ri.foundry.main.dataset.mapping

flight_code flight_number airline
112 XB-123 foundry airlines
533 MT-444 foundry airlines
934 KK-123 new air

Output:

flight_no next_flight departure_time
MT-444 XB-123 2022-01-20T10:45:00Z
KK-123 MT-444 2022-01-20T11:20:00Z
unknown KK-123 2022-01-20T11:20:00Z
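The mapping step amounts to one dictionary lookup per target column. In this sketch, the function name is hypothetical. A key that appears more than once in the mapping dataset keeps its last value, which is an assumption. The `Assume unique mappings` option is not modeled.

```python
from typing import Any, Iterable, List, Optional


def mapping_join(rows: Iterable[dict], mapping_rows: Iterable[dict], key_col: str,
                 value_col: str, target_cols: List[str],
                 default: Optional[Any] = None) -> List[dict]:
    # Build the lookup table; a repeated key keeps its last value (assumption).
    mapping = {m[key_col]: m[value_col] for m in mapping_rows}
    out = []
    for row in rows:
        new_row = dict(row)
        for col in target_cols:
            # Unmapped values fall back to the default value.
            new_row[col] = mapping.get(row[col], default)
        out.append(new_row)
    return out
```

Flight 222 has no `flight_code` entry in the mapping dataset, so it becomes `unknown`. Columns that are not targeted, such as `departure_time`, pass through unchanged.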

See details.


Narrow union by name

Supported in: Batch, Faster

Unions a set of datasets together on the intersection of their column names; columns that are not present in every input dataset are removed.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

recently_serviced tail_number
true KK-150
false XB-120
true MT-190

ri.foundry.main.dataset.b

recently_serviced tail_number airline_code
true AA-200 AA
true BN-435 BN
true BN-111 BN

Output:

recently_serviced tail_number
true KK-150
false XB-120
true MT-190
true AA-200
true BN-435
true BN-111
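Below is a minimal sketch of the column intersection. Each dataset is represented as a dict holding a `columns` list and a `rows` list, which is a hypothetical representation, as is the function name. Column order follows the first dataset, an assumption consistent with the example.

```python
from typing import Dict, List


def narrow_union(datasets: List[Dict[str, list]]) -> Dict[str, list]:
    common = set(datasets[0]["columns"])
    for ds in datasets[1:]:
        common &= set(ds["columns"])  # keep only columns present everywhere
    columns = [c for c in datasets[0]["columns"] if c in common]
    rows = [{c: row[c] for c in columns} for ds in datasets for row in ds["rows"]]
    return {"columns": columns, "rows": rows}
```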

See details.


New operator chain

Supported in: Streaming

Advanced Flink feature: starts a new operator chain at this point in the pipeline.

Transform categories: Other

See details.


Normalize column names

Supported in: Batch, Faster, Streaming

Normalizes column names to use lower_snake_case.

Transform categories: Data preparation

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Remove special characters: null

Input:

recentlyServiced tailNumber _airlineCode
true KK-150 KK
false XB-120 XB
true MT-190 MT

Output:

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT
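The following regex-based sketch reproduces the example. It inserts an underscore at each lowercase-to-uppercase boundary, lowercases the result, and trims stray underscores. The transform's exact rules are not documented here, for example for acronyms, spaces, or the `Remove special characters` option, so this is only an approximation. The function name is hypothetical.

```python
import re


def normalize_column_name(name: str) -> str:
    # Insert "_" before an uppercase letter that follows a lowercase letter or digit.
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    # Collapse repeated underscores, lowercase, and trim leading/trailing ones.
    return re.sub(r"_+", "_", snake).lower().strip("_")


print([normalize_column_name(n) for n in ["recentlyServiced", "tailNumber", "_airlineCode"]])
```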

See details.


Numeric distribution

Supported in: Batch, Faster

Computes the distribution of numeric values in a specified column.

Transform categories: Numeric

See details.


Outer caching join

Supported in: Streaming

Returns rows from the left and right inputs that meet all of the match conditions and are within the caching window, along with unmatched rows from both inputs.

Transform categories: Join

See details.


Outer caching join

Supported in: Streaming

Joins left and right dataset inputs together, caching the record with the highest event time from each side for use in subsequent joins. Processing time of a record is used as a tiebreaker. In the case of a time results are optimistically emitted if there's no value to join against.

Transform categories: Join

See details.


Outer join

Supported in: Batch, Faster

Outer joins the provided dataset inputs, keeping all rows from both datasets. Where a row has no row on the other side that satisfies the provided condition, the other side's columns are null.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    columnNameIsIn(
     columnNames: [tail_number, airline],
    )
  • Condition for columns to select on the right:
    columnNameIsIn(
     columnNames: [home_airport],
    )
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right
  • Prefix for columns from right: null

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline home_airport
XB-123 foundry air LHR
MT-222 new airline CPH
XB-123 foundry airline LHR
MT-222 new air CPH
KK-452 new air JFK
PA-452 new air null
XB-123 foundry airline LHR
JR-201 null IAD
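A full outer equi-join can be sketched as a hash join that also emits unmatched rows from both sides. Left rows without a match are emitted in place, and right rows that matched nothing are appended at the end. For right-only rows, the sketch fills the shared key column from the right row, mirroring the `JR-201` row in the example. The sketch and its function name are illustrative assumptions rather than the transform's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def outer_join(left: List[Dict], right: List[Dict], key: str,
               left_cols: List[str], right_cols: List[str]) -> List[Dict]:
    """Full outer equi-join on `key` over lists of dicts (hash-join sketch)."""
    index = defaultdict(list)
    for r_row in right:
        index[r_row[key]].append(r_row)
    matched = set()
    out = []
    for l_row in left:
        matches = index.get(l_row[key], [])
        if matches:
            matched.add(l_row[key])
        # A left row with no match is emitted once with null right columns.
        for r_row in matches or [None]:
            row = {c: l_row[c] for c in left_cols}
            row.update({c: None if r_row is None else r_row[c] for c in right_cols})
            out.append(row)
    for r_row in right:
        if r_row[key] not in matched:
            # Right-only row: left columns are null except the shared key,
            # which is filled from the right row as in the example.
            row = {c: None for c in left_cols}
            if key in row:
                row[key] = r_row[key]
            row.update({c: r_row[c] for c in right_cols})
            out.append(row)
    return out
```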

See details.


Parse KML files into geometry lists

Supported in: Batch

Parses each raw KML file into a list of typed geometries.

Transform categories: File

See details.


Pivot

Supported in: Batch, Faster

Performs the specified aggregations on the input dataset grouped by a set of columns. The unique values to pivot on must be provided so that the output schema is known before runtime, which improves runtime stability over time.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts Boolean | Byte | Integer | Long | Short | String

Example

Argument values:

  • Aggregations: [
      alias(
       alias: miles,
       expression: mean(
        expression: miles,
       ),
      )]
  • Dataset: ri.foundry.main.dataset.a
  • Group by columns: [airline]
  • Pivot by column: airport
  • Pivot by values: [(JFK, new_york), (LHR, london)]
  • Prefix or suffix alias: null

Input:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

Output:

airline new_york_miles london_miles
foundry airways 1002345.0 7381838.0
new air 1.22779055E7 null
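
Below is a minimal sketch of the pivot above. It assumes rows are Python dicts and that there is a single mean aggregation. `pivot_mean` is a hypothetical helper, and the `<alias>_<column>` output naming just mirrors the example. Because the pivot values are passed in up front, the output columns are fixed before any data is read, which is the property the description relies on.

```python
from statistics import mean
from typing import Any

Row = dict[str, Any]


def pivot_mean(rows: list[Row], group_by: str, pivot_col: str,
               pivot_values: list[tuple[Any, str]], value_col: str) -> list[Row]:
    """Group by `group_by`, pivot on `pivot_col`, and average `value_col`.

    `pivot_values` is a fixed list of (value, alias) pairs, so the output columns
    are known before any row is read; pivot values not listed (e.g. SFO) are ignored.
    """
    # group key -> pivot value -> collected values
    groups: dict[Any, dict[Any, list[float]]] = {}
    for row in rows:
        cells = groups.setdefault(row[group_by], {})
        cells.setdefault(row[pivot_col], []).append(row[value_col])

    out: list[Row] = []
    for group, cells in groups.items():
        result: Row = {group_by: group}
        for value, alias in pivot_values:
            samples = cells.get(value)
            # A pivot value with no rows in this group yields null.
            result[f"{alias}_{value_col}"] = float(mean(samples)) if samples else None
        out.append(result)
    return out
```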

See details.


Project over window

Supported in: Batch, Faster, Streaming

Performs the specified aggregations on the data within the window. Emits one row each time a new row is received.

Transform categories: Aggregate

See details.


Rename columns

Supported in: Batch, Faster, Streaming

Renames a set of columns.

Transform categories: Data preparation, Popular

Example

Argument values:

  • Input dataset: ri.foundry.main.dataset.a
  • Renames: [(recently_serviced, does_not_require_service)]

Input:

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT

Output:

does_not_require_service tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT

See details.


Repartition data

Supported in: Batch, Faster

Forces a shuffle of the data, optionally using the provided partitioning columns and target number of partitions. If these are not provided, the partitioning is determined automatically.

Transform categories: Other
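
This section does not say how rows are assigned to partitions. Hash partitioning is a common strategy for column-based repartitioning, and the sketch below shows that strategy under the assumption that it applies. `hash_partition` is a hypothetical helper. Python randomizes string hashes per process, so partition assignments are stable within one run but not across runs.

```python
from typing import Any

Row = dict[str, Any]


def hash_partition(rows: list[Row], columns: list[str],
                   num_partitions: int) -> list[list[Row]]:
    """Assign each row to a partition by hashing its partitioning-column values.

    Rows with equal values in `columns` always land in the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: list[list[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[c] for c in columns)
        # Python's % returns a non-negative index for a positive modulus.
        partitions[hash(key) % num_partitions].append(row)
    return partitions
```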

See details.


Rollup

Supported in: Batch, Faster

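Performs the specified aggregations on the input dataset at different levels of granularity, providing both intermediate aggregates (subtotals) and super aggregates (a grand total). In the example below, rows with a null model are per-city subtotals, and the row with a null city and model is the grand total.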

Transform categories: Aggregate

Example

Argument values:

  • Aggregations: [
      alias(
       alias: mean_price,
       expression: mean(
        expression: price,
       ),
      )]
  • Dataset: ri.foundry.main.dataset.rollupBaseCase
  • Rollup columns: [city, model]

Input:

city model price store
London new phone 900.0 MegaMart
London new phone 850.75 AA
London new phone 870.75 ABC Zone
San Francisco new phone 1000.0 Prescos
San Francisco new phone 950.25 XZY Force
San Francisco new phone 1105.7 Phone Mart
London forestX 20 750.1 MegaMart
London forestX 20 690.0 AA
London forestX 20 730.0 ABC Zone
San Francisco forestX 20 890.4 Prescos
San Francisco forestX 20 900.1 XZY Force
San Francisco forestX 20 1050.75 Phone Mart

Output:

city model mean_price
London new phone 873.8333333333334
London forestX 20 723.3666666666667
London null 798.6
San Francisco new phone 1018.65
San Francisco forestX 20 947.0833333333334
San Francisco null 982.8666666666667
null null 890.7333333333335
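
One way to check the rollup example is a short sketch that computes one mean for every prefix of the rollup columns. It assumes rows are Python dicts, and `rollup_mean` is a hypothetical helper. It emits rows level by level (finest grouping first), not in the interleaved order shown above.

```python
from statistics import mean
from typing import Any

Row = dict[str, Any]


def rollup_mean(rows: list[Row], rollup_cols: list[str],
                value_col: str, alias: str) -> list[Row]:
    """Average `value_col` for every prefix of `rollup_cols`, down to the grand total.

    Columns outside the current prefix are None, as in the example output.
    """
    out: list[Row] = []
    # Level len(rollup_cols) is the finest grouping; level 0 is the grand total.
    for level in range(len(rollup_cols), -1, -1):
        prefix = rollup_cols[:level]
        groups: dict[tuple, list[float]] = {}
        for row in rows:
            groups.setdefault(tuple(row[c] for c in prefix), []).append(row[value_col])
        for key, values in groups.items():
            result: Row = dict(zip(prefix, key))
            result.update({c: None for c in rollup_cols[level:]})
            result[alias] = mean(values)
            out.append(result)
    return out
```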

See details.


Row size

Supported in: Batch

Estimates the size of a single row in the JVM.

Transform categories: Other

See details.


Select columns

Supported in: Batch, Faster, Streaming

Selects a set of columns from the input dataset.

Transform categories: Popular

See details.


Semi join

Supported in: Batch, Faster

Semi joins the left and right dataset inputs together: only rows of the left dataset that match the join condition are kept, and the output contains only columns from the left dataset.

Transform categories: Join

Example

Argument values:

  • Condition for columns to select on the left:
    allColumns()
  • Join condition:
    equals(
     left: tail_number,
     right: tail_number,
    )
  • Left dataset: ri.foundry.main.dataset.left
  • Right dataset: ri.foundry.main.dataset.right

Inputs:

ri.foundry.main.dataset.left

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
PA-452 new air 212 2
XB-123 foundry airline 1134 2

ri.foundry.main.dataset.right

tail_number home_airport
XB-123 LHR
MT-222 CPH
KK-452 JFK
JR-201 IAD

Output:

tail_number airline miles factor
XB-123 foundry air 124 2
MT-222 new airline 1123 5
XB-123 foundry airline 335 5
MT-222 new air 565 4
KK-452 new air 222 1
XB-123 foundry airline 1134 2
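
Below is a minimal sketch of the semi join above. It assumes rows are Python dicts and that the condition is equality on one key. `semi_join` is a hypothetical helper. In a standard semi join, and in this sketch, a left row appears at most once no matter how many right rows it matches, which is the difference from an inner join.

```python
from typing import Any

Row = dict[str, Any]


def semi_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Keep left rows whose `key` value appears in right; only left columns survive."""
    # A set of right keys: membership is all a semi join needs.
    right_keys = {r[key] for r in right}
    return [dict(row) for row in left if row[key] in right_keys]
```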

See details.


Sort

Supported in: Batch, Faster

Sorts the rows of the input dataset according to the sort specification, a list of (column, direction) pairs.

Transform categories: Other

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Sort specification: [(b, DESCENDING)]

Input:

a b
1 2
3 4
5 6

Output:

a b
5 6
3 4
1 2
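
Below is a sketch of a multi-column sort specification. It assumes rows are Python dicts and that directions are the strings `ASCENDING` and `DESCENDING`. `sort_rows` is a hypothetical helper. It relies on Python's `list.sort` being stable: if you sort by the last column first and the first column last, you get the usual column-by-column ordering. Null handling is not modelled; comparing `None` with a number raises `TypeError`.

```python
from typing import Any

Row = dict[str, Any]


def sort_rows(rows: list[Row], spec: list[tuple[str, str]]) -> list[Row]:
    """Sort by (column, "ASCENDING" | "DESCENDING") pairs, first pair most significant."""
    result = list(rows)
    # Stable sorts applied from the least to the most significant column.
    for column, direction in reversed(spec):
        result.sort(key=lambda r, c=column: r[c], reverse=(direction == "DESCENDING"))
    return result
```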

See details.


Split on condition

Supported in: Batch, Faster

Splits an input into two outputs based on the chosen condition.

Transform categories: Other
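
Below is a minimal sketch of the split. It assumes rows are Python dicts and the condition is a Python callable. `split_on_condition` is a hypothetical helper. This section does not say where the transform sends rows whose condition evaluates to null, so the sketch assumes they go to the second output.

```python
from collections.abc import Callable
from typing import Any

Row = dict[str, Any]


def split_on_condition(rows: list[Row],
                       condition: Callable[[Row], Any]) -> tuple[list[Row], list[Row]]:
    """Route each row to the first output if `condition` is truthy, else the second."""
    matched: list[Row] = []
    unmatched: list[Row] = []
    for row in rows:
        # None (a null condition result) is falsy, so it lands in the second output.
        (matched if condition(row) else unmatched).append(row)
    return matched, unmatched
```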

See details.


Text block

Supported in: Batch, Faster, Streaming

Inserts a text description between your transformations. This does not transform the input data in any way.

Transform categories: Other

See details.


Time bounded drop duplicates

Supported in: Streaming

Drops duplicate rows from the input, comparing only the given subset of columns. Rows that have been seen expire after the configured amount of event time, and rows that arrive late by more than that amount are always dropped. The input is partitioned by the specified key columns, and deduplication is computed separately for each distinct combination of key column values.

Transform categories: Other
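
The rules above can be modelled as a small state machine. This is a sketch under stated assumptions, not the streaming engine's implementation. Event times are plain numbers. The lateness cutoff is measured from the largest event time seen across all keys. A seen row expires based on the event time at which it was first seen. `TimeBoundedDeduplicator` is a hypothetical class name.

```python
from collections.abc import Hashable


class TimeBoundedDeduplicator:
    """Hypothetical model of time-bounded deduplication over a stream."""

    def __init__(self, expiry: float) -> None:
        self.expiry = expiry
        self.max_event_time = float("-inf")
        # (partition key, dedup column values) -> event time when first seen
        self.seen: dict[tuple[Hashable, Hashable], float] = {}

    def process(self, key: Hashable, dedup_values: Hashable, event_time: float) -> bool:
        """Return True if the row should be emitted, False if it is dropped."""
        self.max_event_time = max(self.max_event_time, event_time)
        cutoff = self.max_event_time - self.expiry
        if event_time < cutoff:
            return False  # later than the expiry window: always dropped
        # Forget rows whose event time has fallen out of the window
        # (a linear scan, fine for a sketch but not for production state).
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}
        ident = (key, dedup_values)
        if ident in self.seen:
            return False  # duplicate within the window for this key
        self.seen[ident] = event_time
        return True
```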

See details.


Time bounded drop out of order

Supported in: Streaming

Drops rows that are out of order among rows with the same values for all key columns. A row is out of order if, based on the sort columns and directions, it would have come before an already received row with the same key values. Two rows are compared on the first sort column and direction; only if they tie does the comparison move on to the next sort column and direction, and so on until the order is determined or all sort columns are tied, in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for an event time greater than or equal to the expiry. After a key has received no new rows for an event time greater than or equal to the expiry, any new row for that key is never dropped and is always stored as the new current maximum.

Transform categories: Other
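
Here is a sketch of the comparison and expiry rules above. All of the following are assumptions. Event times are plain numbers. Idleness is measured as the gap between a new row's event time and the latest event time already received for that key, including dropped rows. A row tied with the current maximum on every sort column is kept, because it would not have come before that maximum. `comes_before` and `DropOutOfOrder` are hypothetical names.

```python
from collections.abc import Hashable, Sequence
from typing import Any


def comes_before(a: Sequence[Any], b: Sequence[Any], directions: Sequence[str]) -> bool:
    """True if sort values `a` strictly precede `b`; ties fall through to the next column."""
    for x, y, direction in zip(a, b, directions):
        if x != y:
            return (x > y) if direction == "DESCENDING" else (x < y)
    return False  # all sort columns tied: the rows are equal


class DropOutOfOrder:
    """Hypothetical model of time-bounded out-of-order dropping."""

    def __init__(self, directions: Sequence[str], expiry: float) -> None:
        self.directions = directions
        self.expiry = expiry
        # key -> (current maximum sort values, latest event time received for the key)
        self.state: dict[Hashable, tuple[tuple, float]] = {}

    def process(self, key: Hashable, sort_values: tuple, event_time: float) -> bool:
        """Return True to emit the row, False to drop it as out of order."""
        entry = self.state.get(key)
        if entry is None:
            self.state[key] = (sort_values, event_time)
            return True
        current_max, last_seen = entry
        latest = max(last_seen, event_time)
        if event_time - last_seen >= self.expiry:
            # Key was idle for at least the expiry: never drop, start a new maximum.
            self.state[key] = (sort_values, latest)
            return True
        if comes_before(sort_values, current_max, self.directions):
            self.state[key] = (current_max, latest)
            return False
        self.state[key] = (sort_values, latest)
        return True
```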

See details.


Time bounded event time sort

Supported in: Streaming

Emits rows by key in ascending event-time order, accepting late-arriving records up to at least the allowed lateness. Records that arrive later than the allowed lateness plus a small buffer interval are dropped.

Transform categories: Other
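
Below is a sketch of a per-key event-time sort. It keeps one min-heap per key and uses a watermark, defined as the largest event time seen minus the allowed lateness. Several things are assumed here. The watermark is global across keys. The small extra buffer interval mentioned above is not modelled. `EventTimeSorter` is a hypothetical name. Records still buffered when the stream ends would need a final flush, which the sketch omits.

```python
import heapq
import itertools
from collections.abc import Hashable
from typing import Any


class EventTimeSorter:
    """Hypothetical model of a time-bounded, per-key event-time sort."""

    def __init__(self, allowed_lateness: float) -> None:
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.buffers: dict[Hashable, list[tuple[float, int, Any]]] = {}
        # Tiebreaker so heap entries with equal times never compare payloads.
        self._tiebreak = itertools.count()

    def process(self, key: Hashable, event_time: float,
                payload: Any) -> list[tuple[Hashable, float, Any]]:
        """Buffer one record and return every record that is now safe to emit."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness
        if event_time < watermark:
            return []  # too late to be placed in order: dropped
        heapq.heappush(self.buffers.setdefault(key, []),
                       (event_time, next(self._tiebreak), payload))
        ready: list[tuple[Hashable, float, Any]] = []
        for k, heap in self.buffers.items():
            # Records at or before the watermark are safe: anything earlier
            # that arrives later is dropped above, so order is preserved.
            while heap and heap[0][0] <= watermark:
                t, _, p = heapq.heappop(heap)
                ready.append((k, t, p))
        return ready
```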

See details.


Top rows

Supported in: Batch, Faster

Partitions the input by the given columns, sorts each partition by the sort specification, and keeps the top rows of each partition.

Transform categories: Aggregate

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a
  • Partition by columns: {airline}
  • Sort specification: [(airport, DESCENDING), (miles, ASCENDING)]
  • Number of rows: null

Input:

airline airport miles
foundry airways JFK 1002345
foundry airways LHR 2221324
new air SFO 21356673
new air JFK 12323456
foundry airways LHR 12542352
new air JFK 12232355

Output:

airline airport miles
foundry airways LHR 2221324
new air SFO 21356673
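
Below is a sketch of the top-rows logic. It assumes rows are Python dicts, and `top_rows` is a hypothetical helper. The example passes a null number of rows and gets one row per airline, so the sketch defaults `n` to 1; treat that default as an inference from the example. Rows come back in overall sort order, so the two output rows appear in a different order from the table above.

```python
from typing import Any

Row = dict[str, Any]


def top_rows(rows: list[Row], partition_by: list[str],
             spec: list[tuple[str, str]], n: int = 1) -> list[Row]:
    """Keep the first `n` rows of each partition under the sort specification."""
    ordered = list(rows)
    # Stable sorts applied from the least to the most significant sort column.
    for column, direction in reversed(spec):
        ordered.sort(key=lambda r, c=column: r[c], reverse=(direction == "DESCENDING"))
    taken: dict[tuple, int] = {}
    out: list[Row] = []
    for row in ordered:
        key = tuple(row[c] for c in partition_by)
        if taken.get(key, 0) < n:
            taken[key] = taken.get(key, 0) + 1
            out.append(row)
    return out
```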

See details.


Union by name

Supported in: Batch, Faster, Streaming

Unions a set of datasets together on matching column names.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT

ri.foundry.main.dataset.b

recently_serviced tail_number airline_code
true AA-200 AA
true BN-435 BN
true BN-111 BN

Output:

recently_serviced tail_number airline_code
true KK-150 KK
false XB-120 XB
true MT-190 MT
true AA-200 AA
true BN-435 BN
true BN-111 BN
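
Below is a sketch of union by name. Each dataset is represented as a (column names, rows) pair, and `union_by_name` is a hypothetical helper. The sketch assumes output columns follow the first dataset's order, and it raises an error when the column sets differ, which is what separates it from wide union by name. This section does not describe how Pipeline Builder itself reports such a mismatch.

```python
from typing import Any

Row = dict[str, Any]
Table = tuple[list[str], list[Row]]  # (column names, rows)


def union_by_name(tables: list[Table]) -> Table:
    """Stack tables that share the same column names, matching values by name."""
    if not tables:
        raise ValueError("need at least one table")
    columns = tables[0][0]
    out: list[Row] = []
    for cols, rows in tables:
        if set(cols) != set(columns):
            raise ValueError(f"column names differ: {cols} vs {columns}")
        # Align each row by name so column order in the inputs does not matter.
        out.extend({c: row[c] for c in columns} for row in rows)
    return columns, out
```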

See details.


Union files

Supported in: Batch

Unions datasets of files.

Transform categories: File

See details.


Unpivot

Supported in: Batch, Faster, Streaming

Unpivot is the inverse of pivot: it converts multiple columns into rows, transforming data from a wide format to a long format. To do so, it creates two new columns: one containing the original column names as values, and another containing the corresponding data values. All columns that are not unpivoted are kept as is.

Transform categories: Aggregate, Popular

Type variable bounds: T accepts AnyType

Example

Argument values:

  • Columns to unpivot: [new_york_miles, london_miles]
  • Dataset: ri.foundry.main.dataset.a
  • Name column: city
  • Value column: miles

Input:

airline new_york_miles london_miles
foundry airways 1000 6000
new air null 8000

Output:

city miles airline
new_york_miles 1000 foundry airways
london_miles 6000 foundry airways
new_york_miles null new air
london_miles 8000 new air
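
Below is a minimal sketch of the unpivot above. It assumes rows are Python dicts, and `unpivot` is a hypothetical helper. Every listed column becomes its own output row holding a (name, value) pair, and all other columns are copied onto each of those rows unchanged.

```python
from typing import Any

Row = dict[str, Any]


def unpivot(rows: list[Row], columns: list[str],
            name_col: str, value_col: str) -> list[Row]:
    """Turn each listed column into its own row of (name, value); keep other columns."""
    out: list[Row] = []
    for row in rows:
        kept = {c: v for c, v in row.items() if c not in columns}
        for c in columns:
            # Nulls are kept, matching the example (new air has no New York miles).
            out.append({name_col: c, value_col: row[c], **kept})
    return out
```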

See details.


Unzip files

Supported in: Batch

Unzips each file in a dataset of zipped files. Any non-zip files are ignored. Users must have editor permission to preview the unzip files transform and all downstream nodes.

Transform categories: File

See details.


Uppercase column names

Supported in: Batch, Faster, Streaming

Uppercases all column names in the dataset.

Transform categories: Data preparation

Example

Argument values:

  • Dataset: ri.foundry.main.dataset.a

Input:

recentlyServiced tailNumber airlineCode
true KK-150 KK
false XB-120 XB
true MT-190 MT

Output:

RECENTLYSERVICED TAILNUMBER AIRLINECODE
true KK-150 KK
false XB-120 XB
true MT-190 MT

See details.


Wide union by name

Supported in: Batch, Faster, Streaming

Unions a set of datasets together on the superset of their column names, adding nulls when columns are missing.

Transform categories: Join

Example

Argument values:

  • Datasets to union: [ri.foundry.main.dataset.a, ri.foundry.main.dataset.b]

Inputs:

ri.foundry.main.dataset.a

recently_serviced tail_number
true KK-150
false XB-120
true MT-190

ri.foundry.main.dataset.b

recently_serviced tail_number airline_code
true AA-200 AA
true BN-435 BN
true BN-111 BN

Output:

recently_serviced tail_number airline_code
true KK-150 null
false XB-120 null
true MT-190 null
true AA-200 AA
true BN-435 BN
true BN-111 BN
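
Below is a sketch of wide union by name, using the same (column names, rows) representation for each dataset. `wide_union_by_name` is a hypothetical helper. The sketch assumes output columns appear in the order they are first seen across the inputs, and it fills missing values with `None`, as in the example.

```python
from typing import Any

Row = dict[str, Any]
Table = tuple[list[str], list[Row]]  # (column names, rows)


def wide_union_by_name(tables: list[Table]) -> Table:
    """Union on the superset of column names, filling missing columns with None."""
    columns: list[str] = []
    for cols, _ in tables:
        for c in cols:
            if c not in columns:
                columns.append(c)  # first-seen order
    rows = [{c: row.get(c) for c in columns}
            for _, table_rows in tables for row in table_rows]
    return columns, rows
```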

See details.


Window

Supported in: Batch, Faster

Performs the specified aggregations on the input dataset grouped by a set of columns.

Transform categories: Aggregate, Popular

See details.


