Syntax and supported functions(语法与支持的函数)¶
For an orientation to the Expression board, see the guide on using the expression board. This document is a resource for using Contour's rich expression language and can be used as a reference for types, operations and functions.
Data types¶
You can use the following data types in expressions:
String¶
Strings are expressed in single quotes:
'This is a string'
This is a string
Escape single quotes within the string by adding a second single quote:
'I''m a string'
I’m a string
Concatenate columns as strings with the || operator:
"Gender" || "MaritalStatus"
You can also concatenate columns with constants.
'$' || ("salaryColumn"/1000.0) || 'k'
Integer¶
Integers are non-floating point numbers:
5
Double¶
Doubles are floating point numbers:
5.7
Boolean¶
Boolean variables can be either true or false. Boolean variables are case-insensitive.
Date¶
You can cast strings in the format YYYY-MM or YYYY-MM-DD as dates:
CAST('2016-12-06' AS DATE)
You can also cast strings as timestamps:
CAST('2016-12-06` `01:12:34' AS TIMESTAMP)
When performing operations with timestamps, it is easiest to first cast the timestamp to a long, or to a double if you need millisecond precision. This gives you the number of seconds in UNIX time.
For example, say you have two timestamp columns: “start” and “end.” To determine the number of minutes between the two times, you can use the following expression:
(CAST("end" as LONG) - CAST("start" as LONG)) / 60.
Casting longs to timestamps assumes that that long values are in seconds. If your data is in milliseconds, divide by 1000 prior to casting.
Operations¶
The following sections give an overview over the operations that you can use in expressions and their precedence:
- Arithmetic
- Comparison
- Equality comparison
- Boolean comparison
- Casting
- Case/When/End
- Like
- Null checks
- Precedence
Arithmetic¶
Add, subtract, multiply, and divide numeric columns (integers or doubles) by other numeric columns or by constants: +, -, *, /
"tipAmount" / "fare""diameter" * 3.14
Comparison¶
Compare columns of any type against each other and against constants with > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to). Results in a column of booleans.
"age" > "averageAge""totalDistance" < 5
Equality comparison¶
Check whether a row has the same value in two columns with ==, or different values with !=.
"cityOfBirth" == "cityOfResidence"
You can also compare column values with constants.
"Gender" == 'M'
:::callout{theme="neutral" title="String comparisons"} When comparing two strings, lexicographical ordering is used. Note that this can be counterintuitive when comparing strings that contain numbers. For example:
'Alligator' < 'Boat'=> TRUE because the encoding of 'A' is less than that of 'B''Alliance' < 'Alligator'=> TRUE because 'a' is less than 'g''Zoo' < 'alpha'=> TRUE because 'Z' is less than 'a''Zoo' < 'Zoologist'=> TRUE because if one string is the prefix of the other, the shorter one is considered to be lesser'1' < '2'=> TRUE because the string character '1' is less than the string character '2''10' < '2'=> TRUE because its first string character '1' is less than the string character '2'; note that this is a comparison of the strings '10' and '2', not the integers 10 and 2'10' < '20'=> TRUE because '1' is less than '2' :::
:::callout{theme="neutral" title="Behavior differences with Spark SQL"}
In the Contour expression language, equality comparisons with NULL have the below behavior.
NULL = 'DATA'=> FALSENULL != 'DATA'=> TRUENULL = NULL=> TRUENULL != NULL=> FALSE
However, in Spark SQL, all of these expressions return NULL. When translating between Contour expressions and Spark SQL, it is important to consider this difference in behavior in order to produce consistent results.
:::
Boolean comparison¶
Derive a new column of booleans by evaluating one or more columns with boolean logic.
"Age" >=70 AND "Gender" = 'M'
Casting¶
Casting lets you change the type of a column or expression. You can cast to booleans, integers, doubles, dates, and timestamps.
CAST("startDate" AS DATE)CAST("startTime" AS TIMESTAMP)CAST("numParticipants" AS INTEGER)
Case/When/End¶
Case statements let you evaluate multiple possibilities in one expression. Each statement is evaluated in order, and the THEN statement is executed for the first one to evaluate to true.
CASE
WHEN "age" < 13 THEN 'Child'
WHEN "age" > 19 THEN 'Adult'
ELSE 'Teenager'
END
CASE MaritalStatus
WHEN 'S' THEN 'Single'
WHEN 'M' THEN 'Married'
ELSE 'Unknown'
END
Like¶
Search for a specified pattern in a column. See the documentation on SQL LIKE ↗ to learn more.
Null Checks¶
Check whether a value IS NULL or IS NOT NULL. Returns a boolean.
Precedence¶
The precedence of operations within expressions is detailed in the following list, from highest precedence to the lowest. Operations that are on the same line all have the same precedence.
- +, -, NOT (unary)
- || (binary)
- *, /, % (binary)
- +, - (binary)
- <, <=, >, >= (binary)
- \=, ==, !=, <>, IS, IS NOT, LIKE, RLIKE (binary)
- IS NULL, NOT NULL (unary)
- IN, NOT IN
- AND (binary)
- OR (binary)
Functions¶
The following is a complete list of functions available in column expressions.
- Math & numerical functions
- String functions
- Date/time functions
- Array functions
- Window functions
- Aggregate functions
- Misc functions
Math & numerical functions¶
- ABS: Computes the absolute value.
- CBRT: Computes the cube-root of the given value.
- CEIL: Computes the ceiling of the given value.
- COS: Computes the cosine of the given value.
- EXP: Computes the exponential of the given value.
- FACTORIAL: Computes the factorial of the given value.
- FLOOR: Computes the floor of the given value.
- FORMAT_NUMBER: Formats numeric column arg0 to a format like ‘#,###,###.##’, rounded to arg1 decimal places, and returns the result as a string column (note: arg1 must be an integer literal).
- ISNAN: Return true if, and only if, the column is NaN.
- LN: Computes the natural logarithm of the given value.
- LOG: Computes the logarithm of the arg1 with a base of the arg0 (note: arg0 must be an numeric literal).
- POW: Returns the value of the arg0 raised to the power of arg1.
- RAND: Returns a random number between 0 and 1 for each row, based on the provided seed arg0.
- ROUND: Returns the value of arg0 rounded to arg1 decimal places (note: arg1 must be an integer literal).
- RTRIM: Trim the spaces from right end for the specified string value.
- SIN: Computes the sine of the given column.
- SQRT: Computes the square root of the specified float value.
- TAN: Computes the tangent of the given value.
String functions¶
- CONCAT: Concatenates multiple input string columns together into a single string column.
- CONCAT_WS: Concatenates multiple input string columns together into a single string column, using the given separator (note: arg0 must be a string literal).
- FORMAT_STRING: Formats the arguments according to Java Formatter ↗ and returns the result as a string column.
- LENGTH: Computes the length of a given string or binary column.
- LEVENSHTEIN: Computes the Levenshtein distance of the two given string columns.
- LOWER: Converts a string column to lower case.
- LPAD: Returns the string column arg0, left-padded up to length arg1, with the string arg2 (note: arg1 must be an integer literal and arg2 must be a string literal).
- LTRIM: Trim the spaces from left end for the specified string value.
- REGEXP_EXTRACT: Match string column arg0 against regex arg1 and extract a specific group arg2 (1-indexed). If the regex did not match, or the specified group did not match, an empty string is returned. (note: arg1 must be a string literal and arg2 must be an integer literal).
- REGEXP_REPLACE: Replace all substrings in arg0 that match arg1 with arg2.
- REVERSE: Reverses the string column and returns it as a new string column.
- RPAD: Returns the string column arg0, right-padded up to length arg1, with the string arg2 (note: arg1 must be an integer literal and arg2 must be a string literal).
- SPLIT: Splits string column arg0 around regex string arg1 (note: arg1 must be a string literal).
- SUBSTRING: Substring of arg0, starting at index arg1 (1-indexed) and of length arg2 (note: arg1 and arg2 must be integer literals).
- TRIM: Trim the spaces from both ends for the specified string column.
- UPPER: Converts a string column to upper case.
Date/time functions¶
More info and examples on deriving relative date is available in the reference documentation on deriving relative dates.
- ADD_MONTHS: Returns the date that is arg1 months after arg0 (note: arg1 must be an integer literal)
- CURRENT_DATE: Returns the current date as a date column. These values are updated only when the path is recalculated. To avoid inconsistent values across boards, updating the path daily is recommended.
- CURRENT_TIMESTAMP: Returns the current timestamp as a timestamp column. These values are updated only when the path is recalculated.
- CURRENT_UNIX_TIMESTAMP: Returns the current timestamp as number in seconds. These values are updated only when the path is recalculated.
- DATE_ADD: Returns the date that is arg1 days after arg0 (note: arg1 must be an integer literal).
- DATE_FORMAT: Converts a date/timestamp arg0 to a string in the format specified by arg1 (note: arg1 must be a string literal).
- DATE_SUB: Returns the date that is arg1 days before arg0 and according to Java SimpleDateFormat ↗ (note: arg1 must be an integer literal).
- DATE_TRUNC: Returns the date/timestamp arg1 truncated to the unit specified by the format arg0.
- DATE_DIFF: Returns the number of days from arg1 to arg0.
- DAY_OF_MONTH: Extracts the day of the month as an integer from a given date/timestamp/string.
- DAY_OF_WEEK: Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday.
- DAY_OF_YEAR: Extracts the day of the year as an integer from a given date/timestamp/string.
- FROM_UTC_TIMESTAMP: Given a timestamp column arg0, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone arg1. (note: arg1 must be a string literal).
- HOUR: Extracts the hours as an integer from a given date/timestamp/string.
- LAST_DAY: Given a date column, returns the last day of the month which the given date belongs to.
- MINUTE: Extracts the minutes as an integer from a given date/timestamp/string.
- MONTH: Extracts the month as an integer from a given date/timestamp/string.
- MONTHS_BETWEEN: Returns number of months between dates arg0 and arg1.
- QUARTER: Extracts the quarter as an integer from a given date/timestamp/string.
- SECOND: Extracts the seconds as an integer from a given date/timestamp/string.
- TO_UNIX_TIMESTAMP: Convert time string arg0 with given pattern string arg1, return null if fail (note: arg1 must be a string literal).
- TO_UTC_TIMESTAMP: Given a timestamp column arg0, which corresponds to a certain time of day in the given timezone arg1, returns another timestamp that corresponds to the same time of day in UTC. (note: arg1 must be a string literal).
- WEEK_OF_YEAR: Extracts the ISO week ↗ number as an integer from a given date/timestamp/string.
- YEAR: Extracts the year as an integer from a given date/timestamp/string.
Array functions¶
More info and examples on Array functions is available in the reference documentation on array functions.
- ARRAY: Creates an array of all the input columns.
- ARRAY_CONTAINS: Returns true if the array column arg0 contains value arg1 (note: arg1 must be a literal).
- ARRAY_GET_AT_INDEX: Returns the element at index arg1 (1-indexed) from array column arg0 (note: arg1 must be an integer literal). Explicitly cast the result of this function to use the column in subsequent boards.
- ARRAY_LENGTH: Returns the length of the given array. Returns -1 when the input is
null. - ARRAY_JOIN: Returns a string of the array column arg0 joined by the separator string arg1 (note: arg1 must be a string literal).
- ARRAY_SORT: Sorts the input array arg0 in ascending order if arg1 is true, or descending order if arg1 is false (note: arg1 must be a boolean literal).
:::callout{theme="warning" title="Warning"}
When using ROW_NUMBER, FIRST, LAST, ARRAY_AGG, or ARRAY_AGG_DISTINCT, in a window function, be careful of nondeterminism. Imagine we are partitioning by column A and ordering by column B. If for the same value of column A, there are multiple rows with the same value of column B, the results of these window functions may be non-deterministic -- they may produce different results given the same input data and logic.
:::
Window functions¶
Window functions need an OVER clause; more info and examples in the reference documentation on window functions.
- CUME_DIST: Returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
- DENSE_RANK: Returns the rank of rows within a window partition, without any gaps.
- EXPLODE: Creates a new row for each element in the given array column, dropping rows with nulls. Explicitly cast the result of this function to use the column in subsequent boards.
- EXPLODE_OUTER: Creates a new row for each element in the given array column. Explicitly cast the result of this function to use the column in subsequent boards.
- FIRST: Returns the first value in a group.
- LAG: Returns the value that is arg1 rows before the current row in column arg0, and null if there are less than arg1 rows before the current row. For example, an arg1 of one will return the previous row at any given point in the window partition (note: arg1 must be an integer literal).
- LAST: Returns the last value in a group.
- LEAD: Returns the value that is arg1 rows after the current row in column arg0, and null if there are less than arg1 rows after the current row. For example, an arg1 of one will return the next row at any given point in the window partition (note: arg1 must be an integer literal).
- NTILE: Returns the ntile group id (from 1 to arg0 inclusive) in an ordered window partition. For example, if arg0 is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4 (note: arg0 must be an integer literal).
- PERCENT_RANK: Returns the relative rank (i.e. percentile) of rows within a window partition.
- RANK: Returns the rank of rows within a window partition.
- ROW_NUMBER: Returns a sequential number starting at 1 within a window partition.
Aggregate functions¶
Aggregate functions can be used in aggregation expressions and window functions.
- ARRAY_AGG: Returns an array of aggregated values from the input column.
- ARRAY_AGG_DISTINCT: Returns an array of distinct aggregated values from the input column.
- AVG: Returns the average of the values in a group.
- COUNT: Returns the number of items in a group.
- COUNT_DISTINCT: Returns the number of items in a group (note: can only be used with the Aggregate option in the expression board. Cannot be used with Add new column, Filter or Replace Column).
- CORR: Returns the Pearson Correlation Coefficient for two columns.
- MAX: Returns the maximum value of the expression in a group.
- MEAN: Returns the average of the values in a group.
- MIN: Returns the minimum value of the expression in a group.
- STDDEV: Returns the sample standard deviation of the expression in a group.
- SUM: Returns the sum of all values in the expression.
- SUM_DISTINCT: Returns the sum of distinct values in the expression (note: cannot be used in window functions).
- VARIANCE: Returns the unbiased variance of the values in a group.
Misc functions¶
- COALESCE: Returns the first column that is not null, or null if all inputs are null.
- GREATEST: Returns the greatest value of the list of values, skipping null values.
- HASH: Calculates the hash code of given columns, and returns the result as an int column.
- ISNULL: Return true if, and only if, the column is null.
- LEAST: Returns the least value of the list of values, skipping null values.
- MD5: Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
- MONOTONICALLY_INCREASING_ID: Returns a monotonically increasing ID. These values may change each time the path is calculated.
- SHA1: Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
- GET_JSON_OBJECT: Extracts a JSON object from the JSON string based on the JSON path ↗ specified, and returns a JSON string of the extracted JSON object. Some examples of this are:
$.field$['field']
中文翻译¶
语法与支持的函数¶
如需了解表达式面板的基本操作,请参阅表达式面板使用指南。本文档是使用 Contour 丰富表达式语言的参考资料,涵盖类型、操作和函数。
数据类型¶
表达式中可使用以下数据类型:
字符串¶
字符串使用单引号表示:
'这是一个字符串'
这是一个字符串
字符串中的单引号需通过添加第二个单引号进行转义:
'I''m a string'
I'm a string
使用 || 运算符将列作为字符串进行拼接:
"Gender" || "MaritalStatus"
也可以将列与常量进行拼接:
'$' || ("salaryColumn"/1000.0) || 'k'
整数¶
整数是非浮点数:
5
双精度浮点数¶
双精度浮点数是浮点数:
5.7
布尔值¶
布尔变量可以是 true 或 false。布尔变量不区分大小写。
日期¶
可以将格式为 YYYY-MM 或 YYYY-MM-DD 的字符串转换为日期:
CAST('2016-12-06' AS DATE)
也可以将字符串转换为时间戳:
CAST('2016-12-06 01:12:34' AS TIMESTAMP)
对时间戳执行操作时,最简便的方法是将时间戳先转换为长整型,如果需要毫秒精度则转换为双精度浮点数。这样可以得到 UNIX 时间中的秒数。
例如,假设有两个时间戳列:"start" 和 "end"。要计算两者之间的分钟数,可以使用以下表达式:
(CAST("end" as LONG) - CAST("start" as LONG)) / 60
将长整型转换为时间戳时,假定长整型值以秒为单位。如果数据以毫秒为单位,请在转换前除以 1000。
操作¶
以下各节概述了表达式中可用的操作及其优先级:
算术运算¶
对数值列(整数或双精度浮点数)与其他数值列或常量进行加、减、乘、除运算:+、-、*、/
"tipAmount" / "fare""diameter" * 3.14
比较运算¶
使用 >(大于)、>=(大于等于)、<(小于)、<=(小于等于)比较任意类型的列与列或列与常量。结果返回布尔值列。
"age" > "averageAge""totalDistance" < 5
相等比较¶
使用 == 检查两列中的值是否相同,使用 != 检查是否不同。
"cityOfBirth" == "cityOfResidence"
也可以将列值与常量进行比较。
"Gender" == 'M'
:::callout{theme="neutral" title="字符串比较"} 比较两个字符串时,使用字典序。请注意,当比较包含数字的字符串时,结果可能不符合直觉。例如:
'Alligator' < 'Boat'=> TRUE,因为 'A' 的编码小于 'B''Alliance' < 'Alligator'=> TRUE,因为 'a' 小于 'g''Zoo' < 'alpha'=> TRUE,因为 'Z' 小于 'a''Zoo' < 'Zoologist'=> TRUE,因为如果一个字符串是另一个字符串的前缀,则较短的字符串被视为较小'1' < '2'=> TRUE,因为字符串字符 '1' 小于字符串字符 '2''10' < '2'=> TRUE,因为其第一个字符串字符 '1' 小于字符串字符 '2';请注意,这是对字符串 '10' 和 '2' 的比较,而非整数 10 和 2'10' < '20'=> TRUE,因为 '1' 小于 '2' :::
:::callout{theme="neutral" title="与 Spark SQL 的行为差异"}
在 Contour 表达式语言中,与 NULL 的相等比较具有以下行为:
NULL = 'DATA'=> FALSENULL != 'DATA'=> TRUENULL = NULL=> TRUENULL != NULL=> FALSE
然而,在 Spark SQL 中,所有这些表达式都返回 NULL。在 Contour 表达式和 Spark SQL 之间进行转换时,必须考虑这种行为差异,以产生一致的结果。
:::
布尔比较¶
通过使用布尔逻辑计算一个或多个列来派生新的布尔值列。
"Age" >=70 AND "Gender" = 'M'
类型转换¶
类型转换允许更改列或表达式的类型。可以转换为布尔值、整数、双精度浮点数、日期和时间戳。
CAST("startDate" AS DATE)CAST("startTime" AS TIMESTAMP)CAST("numParticipants" AS INTEGER)
Case/When/End¶
Case 语句允许在一个表达式中评估多种可能性。每个语句按顺序评估,对于第一个评估为 true 的语句,执行其 THEN 子句。
CASE
WHEN "age" < 13 THEN '儿童'
WHEN "age" > 19 THEN '成人'
ELSE '青少年'
END
CASE MaritalStatus
WHEN 'S' THEN '单身'
WHEN 'M' THEN '已婚'
ELSE '未知'
END
LIKE 匹配¶
在列中搜索指定模式。更多信息请参阅 SQL LIKE ↗ 文档。
空值检查¶
检查值 IS NULL 或 IS NOT NULL。返回布尔值。
优先级¶
表达式中操作的优先级从高到低如下所示。同一行的操作具有相同的优先级。
- +、-、NOT(一元)
- ||(二元)
- *、/、%(二元)
- +、-(二元)
- <、<=、>、>=(二元)
- \=、==、!=、<>、IS、IS NOT、LIKE、RLIKE(二元)
- IS NULL、NOT NULL(一元)
- IN、NOT IN
- AND(二元)
- OR(二元)
函数¶
以下是列表达式中可用函数的完整列表。
数学与数值函数¶
- ABS:计算绝对值。
- CBRT:计算给定值的立方根。
- CEIL:计算给定值的向上取整值。
- COS:计算给定值的余弦值。
- EXP:计算给定值的指数值。
- FACTORIAL:计算给定值的阶乘。
- FLOOR:计算给定值的向下取整值。
- FORMAT_NUMBER:将数值列 arg0 格式化为 '#,###,###.##' 等格式,四舍五入到 arg1 位小数,并将结果作为字符串列返回(注意:arg1 必须为整数常量)。
- ISNAN:当且仅当列为 NaN 时返回 true。
- LN:计算给定值的自然对数。
- LOG:计算以 arg0 为底 arg1 的对数(注意:arg0 必须为数值常量)。
- POW:返回 arg0 的 arg1 次幂。
- RAND:根据提供的种子 arg0,为每行返回一个 0 到 1 之间的随机数。
- ROUND:返回 arg0 四舍五入到 arg1 位小数的值(注意:arg1 必须为整数常量)。
- RTRIM:去除指定字符串值右侧的空格。
- SIN:计算给定列的正弦值。
- SQRT:计算指定浮点值的平方根。
- TAN:计算给定值的正切值。
字符串函数¶
- CONCAT:将多个输入字符串列拼接成一个字符串列。
- CONCAT_WS:使用给定的分隔符将多个输入字符串列拼接成一个字符串列(注意:arg0 必须为字符串常量)。
- FORMAT_STRING:根据 Java Formatter ↗ 格式化参数,并将结果作为字符串列返回。
- LENGTH:计算给定字符串或二进制列的长度。
- LEVENSHTEIN:计算两个给定字符串列的 Levenshtein 距离。
- LOWER:将字符串列转换为小写。
- LPAD:返回字符串列 arg0,左侧填充到长度 arg1,使用字符串 arg2 填充(注意:arg1 必须为整数常量,arg2 必须为字符串常量)。
- LTRIM:去除指定字符串值左侧的空格。
- REGEXP_EXTRACT:将字符串列 arg0 与正则表达式 arg1 匹配,并提取特定分组 arg2(从 1 开始索引)。如果正则表达式不匹配或指定分组不匹配,则返回空字符串(注意:arg1 必须为字符串常量,arg2 必须为整数常量)。
- REGEXP_REPLACE:将 arg0 中所有匹配 arg1 的子字符串替换为 arg2。
- REVERSE:反转字符串列并作为新的字符串列返回。
- RPAD:返回字符串列 arg0,右侧填充到长度 arg1,使用字符串 arg2 填充(注意:arg1 必须为整数常量,arg2 必须为字符串常量)。
- SPLIT:根据正则表达式字符串 arg1 拆分字符串列 arg0(注意:arg1 必须为字符串常量)。
- SUBSTRING:arg0 的子字符串,从索引 arg1 开始(从 1 开始索引),长度为 arg2(注意:arg1 和 arg2 必须为整数常量)。
- TRIM:去除指定字符串列两侧的空格。
- UPPER:将字符串列转换为大写。
日期/时间函数¶
有关派生相对日期的更多信息和示例,请参阅派生相对日期的参考文档。
- ADD_MONTHS:返回 arg0 之后 arg1 个月的日期(注意:arg1 必须为整数常量)。
- CURRENT_DATE:以日期列形式返回当前日期。这些值仅在路径重新计算时更新。为避免各面板间值不一致,建议每日更新路径。
- CURRENT_TIMESTAMP:以时间戳列形式返回当前时间戳。这些值仅在路径重新计算时更新。
- CURRENT_UNIX_TIMESTAMP:以秒数形式返回当前时间戳。这些值仅在路径重新计算时更新。
- DATE_ADD:返回 arg0 之后 arg1 天的日期(注意:arg1 必须为整数常量)。
- DATE_FORMAT:将日期/时间戳 arg0 转换为 arg1 指定格式的字符串(注意:arg1 必须为字符串常量)。
- DATE_SUB:返回 arg0 之前 arg1 天的日期,根据 Java SimpleDateFormat ↗(注意:arg1 必须为整数常量)。
- DATE_TRUNC:返回日期/时间戳 arg1,截断到格式 arg0 指定的单位。
- DATE_DIFF:返回从 arg1 到 arg0 的天数。
- DAY_OF_MONTH:从给定的日期/时间戳/字符串中提取月份中的第几天,以整数形式返回。
- DAY_OF_WEEK:从给定的日期/时间戳/字符串中提取星期几,以整数形式返回。范围从 1(星期日)到 7(星期六)。
- DAY_OF_YEAR:从给定的日期/时间戳/字符串中提取年份中的第几天,以整数形式返回。
- FROM_UTC_TIMESTAMP:给定时间戳列 arg0(对应 UTC 中的某个时间点),返回另一个时间戳,该时间戳对应给定时区 arg1 中的同一时间点(注意:arg1 必须为字符串常量)。
- HOUR:从给定的日期/时间戳/字符串中提取小时数,以整数形式返回。
- LAST_DAY:给定日期列,返回该日期所属月份的最后一天。
- MINUTE:从给定的日期/时间戳/字符串中提取分钟数,以整数形式返回。
- MONTH:从给定的日期/时间戳/字符串中提取月份,以整数形式返回。
- MONTHS_BETWEEN:返回日期 arg0 和 arg1 之间的月数。
- QUARTER:从给定的日期/时间戳/字符串中提取季度,以整数形式返回。
- SECOND:从给定的日期/时间戳/字符串中提取秒数,以整数形式返回。
- TO_UNIX_TIMESTAMP:将时间字符串 arg0 与给定模式字符串 arg1 进行转换,失败时返回 null(注意:arg1 必须为字符串常量)。
- TO_UTC_TIMESTAMP:给定时间戳列 arg0(对应给定时区 arg1 中的某个时间点),返回另一个时间戳,该时间戳对应 UTC 中的同一时间点(注意:arg1 必须为字符串常量)。
- WEEK_OF_YEAR:从给定的日期/时间戳/字符串中提取 ISO 周数 ↗,以整数形式返回。
- YEAR:从给定的日期/时间戳/字符串中提取年份,以整数形式返回。
数组函数¶
有关数组函数的更多信息和示例,请参阅数组函数参考文档。
- ARRAY:创建包含所有输入列的数组。
- ARRAY_CONTAINS:如果数组列 arg0 包含值 arg1,则返回 true(注意:arg1 必须为常量)。
- ARRAY_GET_AT_INDEX:从数组列 arg0 中返回索引 arg1 处的元素(从 1 开始索引)(注意:arg1 必须为整数常量)。显式转换此函数的结果以在后续面板中使用该列。
- ARRAY_LENGTH:返回给定数组的长度。当输入为
null时返回 -1。 - ARRAY_JOIN:返回由分隔符字符串 arg1 连接的数组列 arg0 的字符串(注意:arg1 必须为字符串常量)。
- ARRAY_SORT:如果 arg1 为 true,则按升序对输入数组 arg0 进行排序;如果 arg1 为 false,则按降序排序(注意:arg1 必须为布尔常量)。
:::callout{theme="warning" title="警告"}
在窗口函数中使用 ROW_NUMBER、FIRST、LAST、ARRAY_AGG 或 ARRAY_AGG_DISTINCT 时,请注意非确定性。假设我们按列 A 分区并按列 B 排序。如果对于列 A 的相同值,有多行具有相同的列 B 值,则这些窗口函数的结果可能是非确定性的——在相同输入数据和逻辑下可能产生不同结果。
:::
窗口函数¶
窗口函数需要 OVER 子句;更多信息和示例请参阅窗口函数参考文档。
- CUME_DIST:返回窗口分区内值的累积分布,即低于当前行的行所占比例。
- DENSE_RANK:返回窗口分区内行的排名,无间隔。
- EXPLODE:为给定数组列中的每个元素创建新行,删除包含 null 的行。显式转换此函数的结果以在后续面板中使用该列。
- EXPLODE_OUTER:为给定数组列中的每个元素创建新行。显式转换此函数的结果以在后续面板中使用该列。
- FIRST:返回组中的第一个值。
- LAG:返回列 arg0 中当前行之前 arg1 行的值,如果当前行之前的行数少于 arg1,则返回 null。例如,arg1 为 1 将返回窗口分区中任意给定点的前一行(注意:arg1 必须为整数常量)。
- LAST:返回组中的最后一个值。
- LEAD:返回列 arg0 中当前行之后 arg1 行的值,如果当前行之后的行数少于 arg1,则返回 null。例如,arg1 为 1 将返回窗口分区中任意给定点的下一行(注意:arg1 必须为整数常量)。
- NTILE:在有序窗口分区中返回 ntile 组 ID(从 1 到 arg0 包含)。例如,如果 arg0 为 4,则前四分之一的行将获得值 1,第二个四分之一获得 2,第三个四分之一获得 3,最后一个四分之一获得 4(注意:arg0 必须为整数常量)。
- PERCENT_RANK:返回窗口分区内行的相对排名(即百分位数)。
- RANK:返回窗口分区内行的排名。
- ROW_NUMBER:返回窗口分区内从 1 开始的顺序编号。
聚合函数¶
聚合函数可用于聚合表达式和窗口函数。
- ARRAY_AGG:返回输入列中聚合值的数组。
- ARRAY_AGG_DISTINCT:返回输入列中不同聚合值的数组。
- AVG:返回组中值的平均值。
- COUNT:返回组中项目的数量。
- COUNT_DISTINCT:返回组中项目的数量(注意:只能在表达式面板的聚合选项中使用。不能与添加新列、筛选或替换列一起使用)。
- CORR:返回两列的皮尔逊相关系数。
- MAX:返回组中表达式的最大值。
- MEAN:返回组中值的平均值。
- MIN:返回组中表达式的最小值。
- STDDEV:返回组中表达式的样本标准差。
- SUM:返回表达式中所有值的总和。
- SUM_DISTINCT:返回表达式中不同值的总和(注意:不能在窗口函数中使用)。
- VARIANCE:返回组中值的无偏方差。
其他函数¶
- COALESCE:返回第一个非空的列,如果所有输入均为 null,则返回 null。
- GREATEST:返回值列表中的最大值,跳过 null 值。
- HASH:计算给定列的哈希码,并将结果作为整数列返回。
- ISNULL:当且仅当列为 null 时返回 true。
- LEAST:返回值列表中的最小值,跳过 null 值。
- MD5:计算二进制列的 MD5 摘要,并将值作为 32 字符的十六进制字符串返回。
- MONOTONICALLY_INCREASING_ID:返回单调递增的 ID。这些值可能在每次路径计算时发生变化。
- SHA1:计算二进制列的 SHA-1 摘要,并将值作为 40 字符的十六进制字符串返回。
- GET_JSON_OBJECT:根据指定的 JSON 路径 ↗ 从 JSON 字符串中提取 JSON 对象,并返回提取的 JSON 对象的 JSON 字符串。一些示例如下:
$.field$['field']