Time bounded drop out of order
Supported in: Streaming
Drops out-of-order rows among rows that share the same values for all key columns. A row is out of order if, based on the sort columns and directions, it would have come before an already received row with the same key values. Two rows are compared on the first sort column and direction; the next sort column and direction is consulted if and only if there is a tie, and so on until the order is determined or all sort columns are tied, in which case the rows are equal. The current maximum for each key is stored until no new rows have been seen for that key for a span of event time greater than or equal to the expiry. Once a key has received no new rows for at least the expiry time, the next row to arrive for that key is never dropped and is always stored as the new current maximum.
Transform categories: Other
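The row comparison described above can be sketched in a few lines of Python. This is only an illustration of the tie-breaking rule, not the transform's actual implementation; the names `compare_rows`, `is_out_of_order`, `ASC`, and `DESC`, and the representation of rows as dictionaries, are assumptions made for the example.

```python
# Hypothetical direction markers; the transform's real enum is not shown here.
ASC, DESC = "Ascending", "Descending"


def compare_rows(a, b, sort_spec):
    """Return -1, 0, or 1 if row `a` orders before, equal to, or after row `b`.

    `sort_spec` is a list of (column, direction) pairs, consulted in order.
    """
    for column, direction in sort_spec:
        if a[column] == b[column]:
            continue  # tie on this column: move on to the next sort column
        result = -1 if a[column] < b[column] else 1
        return result if direction == ASC else -result
    return 0  # every sort column tied, so the rows are equal


def is_out_of_order(row, current_max, sort_spec):
    """A row is out of order if it would come before the stored maximum."""
    return compare_rows(row, current_max, sort_spec) < 0


sort_spec = [("version", ASC), ("priority", DESC)]
stored = {"version": 3, "priority": 5}

print(is_out_of_order({"version": 2, "priority": 9}, stored, sort_spec))  # True
print(is_out_of_order({"version": 3, "priority": 5}, stored, sort_spec))  # False
print(is_out_of_order({"version": 3, "priority": 7}, stored, sort_spec))  # True
```

In the third call the first column ties, so the second column decides: under descending order a larger `priority` sorts earlier, which makes the row out of order.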
Declared arguments
- Dataset: Dataset from which to drop out-of-order rows.
  Type: Table
- Key expiration time unit: Unit of the amount of time for which to store the greatest record for a given key. If state is stored for a key, and a different key is processed with a watermark greater than this expiration period, the state for the key expires and any new records with the same key are not dropped. For any key, a new record pushes the expiry this amount of time into the future, whether or not the record has the highest order precedence.
  Type: Enum<Days, Hours, Milliseconds, Minutes, Seconds, Weeks>
- Key expiration time value: Value of the amount of time for which to store the greatest record for a given key. The expiration behavior is the same as described for the key expiration time unit; the unit and the value together define the expiration period.
  Type: Literal
- Ordering guarantee: A column and direction that define the order stream elements must follow to avoid being dropped. With ascending order, an incoming stream element must be equal to or greater than all previous rows to avoid being dropped; with descending order, it must be equal to or less than all previous rows. You can specify multiple columns and directions, but those after the first are consulted, in order, only in the event of a tie. They do not apply otherwise.
  Type: List<Tuple<Column, Enum<Ascending, Descending>>>
- optional Key by columns: Columns used to partition the input by key. Rows sharing the same key column values are processed in the order they are received, which may differ from the order defined by the sort spec. A row is considered out of order when, based on the sort spec, it should be placed before the highest-precedence row already processed and held in state for the same key. Such out-of-order rows are dropped as long as state for the key exists and has not expired.
  Type: Set<Column<Binary | Boolean | Byte | Double | Float | Integer | Long | Short | String | Timestamp>>
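To show how these arguments interact, here is a minimal Python sketch of per-key state with expiry. It is an illustration under stated assumptions, not the transform's implementation: the class `TimeBoundedDropOutOfOrder`, its `process` method, and the `UNIT_TO_KWARG` mapping are hypothetical, the watermark is passed in explicitly with each row, and expired state is discarded lazily when the next row for that key arrives. State is treated as expired once the watermark is greater than or equal to the stored expiry.

```python
from datetime import datetime, timedelta

# Assumed mapping from the time unit enum to datetime.timedelta keyword names.
UNIT_TO_KWARG = {
    "Milliseconds": "milliseconds", "Seconds": "seconds", "Minutes": "minutes",
    "Hours": "hours", "Days": "days", "Weeks": "weeks",
}


class TimeBoundedDropOutOfOrder:
    def __init__(self, key_columns, sort_spec, expiry_value, expiry_unit):
        self.key_columns = key_columns
        self.sort_spec = sort_spec  # list of (column, "Ascending" | "Descending")
        self.ttl = timedelta(**{UNIT_TO_KWARG[expiry_unit]: expiry_value})
        self.state = {}  # key tuple -> (current maximum row, expiry time)

    def _compare(self, a, b):
        # Later sort columns are consulted only when earlier ones tie.
        for column, direction in self.sort_spec:
            if a[column] != b[column]:
                result = -1 if a[column] < b[column] else 1
                return result if direction == "Ascending" else -result
        return 0

    def process(self, row, watermark):
        """Return True if the row is kept, False if it is dropped."""
        key = tuple(row[c] for c in self.key_columns)
        entry = self.state.get(key)
        if entry is not None and watermark >= entry[1]:
            entry = None  # state expired: the next row is never dropped
        expires_at = watermark + self.ttl  # every row pushes the expiry forward
        if entry is None or self._compare(row, entry[0]) >= 0:
            self.state[key] = (row, expires_at)  # new current maximum
            return True
        self.state[key] = (entry[0], expires_at)  # keep old maximum, new expiry
        return False


op = TimeBoundedDropOutOfOrder(["id"], [("seq", "Ascending")], 10, "Minutes")
t0 = datetime(2024, 1, 1, 12, 0)
print(op.process({"id": "a", "seq": 5}, t0))                          # True
print(op.process({"id": "a", "seq": 3}, t0 + timedelta(minutes=5)))   # False
print(op.process({"id": "a", "seq": 4}, t0 + timedelta(minutes=14)))  # False
print(op.process({"id": "a", "seq": 1}, t0 + timedelta(minutes=24)))  # True
```

The third row is still dropped because the dropped second row had already pushed the expiry to 15 minutes after `t0`. The fourth row arrives exactly when the expiry set by the third row is reached, so the state has expired and the row is kept even though its `seq` is lower.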