Assign timestamps and watermarks(分配时间戳和水印(Assign timestamps and watermarks))¶
Supported in: Streaming
Assigns timestamps and watermarks to the input, filtering out records where the timestamp is null.
Transform categories: Other
Declared arguments¶
- Dataset: Dataset to assign timestamps and watermarks.
Table - Timestamp expression: Expression evaluating to timestamp to assign.
Expression\ - optional Emit watermark on every record: If true, the watermark will be propagated for every record in the stream. This is generally inefficient and will add performance overhead but gives deterministic behaviour if the stream is replayed. It is recommended to set this value to false in most cases.
Literal\ - optional Idleness timeout unit: Unit for the duration of time to consider a subtask idle.
Enum\ - optional Idleness timeout value: Value for the duration of time after which a subtask not receiving records will be considered idle, at which point it will not prevent the global watermark from progressing. This can help address unbounded state growth for time-based transforms or hanging (non-emitting) windows, but late records from slow subtasks may be dropped unexpectedly for downstream operators with allowed lateness. Please check Foundry Streaming docs to understand event-time handling.
Literal\
Examples¶
Example 1: Base case¶
Argument values:
- Dataset: ri.foundry.main.dataset.a
- Timestamp expression:
timestamp - Emit watermark on every record: null
- Idleness timeout unit: null
- Idleness timeout value: null
Input:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |
Output:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |
Example 2: Null case¶
Argument values:
- Dataset: ri.foundry.main.dataset.a
- Timestamp expression:
timestamp - Emit watermark on every record: null
- Idleness timeout unit: null
- Idleness timeout value: null
Input:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| null | 30 | sensor_2 |
| null | 29 | sensor_1 |
Output:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
中文翻译¶
分配时间戳和水印(Assign timestamps and watermarks)¶
支持:流式处理(Streaming)
为输入数据分配时间戳和水印,过滤掉时间戳为空的记录。
转换类别:其他
声明的参数¶
- 数据集(Dataset): 需要分配时间戳和水印的数据集。
表(Table) - 时间戳表达式(Timestamp expression): 用于计算待分配时间戳的表达式。
表达式\ - 可选 每条记录都发出水印(Emit watermark on every record): 如果为true,则流中的每条记录都会传播水印。这通常效率较低且会增加性能开销,但在重放流时能提供确定性行为。大多数情况下建议将此值设为false。
字面量\ - 可选 空闲超时单位(Idleness timeout unit): 判断子任务(subtask)是否空闲的时间单位。
枚举\ - 可选 空闲超时值(Idleness timeout value): 子任务未收到记录后被视为空闲的时间值。一旦进入空闲状态,该子任务将不再阻止全局水印(global watermark)的推进。这有助于解决基于时间的转换中无限制的状态增长或窗口挂起(不发射)的问题,但对于下游允许延迟(late)记录的操作符,来自慢速子任务的延迟记录可能会被意外丢弃。请查阅Foundry流式处理文档以了解事件时间(event-time)处理机制。
字面量\
示例¶
示例1:基本情况¶
参数值:
- 数据集: ri.foundry.main.dataset.a
- 时间戳表达式:
timestamp - 每条记录都发出水印: null
- 空闲超时单位: null
- 空闲超时值: null
输入:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |
输出:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| 1969-12-31T23:59:40Z | 30 | sensor_2 |
| 1969-12-31T23:59:35Z | 29 | sensor_1 |
示例2:空值情况¶
参数值:
- 数据集: ri.foundry.main.dataset.a
- 时间戳表达式:
timestamp - 每条记录都发出水印: null
- 空闲超时单位: null
- 空闲超时值: null
输入:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |
| null | 30 | sensor_2 |
| null | 29 | sensor_1 |
输出:
| timestamp | temperature | sensor_id |
|---|---|---|
| 1969-12-31T23:59:50Z | 28 | sensor_1 |