Streaming pipelines
Streaming pipelines let you make immediate, critical decisions based on real-time data. By processing data as a stream with dedicated compute, streaming pipelines can process records with very low latency. On average, streaming data is accessible in the Ontology and available for analysis in time series applications, such as Quiver or Foundry Rules, in under 15 seconds. To achieve this low latency, streams are built on compute that runs continuously, so they require different architecture and maintenance considerations than batch pipelines.
Best practices
When building out streaming pipelines, consider these factors:
- Streams often power highly operational workflows and require careful planning around downtime, maintenance, and logic changes to ensure high uptime and availability.
- Compute for streaming runs continuously, which can result in higher compute costs than a periodic batch job. As with batch pipelines, consider starting with the smallest profile available and scaling up only if the volume of your data requires it.
- Streams operate on a per-row basis and limit the maximum row size to ensure low-latency data transfers. The limit is 1 MB per individual row.
- Streams that use state (windows or aggregations, for example) need careful design so that changing the stream logic does not break the existing state.
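The per-row size limit above can be checked on the producer side before records are sent. The following is a minimal sketch, not a Foundry API: the names `MAX_ROW_BYTES`, `row_size_bytes`, and `split_by_size` are hypothetical, the row size is approximated by its compact UTF-8 JSON encoding (the way the platform actually measures a row may differ), and the limit is assumed to be 1,000,000 bytes because the text does not say whether 1 MB means 1,000,000 or 1,048,576 bytes.

```python
import json

# Assumption: "1 MB" is treated as 1,000,000 bytes, the more conservative
# reading. The source does not state the exact byte count.
MAX_ROW_BYTES = 1_000_000


def row_size_bytes(row: dict) -> int:
    """Approximate a row's size as the length of its compact UTF-8 JSON encoding.

    This is only a proxy; the real on-the-wire size depends on how the
    stream serializes records.
    """
    encoded = json.dumps(row, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def split_by_size(rows, limit=MAX_ROW_BYTES):
    """Return (accepted, rejected) lists based on the approximate row size."""
    accepted, rejected = [], []
    for row in rows:
        if row_size_bytes(row) <= limit:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


# Hypothetical records: one small sensor reading and one oversized payload.
small = {"sensor_id": "a1", "reading": 21.5}
large = {"sensor_id": "a2", "payload": "x" * 2_000_000}

accepted, rejected = split_by_size([small, large])
print(f"accepted={len(accepted)} rejected={len(rejected)}")
```

Rejected rows could be logged, truncated, or routed to a batch path instead. Which option is appropriate depends on the workflow the stream supports.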
Get started
To start using streaming pipelines in Foundry, review how to create a simple streaming pipeline and learn about streaming transforms in Pipeline Builder. To learn about connecting your data sources to Foundry, review how to push data into a stream or how to set up a streaming sync.