Streaming resource guide
This page lists the resources you may need to reference when implementing an end-to-end streaming workflow.
Data Connection supports syncing data from a wide variety of streaming platforms into Foundry streaming datasets, which can then be used in streaming pipelines. Streaming syncs let data flow into Foundry with low latency and high throughput to support real-time decision-making.
There are two ways to sync data from streams into Foundry:
- Data Connection can pull records from streaming platforms into Foundry. As with batch syncs, data is read from the stream and synced to Foundry through the agent architecture, using only unidirectional connections.
- Alternatively, you can push records from a stream directly into a Foundry stream through the stream proxy.
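As a rough illustration of the push option, the sketch below builds an authenticated HTTP POST that carries JSON records. It assumes the stream proxy accepts JSON over HTTPS with a bearer token. The host, URL path, dataset identifier, token, and record fields are all hypothetical placeholders rather than documented values, so substitute the details shown for your own stream.

```python
import json
import urllib.request


def build_push_request(base_url, dataset_rid, branch, token, records):
    """Build (but do not send) an HTTP POST carrying JSON records.

    The URL layout is a hypothetical placeholder, not a documented
    Foundry endpoint; replace it with the push URL for your stream.
    """
    url = f"{base_url.rstrip('/')}/streams/{dataset_rid}/branches/{branch}/records"
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# All values below are made up for illustration.
request = build_push_request(
    "https://foundry.example.com/stream-proxy/api",
    "ri.example.dataset",
    "master",
    "EXAMPLE_TOKEN",
    [{"sensor_id": "pump-7", "temperature_c": 71.4}],
)

# To send the records, uncomment the next line in a real environment:
# urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and payload before any data leaves the source system.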
Foundry can connect to many sources of streaming data. Sources with dedicated connectors include:
- Apache Kafka
- Amazon Kinesis
- Amazon SQS
- Aveva PI
- Google Pub/Sub
For streaming sources without a dedicated connector, you can connect using external transforms. Examples of such sources include:
- ActiveMQ
- Amazon SNS
- IBM MQ
- RabbitMQ
- MQTT [Beta]
- Solace
1. Core concepts
We recommend reviewing the following introductory concept page to understand what streams are, how they are stored, and how they are processed.
2. Overview
These pages cover the broader considerations for deciding whether streaming is right for your use case and for deploying production streams.
- Streaming pipelines overview
- Comparison: Streaming vs. batch
- Performance considerations
- Streaming compute usage
- Streaming profiles
- Stream monitoring
- Streaming keys
- Streaming stateful transforms
- Stream debugging
3. Connect to data sources
You will need to complete one of the following workflows to connect your external data sources to Foundry for streaming. We recommend reviewing both options to understand possible benefits and limitations for your use case.
4. Transform your streaming data
You can use Pipeline Builder to transform your live data. The outputs of Pipeline Builder transforms are still streaming datasets, which you can use in real time throughout Foundry.
- Create a streaming pipeline with Pipeline Builder
- Integrate your stream with the Ontology
5. Monitor streaming pipelines [Beta]
Set up alerting around your pipeline's health.
6. Development tools
Here, you can find tools to improve development of streaming pipelines.