Performance considerations

Before you create a stream in Foundry, consider the latency and throughput that your use case requires. This page presents questions to help you evaluate both.

Latency

Latency is the time it takes for a stream record to be processed and reach its destination. It is the core performance measure of a real-time stream, and it can have real-world impact: for example, stream latency determines how quickly alerts are triggered for airline flight delays or supply chain issues that require immediate action. Many factors affect latency; some of the most significant are listed below.

Latency factors

  • How fast is the source producing data?
    • Foundry can only consume data as fast as it is produced, so ensure the data source can produce data quickly.
  • How long does it take for the data to cross network boundaries?
    • When ingesting data into Foundry, the data often must cross network boundaries, which can introduce latency depending on network configuration, firewalls, and other factors.
  • How many stages are in your end-to-end pipeline?
    • Foundry streaming co-locates pipeline transformations defined in the same Code Repository or Pipeline Builder graph on the same physical hardware to automatically optimize latency. When more stages are added to the pipeline (for example, when multiple repositories or Builder pipelines are chained together), Foundry cannot apply the same optimizations, which adds latency.
  • Is data being sent to external systems?
    • For a record to be accessible with low latency, the destination system must be able to process data with low latency. Foundry offers optimized destinations, such as time series and the Ontology, but if data is sent to an external system, that system must support your latency requirements.
  • What is the consistency model of your data?
    • Data consistency plays a significant role in end-to-end latency. Data that requires exactly-once guarantees (that is, each record is guaranteed to be processed exactly one time) adds overhead and latency to ensure the atomicity of your pipeline. If, however, your pipeline can run with at-least-once guarantees (that is, each record is guaranteed to be processed at least one time, but may be processed more than once), the system will automatically optimize your pipeline to run faster.
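To make the at-least-once trade-off concrete, the following minimal sketch shows what a duplicate delivery can do to a downstream aggregate and how a consumer might guard against it. It does not show how Foundry implements either consistency model; the record shape `(record_id, value)` and the `deduplicate` helper are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: (record_id, value). Not a Foundry type.
Record = Tuple[str, float]


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record once, keyed by record_id, dropping redeliveries."""
    seen = set()
    for record_id, value in records:
        if record_id in seen:
            continue  # redelivered under at-least-once; already handled
        seen.add(record_id)
        yield record_id, value


# "r2" arrives twice, as can happen with at-least-once delivery.
delivered = [("r1", 10.0), ("r2", 20.0), ("r2", 20.0), ("r3", 30.0)]
unique = list(deduplicate(delivered))

naive_total = sum(value for _, value in delivered)  # counts "r2" twice
deduped_total = sum(value for _, value in unique)   # counts "r2" once
```

If downstream logic is idempotent or tolerant of duplicates, as in the deduplicated total here, at-least-once mode may be acceptable. Note that this sketch keeps every seen ID in memory, which a long-running consumer would need to bound.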

End-to-end latency of a streaming pipeline

A standard streaming pipeline can run through the following stages in under 15 seconds:

  • Ingestion: ~1-2 seconds
  • Transformation: ~5 seconds if exactly-once is enabled (default), or ~1 second if at-least-once is enabled
  • Syncing into a backing datastore (object storage sync or time series sync): ~5 seconds if exactly-once is enabled (default), or ~1 second if at-least-once is enabled

As indicated above, there are three major factors that influence the end-to-end latency of a streaming pipeline:

  1. The complexity of the pipeline based on the number of transforms.
  2. The consistency model, based on whether the pipeline is running in at-least-once or exactly-once mode.
  3. Time-based windows and aggregations in transforms. For example, if you specify that you want to aggregate over a 30-second window, then the data will implicitly have 30 extra seconds of latency for the aggregation.
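The stage figures above can be combined into a rough back-of-the-envelope estimate. The sketch below simply adds the approximate per-stage numbers from this page and any aggregation window; the function name, the mode keys, and the choice of the upper end of the ingestion range are assumptions made for illustration, and real latencies will vary with the pipeline.

```python
# Approximate per-stage latencies in seconds, taken from the figures above.
INGESTION_S = 2.0  # upper end of the ~1-2 second range (an assumption)
TRANSFORM_S = {"exactly_once": 5.0, "at_least_once": 1.0}
SYNC_S = {"exactly_once": 5.0, "at_least_once": 1.0}


def estimate_latency_s(mode: str, window_s: float = 0.0) -> float:
    """Sum ingestion, transformation, sync, and any aggregation window."""
    if mode not in TRANSFORM_S:
        raise ValueError(f"unknown consistency mode: {mode}")
    return INGESTION_S + TRANSFORM_S[mode] + SYNC_S[mode] + window_s


exactly_once = estimate_latency_s("exactly_once")            # 2 + 5 + 5
at_least_once = estimate_latency_s("at_least_once")          # 2 + 1 + 1
windowed = estimate_latency_s("exactly_once", window_s=30)   # adds the window
```

Under these assumptions the default exactly-once path comes to about 12 seconds, within the 15-second figure, while a 30-second aggregation window dominates everything else.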

Throughput

Throughput is the number of records that can be processed over a period of time. Throughput is often as important as latency when measuring the performance of a low-latency pipeline. Some of the most significant considerations are listed below:

  • How many records is your source producing per time interval?
    • Foundry's streaming capabilities come with high throughput levels out of the box. However, if your source stream is producing at an exceptionally high rate, you can:
      • Increase the number of partitions your stream uses to support those higher volumes.
      • Set an existing stream's stream type to "HIGH THROUGHPUT". This configuration increases the number of records your source sends in one batch and is recommended if you notice that your stream's "Total Lag" metric is greater than 0. Note that this setting directly trades off latency for throughput. Before proceeding, check your stream's "Total Throughput" metric to confirm that applying "HIGH THROUGHPUT" is the right choice for your stream.
  • How CPU intensive is the processing portion of your pipeline?
    • Throughput is often limited by delays in processing. Most processing bottlenecks can be resolved by scaling up your processing cluster.
  • How fast can your destination system receive records?
    • The stream destination system can also cause a delay if it cannot receive records as fast as they are produced. This can lead to backpressure, which decreases throughput and increases latency. Foundry's streaming products are optimized and designed to keep up with extremely high throughput. However, if you set a streaming pipeline to write to your own destination, you should ensure the destination can keep up with the volume of records produced.
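The relationship between production rate, consumption rate, and lag in the considerations above can be sketched with simple arithmetic. The example below models lag as the backlog of records produced but not yet consumed, which is an assumption about how a metric such as "Total Lag" behaves rather than its documented definition. The per-partition rate is a made-up number; this page does not state what a single partition can sustain.

```python
import math
from typing import List


def lag_over_time(produce_rate: int, consume_rate: int, seconds: int) -> List[int]:
    """Backlog (records) at the end of each second, never below zero."""
    lag = 0
    history = []
    for _ in range(seconds):
        lag = max(0, lag + produce_rate - consume_rate)
        history.append(lag)
    return history


def partitions_needed(source_rate: float, per_partition_rate: float) -> int:
    """Smallest partition count whose combined rate covers the source."""
    return math.ceil(source_rate / per_partition_rate)


# Hypothetical rates in records per second.
growing = lag_over_time(produce_rate=12_000, consume_rate=10_000, seconds=5)
steady = lag_over_time(produce_rate=8_000, consume_rate=10_000, seconds=5)
partitions = partitions_needed(source_rate=12_000, per_partition_rate=5_000)
```

When the source outpaces the consumer, the backlog grows every second and never recovers on its own, which is why a persistently non-zero lag suggests adding partitions or batching more aggressively.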

Advanced

  • For users with an in-depth understanding of their data pipeline, another potential bottleneck for stream performance is network bandwidth. Symptoms of suboptimal network bandwidth include non-zero lag, lower than expected throughput, and records being dropped. To alleviate these symptoms, you can apply data compression to your stream. However, before doing so, keep in mind that:
    • Data compression works best on high volume streams with repetitive strings.
    • For lower volume streams whose primary concern is reducing latency, enabling data compression will further increase latency due to time spent compressing data in a suboptimal manner (for example, if there is a low volume of unique strings).
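A quick way to see why compression favors repetitive data is to compare compression ratios on two payloads of the same size. The sketch below uses Python's standard `zlib` module purely as a stand-in; this page does not say which codec Foundry applies, and the sample record is invented for illustration.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size; lower means better compression."""
    return len(zlib.compress(payload)) / len(payload)


# Invented sample: the same short JSON-like record repeated many times.
repetitive = b'{"status":"ON_TIME","airport":"JFK"}\n' * 1000

# Random bytes stand in for data with almost no repetition.
non_repetitive = os.urandom(len(repetitive))

repetitive_ratio = compression_ratio(repetitive)
non_repetitive_ratio = compression_ratio(non_repetitive)
```

The repetitive payload shrinks to a small fraction of its size, while the non-repetitive payload barely shrinks at all, so the CPU time spent compressing it buys little bandwidth and only adds latency.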
