Pipeline management
This page outlines features and best practices for pipeline management in Pipeline Builder.
Reusing logic across a pipeline
Pipeline Builder supports reusing logic across a pipeline through parameters and custom functions. Parameters are values that can be referenced by multiple transforms in a pipeline. Custom functions package a series of transforms as a single, centrally defined transform.
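The following plain-Python sketch illustrates both ideas. It is not a Pipeline Builder API, and every name in it (`MIN_AMOUNT`, `filter_small`, `flag_large`, `custom_function`, `clean_orders`) is hypothetical. A module-level constant stands in for a parameter, and function composition stands in for a custom function.

```python
# Conceptual sketch only: every name here is hypothetical and is NOT part of
# any Pipeline Builder API. Rows are modeled as plain Python dicts.
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]

# Hypothetical parameter: defined once, referenced by several transforms.
MIN_AMOUNT = 100


def filter_small(rows: List[Row]) -> List[Row]:
    # Drop rows below the shared threshold.
    return [r for r in rows if r["amount"] >= MIN_AMOUNT]


def flag_large(rows: List[Row]) -> List[Row]:
    # Reuses the same parameter, so changing MIN_AMOUNT updates both transforms.
    return [{**r, "large": r["amount"] >= 10 * MIN_AMOUNT} for r in rows]


def custom_function(*steps: Transform) -> Transform:
    """Bundle a series of transforms into one reusable transform."""
    def combined(rows: List[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)  # feed each step's output into the next
        return rows
    return combined


# Defined centrally once, then usable anywhere the same logic is needed.
clean_orders = custom_function(filter_small, flag_large)
```

For example, `clean_orders([{"amount": 50}, {"amount": 200}, {"amount": 1500}])` returns `[{"amount": 200, "large": False}, {"amount": 1500, "large": True}]`. Changing `MIN_AMOUNT` in one place changes the behavior of both steps.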
Large pipeline management
Pipeline Builder supports grouping and optimization features to help manage large pipelines.
You can create folders and sub-folders in Pipeline Builder to group nodes. Folders let you organize nodes and toggle the visibility of the nodes in selected folders, narrowing the part of the pipeline graph you are viewing.
You can use node color groups in Pipeline Builder to collapse nodes of the same color and improve the readability of your graph.
You can focus on a subsection of your graph by showing and hiding nodes. You can choose these nodes manually or show and hide them based on color groupings.
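To make the show/hide idea concrete, here is a small sketch that models nodes with a folder path and a color and computes which nodes stay visible. The `Node` type, the slash-separated folder paths, and combining folder and color filters with AND are assumptions for illustration; this is not Pipeline Builder's data model or API.

```python
# Hypothetical model of graph visibility filtering; not a Pipeline Builder API.
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Node:
    name: str
    folder: str  # slash-separated path, e.g. "ingest/raw" (assumed convention)
    color: str


def in_folder(node: Node, folder: str) -> bool:
    # A node belongs to a folder if it sits in it or in any of its sub-folders.
    return node.folder == folder or node.folder.startswith(folder + "/")


def visible_nodes(
    nodes: Iterable[Node],
    folders: Optional[List[str]] = None,
    colors: Optional[Set[str]] = None,
) -> List[str]:
    """Names of nodes left visible after filtering by folder and/or color."""
    shown = []
    for node in nodes:
        folder_ok = folders is None or any(in_folder(node, f) for f in folders)
        color_ok = colors is None or node.color in colors
        if folder_ok and color_ok:
            shown.append(node.name)
    return shown
```

With nodes in `ingest/raw`, `ingest/clean`, and `reporting`, selecting the `ingest` folder shows only the first two, because the check treats sub-folders as part of their parent folder.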
Job grouping in Pipeline Builder lets you control how your outputs are split into jobs and set a compute profile for each job.
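As a rough illustration, job grouping can be thought of as a mapping from named jobs to the outputs they build plus a compute profile. The `JobGroup` class, `plan_jobs`, and the profile labels are hypothetical, and the rule that every output belongs to exactly one job is an assumption of this sketch, not a documented Pipeline Builder constraint.

```python
# Hypothetical sketch: JobGroup and plan_jobs are illustrative names,
# not Pipeline Builder's configuration format or API.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class JobGroup:
    name: str
    outputs: List[str]
    compute_profile: str  # hypothetical profile label, e.g. "small" or "large"


def plan_jobs(
    all_outputs: List[str], groups: List[JobGroup]
) -> Dict[str, Tuple[str, List[str]]]:
    """Check that each output is in exactly one job; return job -> (profile, outputs)."""
    owner: Dict[str, str] = {}
    for group in groups:
        for out in group.outputs:
            if out in owner:
                raise ValueError(f"{out!r} is in both {owner[out]!r} and {group.name!r}")
            owner[out] = group.name
    missing = set(all_outputs) - set(owner)
    if missing:
        raise ValueError(f"outputs not assigned to any job: {sorted(missing)}")
    return {g.name: (g.compute_profile, g.outputs) for g in groups}
```

Grouping light outputs together on a small profile and giving a heavy output its own job on a larger profile is the kind of split this models.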
When building pipelines, you can mark transform nodes that are shared between multiple outputs as checkpoints. A checkpointed node's intermediate results are computed only once during your next build and reused by the outputs that depend on it, rather than recomputed for each output, which can save compute.
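The saving comes from computing a shared upstream step once instead of once per output. The sketch below counts how often a shared step runs with and without caching; the function names are hypothetical, and the in-memory cache is only an analogy, not how Pipeline Builder implements checkpoints.

```python
# Conceptual sketch of why checkpointing a shared node saves compute.
# This is not how Pipeline Builder implements checkpoints.
from typing import List, Optional, Tuple


def build_outputs(rows: List[int], checkpoint: bool) -> Tuple[int, int, int]:
    """Build two outputs that share one upstream transform.

    Returns (output_a, output_b, number_of_times_the_shared_step_ran).
    """
    runs = 0
    cached: Optional[List[int]] = None

    def shared_transform() -> List[int]:
        nonlocal runs
        runs += 1  # stand-in for an expensive computation
        return [r * 2 for r in rows]

    def shared() -> List[int]:
        nonlocal cached
        if not checkpoint:
            return shared_transform()  # recomputed for every output that reads it
        if cached is None:
            cached = shared_transform()  # computed once, then reused
        return cached

    output_a = sum(shared())  # first downstream output
    output_b = max(shared())  # second downstream output
    return output_a, output_b, runs
```

`build_outputs([1, 2, 3], checkpoint=False)` returns `(12, 6, 2)`, while `checkpoint=True` returns `(12, 6, 1)`: the outputs are identical, but the shared step runs half as often.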
For faster previews while you prototype your pipeline, you can add input sampling to downsample your input data. Sampling affects only previews; deployed pipelines still run on the full dataset.
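The preview/deploy split can be sketched as a single switch on the input data. The function names, the 1% default, and the per-row random sampling with a fixed seed are assumptions for illustration; Pipeline Builder's sampling options may work differently.

```python
# Hypothetical sketch of preview sampling; the function names, the 1% default,
# and the per-row random sampling strategy are assumptions, not Pipeline Builder's.
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def sample_input(rows: List[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep each row with probability `fraction`, reproducibly."""
    rng = random.Random(seed)  # fixed seed: repeated previews see the same sample
    return [row for row in rows if rng.random() < fraction]


def run(
    rows: List[T], transform: Callable[[List[T]], R], mode: str, fraction: float = 0.01
) -> R:
    # Previews run on a sample; a deploy always runs on the full input.
    data = sample_input(rows, fraction) if mode == "preview" else rows
    return transform(data)
```

With 10,000 input rows, `run(rows, len, "preview")` returns roughly 100, while `run(rows, len, "deploy")` returns 10,000. Fixing the seed keeps previews stable between iterations, so differences you see come from your transform changes rather than from a new sample.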