Data integration overview

Data connectivity and integration

Foundry provides a highly configurable set of data connectivity and integration tools that extend far beyond typical extract-transform-load (ETL) or extract-load-transform (ELT) solutions. Foundry is designed to reduce the cost of data integration over time through a rich suite of capabilities that act as a force multiplier for data teams. While commodity cloud services provide storage and compute for basic pipelines and experimentation, many additional layers of capability are required to manage, deliver, and validate datasets for critical operations. Foundry is designed to serve as the data integration backbone for the most complex environments in the world.

Connecting data

Data integration in Foundry begins with an extensible data connection framework that establishes connections with all types of source systems (structured, unstructured, or semi-structured) and supports all key data transfer approaches, such as batch, micro-batch, or streaming. The framework is integrated with the platform’s data transformation and data management functionality, which includes full lineage of data versions, granular security for collaborative management of data extraction, and branching of data sync configurations.
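To make the difference between batch, micro-batch, and streaming transfer concrete, here is a minimal, generic Python sketch. It is not Foundry's connector API; the function name `micro_batches` and the sample records are assumptions made purely for illustration. The same generator approximates batch transfer when `batch_size` covers the whole source, and record-at-a-time streaming when `batch_size` is 1.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull the next chunk lazily so the source is never fully loaded at once.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Illustrative source rows; a real sync would read from an external system.
rows = [{"id": i} for i in range(5)]
sizes = [len(batch) for batch in micro_batches(rows, 2)]
```

Because the generator consumes its input lazily, it works equally well over a finite extract and over an open-ended iterator of incoming records.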

Learn more about connecting to data in Foundry.

Data transformation

For data transformation, Foundry provides an extensible, scalable build system that uses multimodal compute to produce output datasets. Foundry’s compute-agnostic "Build" framework provides fully integrated security and data lineage, and enables mixing and matching of third-party compute runtimes. Foundry also includes an integrated suite of data transformation authoring, change management, data quality, pipeline scheduling, and metadata introspection capabilities that work together to provide a "mission control" for data engineers.
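The idea of a build system driven by data lineage can be illustrated with a small, generic Python sketch. This is not Foundry's Build framework; the dataset names, the `lineage` mapping, and the helpers `build_order` and `upstream` are all hypothetical. The sketch assumes lineage is recorded as a mapping from each dataset to the datasets it is derived from, and uses the standard library's `graphlib.TopologicalSorter` to find an order in which every input is built before its outputs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical lineage: each dataset maps to the datasets it is derived from.
lineage: Dict[str, Set[str]] = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_enriched": {"clean_orders", "raw_customers"},
}


def build_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every dataset appears after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


def upstream(dataset: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return every dataset the given one depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(graph.get(dataset, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


order = build_order(lineage)
ancestors = upstream("orders_enriched", lineage)
```

The same graph serves both purposes: scheduling (which dataset to build next) and introspection (which sources a given output ultimately depends on).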

Learn more about transforming data with Foundry.

Pipeline management

Foundry’s pipeline management capabilities combine change management, data quality, and data loading features.

The Pipeline Builder application enables fast, flexible, and scalable delivery of data pipelines while providing robustness and security. Learn more about Pipeline Builder.

Data engineers can define a rigorous release process for production pipelines, including health checks that guarantee only fully compliant data will be deployed to production. Where issues are found, the platform provides diagnostics on the discrepancies detected.

Diagnostics are available both in Foundry's integrated analysis and modeling tools and to any third-party tools that access the outputs via REST APIs or other interfaces.
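A release gate of the kind described above, where health checks decide whether data is deployed and failures produce diagnostics, could be sketched as follows. This is a generic Python illustration, not Foundry's health check API; `CheckResult`, `check_not_null`, `check_min_rows`, and `release_gate` are hypothetical names, and the two checks are only examples of rules a team might define.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str  # human-readable diagnostic for the discrepancy, if any


Check = Callable[[List[Row]], CheckResult]


def check_not_null(column: str) -> Check:
    """Build a check that fails when any row has a missing value in column."""
    def run(rows: List[Row]) -> CheckResult:
        bad = sum(1 for row in rows if row.get(column) is None)
        return CheckResult(f"not_null({column})", bad == 0, f"{bad} null value(s)")
    return run


def check_min_rows(minimum: int) -> Check:
    """Build a check that fails when the dataset has fewer than minimum rows."""
    def run(rows: List[Row]) -> CheckResult:
        return CheckResult(
            f"min_rows({minimum})", len(rows) >= minimum, f"{len(rows)} row(s) found"
        )
    return run


def release_gate(rows: List[Row], checks: List[Check]) -> Tuple[bool, List[CheckResult]]:
    """Run every check; allow release only if all pass, and return the failures."""
    results = [check(rows) for check in checks]
    failures = [result for result in results if not result.passed]
    return (not failures, failures)


# Illustrative data: the second row is missing its amount.
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
ok, failures = release_gate(rows, [check_not_null("amount"), check_min_rows(1)])
```

Returning the failed results alongside the pass/fail flag keeps the diagnostics available to whatever consumes the gate, whether an interactive tool or an external client.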

Learn more about maintaining and managing pipelines in Foundry.
