Connecting to data
The first step to getting value from Foundry is to connect it to your organization's sources of data. Foundry's tools for connecting to data support the full range of standard enterprise data sources, including cloud-based object stores, file systems, databases, and data warehouses.
You can connect to data in a variety of ways with different Foundry applications, depending on the type of data you need to access.
Data Connection
Connect to sources to run batch, streaming, media, and CDC syncs and to use virtual tables.
The Data Connection framework is designed to manage data over time through discrete versions, which are tracked using dataset transactions. This framework provides full lineage of data versions across time, so you can see which sync tasks produced which versions of a given dataset. It also allows a sync to load only the data it needs when a full load on every sync is not possible.
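To make the versioning idea above more concrete, here is a minimal toy model in Python. It is not Foundry's implementation or API: the `ToyDataset` and `Transaction` classes, the two transaction kinds, and the sync names are all assumptions made for illustration. It only shows how recording each sync as a discrete transaction gives you both point-in-time views and a record of which sync produced which version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Transaction:
    """One committed version of a dataset (hypothetical toy model)."""
    txn_id: int
    sync_name: str   # the sync task that produced this version
    kind: str        # "SNAPSHOT" replaces the view, "APPEND" adds to it
    rows: tuple


@dataclass
class ToyDataset:
    transactions: List[Transaction] = field(default_factory=list)

    def commit(self, sync_name: str, kind: str, rows: list) -> Transaction:
        """Record the output of one sync run as a new version."""
        if kind not in ("SNAPSHOT", "APPEND"):
            raise ValueError(f"unsupported transaction kind: {kind}")
        txn = Transaction(len(self.transactions) + 1, sync_name, kind, tuple(rows))
        self.transactions.append(txn)
        return txn

    def view(self, as_of: Optional[int] = None) -> list:
        """Rows visible at version `as_of` (latest version if None)."""
        rows: list = []
        for txn in self.transactions:
            if as_of is not None and txn.txn_id > as_of:
                break
            if txn.kind == "SNAPSHOT":
                rows = list(txn.rows)   # full load: replace everything
            else:
                rows.extend(txn.rows)   # incremental load: add new rows only
        return rows

    def lineage(self) -> Dict[int, str]:
        """Map each version to the sync task that produced it."""
        return {t.txn_id: t.sync_name for t in self.transactions}


ds = ToyDataset()
ds.commit("nightly_full_load", "SNAPSHOT", [{"id": 1}, {"id": 2}])
ds.commit("hourly_increment", "APPEND", [{"id": 3}])
```

In this sketch, `ds.view(as_of=1)` returns only the two rows from the full load, `ds.view()` returns all three, and `ds.lineage()` shows that version 2 came from the incremental sync.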
Granular security in Data Connection allows federated management of data syncs across different teams. Collections of syncs, or even individual data syncs, can be made visible or editable to only specific teams (defined through role- or classification-based access controls). Learn more about securing a data foundation.
You can manage sync metadata independently of the actual sync definitions. This allows for full branching of new configurations, where the new sync is sandboxed and tested in a branch before it affects any downstream transformation jobs.
HyperAuto
To go beyond simple data syncing, Palantir HyperAuto implements support for Software-Defined Data Integration (SDDI). This toolset allows organizations not only to connect to common ERP and CRM systems, but also to programmatically generate data pipelines that clean, normalize, and harmonize datasets into a cohesive data asset. This data asset can then feed into the Ontology to translate data into operational value.
External transforms
Perform scheduled syncs and exports to external systems using REST APIs.
If you want to connect to external sources to create syncs and export data, we recommend using Code Repositories to write external Python transforms that call the external system's REST API. You can also add dataset inputs and media set outputs to your transforms.
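As a rough sketch of the REST-pull logic such a transform might contain, the Python below pages through a JSON endpoint and yields records. It deliberately does not use Foundry's transforms API, which only runs inside Code Repositories. The function names, the `?page=` query parameter, and the `items` / `next_page` response fields are all assumptions for illustration; a real external system will define its own pagination scheme and authentication.

```python
import json
import urllib.request
from typing import Callable, Iterator


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all_records(
    base_url: str,
    get_json: Callable[[str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every record from a paginated REST endpoint.

    Assumes (hypothetically) that each page looks like
    {"items": [...], "next_page": <int or null>}.
    """
    page = 1
    while page is not None:
        payload = get_json(f"{base_url}?page={page}")
        yield from payload.get("items", [])
        page = payload.get("next_page")  # None ends the loop
```

Passing the HTTP call in as `get_json` keeps the paging logic testable without network access: in a test you can supply a function that returns canned pages, and in a scheduled transform you would write the yielded records to your output dataset.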