
Add datasets

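To begin building a pipeline, add data to your graph using one of the following four methods:

* Import syncs from the Data Connection app
* Select a dataset or media set that already exists in Foundry
* Upload data from your computer
* Enter data manually in Pipeline Builder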

Add data to Pipeline Builder from Data Connection

To access data from a data source, navigate to the Data Connection app in the Foundry navigation sidebar. Find the data source you want to integrate, then click Start Pipelining. Choose a location for your new pipeline, then click Save. This will create a new pipeline, and all syncs connected to your data source will be imported to your Pipeline Builder graph.

:::callout{theme="neutral"}
You cannot save a new pipeline to your personal file folder. Set up the recommended Project structure so that data security and governance are organized from the beginning of your development process.
:::

Screenshot of sample data connection

Add data from Foundry to Pipeline Builder

To import datasets or media sets that already exist in your Foundry filesystem, open the Pipeline Builder application and select Add Foundry data in the center of your graph space. Search for and select an available dataset, then choose Add data.

Screenshot of add data button

To add multiple datasets or media sets at once, select each one and choose Add to selection. Once all are selected, choose Add data.

Screenshot of add datasets button

Upload data from your computer to Pipeline Builder

You can also upload dataset or media set files from your computer. Select Upload from your computer to choose the file you want to add, or drag and drop the file onto your graph.

Screenshot of manually upload data section

Manually enter data in Pipeline Builder

Input datasets can also be created by defining a data table and manually populating it with data.

Enter data manually icon

Define the new table's schema by selecting column names and types, then manually add values to the table. Manually entered tables can be modified at any point.

Manually enter data in table
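
To make the idea of a schema plus manually entered rows concrete, here is a minimal Python sketch. The `ManualTable` class, its fields, and the sample values are all hypothetical and exist only for illustration; this is not how Pipeline Builder stores manual tables, and it only mirrors the steps described above (declare column names and types, add values, edit later).

```python
from dataclasses import dataclass, field


@dataclass
class ManualTable:
    """Hypothetical in-memory stand-in for a manually entered table."""

    schema: dict[str, type]  # column name -> expected Python type
    rows: list[dict] = field(default_factory=list)

    def add_row(self, **values):
        # Reject rows whose columns do not match the declared schema.
        if set(values) != set(self.schema):
            raise ValueError(f"expected columns {sorted(self.schema)}")
        # Reject values whose type differs from the declared column type.
        for name, value in values.items():
            if not isinstance(value, self.schema[name]):
                raise TypeError(f"{name!r} must be {self.schema[name].__name__}")
        self.rows.append(values)


# Step 1: define the schema (column names and types).
table = ManualTable(schema={"id": int, "label": str})
# Step 2: add values by hand.
table.add_row(id=1, label="alpha")
# Later: entered values can still be modified.
table.rows[0]["label"] = "beta"
```

Checking each row against the schema at entry time is a design choice of this sketch; it keeps every stored row consistent with the declared column names and types.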

The following table lists some of the most common column types in a manual entry table:

| Column type | Format |
| --- | --- |
| String | All characters |
| Timestamp | mm/dd/yyyy hh:mm:ss; additional timestamp formats can be used |
| Date | mm/dd/yyyy |
| Boolean | 0 → false, not 0 → true |
| Binary | All characters; shown as base64 |
| Integer, long | Positive and negative numbers, no decimal point |
| Double | Positive and negative numbers, including decimal point |
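
As a rough illustration of how the formats in the table above could be interpreted, the following Python sketch converts typed cell text into values. The function name `parse_cell` and every conversion rule here are assumptions made for this example, not Pipeline Builder's actual parsing logic. Only the primary timestamp format is handled, Boolean input is assumed to be an integer, and binary input is assumed to be UTF-8 text.

```python
import base64
from datetime import date, datetime


def parse_cell(column_type: str, text: str):
    """Convert one manually typed cell into a Python value (illustrative only)."""
    kind = column_type.lower()
    if kind == "string":
        return text  # all characters are accepted as-is
    if kind == "timestamp":
        # mm/dd/yyyy hh:mm:ss; other timestamp formats are not covered here
        return datetime.strptime(text, "%m/%d/%Y %H:%M:%S")
    if kind == "date":
        return datetime.strptime(text, "%m/%d/%Y").date()
    if kind == "boolean":
        # 0 -> False, any other integer -> True (assumes integer input)
        return int(text) != 0
    if kind == "binary":
        # typed characters are encoded as bytes and displayed as base64
        return base64.b64encode(text.encode("utf-8")).decode("ascii")
    if kind in ("integer", "long"):
        return int(text)  # raises ValueError if a decimal point is present
    if kind == "double":
        return float(text)
    raise ValueError(f"unsupported column type: {column_type}")
```

For example, `parse_cell("Boolean", "7")` returns `True`, and `parse_cell("Binary", "hi")` returns `aGk=`.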

:::callout{theme="neutral"}
You may experience longer loading times when viewing proposed changes to large manually entered tables. As an alternative, you can view the raw tables side by side.
:::

Next steps

After adding datasets or media sets to Pipeline Builder, you can change their computation mode, transform the data, or add outputs.

Screenshot of imported datasets

