Getting started
:::callout{theme="success" title="Tip"} The instructions below step through a simple Java data transformation. If you are just getting started with data transformation, consider going through the batch pipeline tutorial for Pipeline Builder or Code Repositories first. :::
Follow the steps here to get started writing your first Java transformation:
- Create a new Transforms Java repository. Navigate to a Project, select + New > Repository, and select Java under Language template.
- Download this sample dataset: titanic.zip. Import this dataset into Foundry.
- Navigate to your repository. Your data transformation code goes in `myproject/datasets/HighLevelAutoTransform.java`. The sample code in this file is commented out, so uncomment it before moving on.
- Update the input dataset by replacing `/path/to/input/dataset` with the full path to your `titanic` dataset.
- Update the output dataset by replacing `/path/to/output/dataset` with the full path to your desired output dataset location.
- Modify the default transformation code to filter the `titanic` dataset by gender, keeping only female passengers. Update the data transformation code in `myComputeFunction`:
  ```java
  @Compute
  // Replace this with the full path to your output dataset.
  @Output("/path/to/output/dataset")
  // Replace this with the full path to your "titanic" dataset.
  public Dataset<Row> myComputeFunction(@Input("/path/to/input/dataset") Dataset<Row> myInput) {
      // Keep only the rows whose "Sex" column equals "female".
      return myInput.filter(myInput.col("Sex").equalTo("female"));
  }
  ```
- After you successfully commit your changes to your branch, you can open and build your output dataset!
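To make the filter condition in the steps above concrete, here is a minimal plain-Java sketch of the same row-selection logic. It does not use Spark or the Transforms API: the `Passenger` class, the `femaleOnly` helper, and the sample rows are hypothetical stand-ins invented for illustration. It assumes the comparison is an exact, case-sensitive match and that a missing value never matches, which is how an equality test on a string column normally behaves.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration only; it does not use Spark or the Transforms API.
final class SexFilterSketch {

    // Hypothetical stand-in for one row of the "titanic" dataset.
    static final class Passenger {
        final String name;
        final String sex; // may be null, as a column value can be

        Passenger(String name, String sex) {
            this.name = name;
            this.sex = sex;
        }
    }

    // Keeps only rows whose sex value is exactly "female".
    // A null value never matches, and neither does a different case.
    static List<Passenger> femaleOnly(List<Passenger> rows) {
        return rows.stream()
                .filter(p -> "female".equals(p.sex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real dataset.
        List<Passenger> rows = Arrays.asList(
                new Passenger("A", "female"),
                new Passenger("B", "male"),
                new Passenger("C", "Female"), // different case: not matched
                new Passenger("D", null));    // missing value: not matched
        for (Passenger p : femaleOnly(rows)) {
            System.out.println(p.name); // prints: A
        }
    }
}
```

If your imported data spells the value differently (for example with a capital letter), the output dataset will be empty, so it is worth checking the values in the `Sex` column before building. In the tutorial's transform, the same condition is expressed on the Spark column rather than on Java objects.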
The `HighLevelAutoTransform` example defines a high-level Transform that uses automatic registration. For more information about the types of data transformations supported in Transforms Java, and for an explanation of the template project structure and included files, refer to this documentation.
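As a rough intuition for what annotation-based registration can look like, the toy sketch below finds annotated methods on a class using standard Java reflection. This is not how Transforms Java is implemented, and it does not use its API: `ToyCompute`, `ToyOutput`, `ExampleTransform`, and `discover` are hypothetical names defined only in this snippet. It only shows the general idea that a framework can locate a compute function and its output path from annotations, without the author writing explicit registration code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of annotation-driven discovery. The annotations below are
// hypothetical stand-ins defined in this file, not the Transforms Java API.
final class RegistrationSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ToyCompute {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ToyOutput {
        String value();
    }

    // A class shaped like the tutorial's transform, using the toy annotations.
    static final class ExampleTransform {
        @ToyCompute
        @ToyOutput("/path/to/output/dataset")
        public String myComputeFunction(String input) {
            return input;
        }

        // Not annotated, so discovery skips it.
        public String helper() {
            return "not registered";
        }
    }

    // Returns "<output path> <- <method name>" for each method that carries
    // both toy annotations.
    static List<String> discover(Class<?> type) {
        List<String> found = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            ToyOutput out = m.getAnnotation(ToyOutput.class);
            if (m.isAnnotationPresent(ToyCompute.class) && out != null) {
                found.add(out.value() + " <- " + m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: /path/to/output/dataset <- myComputeFunction
        discover(ExampleTransform.class).forEach(System.out::println);
    }
}
```

The practical takeaway for the tutorial is that the annotations on `myComputeFunction` carry the dataset paths, which is why editing those paths is all that is needed before committing.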