Core concepts
Workbooks
The main resource you interact with in Code Workbook is a Workbook. Workbooks are used to import datasets from Foundry and transform these input datasets for purposes such as:
- Cleaning and joining raw data imported from some external source to produce curated datasets for other users.
- Analyzing processed data sources to derive useful insights.
- Training and applying models to do predictive analysis.
- Creating parameterized visualizations to display in a report.
Transforms
Transforms are pieces of logic that take one or more inputs and return a single output. Inputs and outputs can be Foundry datasets or models.
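As a minimal sketch of this idea, the plain-Python example below models two transforms: one with a single input and one with two inputs, each returning exactly one output. The dataset names (`raw_orders`, `customers`), the function names, and the use of lists of dictionaries are assumptions made for illustration; in an actual Workbook the inputs and outputs would be Foundry datasets (typically handled as dataframes) or models.

```python
# Hypothetical stand-in data. In a real Workbook these would be
# datasets imported from Foundry, not in-memory lists.
raw_orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "amount": None},
    {"order_id": 3, "customer_id": "c1", "amount": 5.5},
]
customers = [
    {"customer_id": "c1", "name": "Ada"},
    {"customer_id": "c2", "name": "Grace"},
]


def clean_orders(orders):
    """Transform with one input: drop rows that have no amount."""
    return [row for row in orders if row["amount"] is not None]


def orders_with_names(orders, customer_rows):
    """Transform with two inputs: attach the customer name to each order."""
    names = {c["customer_id"]: c["name"] for c in customer_rows}
    return [{**row, "name": names[row["customer_id"]]} for row in orders]


# Chain the transforms: the single output of one becomes an input of the next.
cleaned = clean_orders(raw_orders)
curated = orders_with_names(cleaned, customers)
```

Each function takes its inputs as arguments and returns a single result, so the output of `clean_orders` can feed directly into `orders_with_names`.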
Templates
Code templates enable users with a range of technical experience to collaborate by abstracting code away behind a simple form-based interface. Values selected by users are substituted into a code template, which can then be run like any other transform in the Workbook.
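The substitution step can be illustrated with Python's standard `string.Template`. This is only a conceptual sketch and not how Code Workbook implements templates: the template text, the `render_transform` helper, and the `column` and `threshold` parameters are all hypothetical, standing in for values a user would pick in the form.

```python
from string import Template

# Hypothetical code template. $column and $threshold are placeholders
# that stand in for the values a user would select in a form.
CODE_TEMPLATE = Template(
    "def filtered(rows):\n"
    "    return [r for r in rows if r[$column] >= $threshold]\n"
)


def render_transform(column, threshold):
    """Substitute the chosen values into the template and build the function."""
    # repr() turns each value into a valid Python literal before substitution.
    source = CODE_TEMPLATE.substitute(
        column=repr(column), threshold=repr(threshold)
    )
    namespace = {}
    exec(source, namespace)  # acceptable here: the template text is trusted
    return namespace["filtered"]


# "Form" values chosen by the user (assumed for this example).
keep_large = render_transform("amount", 10)
result = keep_large([{"amount": 20}, {"amount": 5}])
```

Once rendered, `keep_large` behaves like any hand-written transform: it takes an input and returns a single output.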
Environment
Each Code Workbook is associated with an environment. An environment includes a set of Conda packages and Spark settings installed on the Spark module backing computation in the Workbook.
Learn more about environments.
Branching
Branching in Code Workbook provides a version control experience tailored to data transformation, enabling teams to operate on logic and data simultaneously in a Workbook.