Data Preparation FAQ
The following are some frequently asked questions about Preparation.
For general information, see the Preparation documentation.
- Data Preparation questions
  - What is Preparation?
  - Who should use Preparation?
  - What can I do with Preparation?
  - Can I change the input dataset in Preparation?
What is Preparation?
Preparation is an application for cleaning and preparing data. It is powered by the Contour backend.
Who should use Preparation?
We aim for Preparation to be walk-up usable, or to require only minimal training, for everyone on the enrollment. On initial load, a user should be able to see the shape (row and column information) and cleanliness of their data at a glance. Quality flags, such as extra whitespace or a high null percentage, then guide the user step by step to either fix or ignore each flagged issue.
That said, people who only consume Notepad documents, for example, will likely not need Preparation. Some code repository processes, however, may be simplified through Preparation.
What can I do with Preparation?
Here are a few examples of how Preparation could be used to clean or prepare real data:
- Normalize zip codes to five digits.
- Identify and nullify 0 values for latitude/longitude.
- Create hyperlinks by appending an ID column to a URL.
- Normalize values by removing leading and trailing whitespace.
- Split a currency column (for example: “USD 1000”) into separate currency code and amount columns.
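To make the examples above concrete, here is a minimal sketch of the same transformations written as plain Python functions. This is not how Preparation implements them; in Preparation these steps are applied through the interface. The function names, the example URL, and the handling of malformed values are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple


def normalize_zip(value: str) -> Optional[str]:
    """Reduce a US-style zip code to five digits (assumed rule: drop any
    '-1234' suffix, left-pad short codes with zeros, reject anything else)."""
    digits = re.sub(r"\D", "", value.split("-")[0])
    if not digits or len(digits) > 5:
        return None
    return digits.zfill(5)


def nullify_zero(coord: Optional[float]) -> Optional[float]:
    """Treat a 0 latitude/longitude as missing data."""
    return None if coord is None or coord == 0 else coord


def make_link(base_url: str, record_id: str) -> str:
    """Create a hyperlink by appending an ID to a base URL."""
    return base_url + record_id


def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()


def split_currency(value: str) -> Tuple[str, float]:
    """Split a value such as 'USD 1000' into a currency code and an amount."""
    code, amount = value.strip().split(maxsplit=1)
    return code, float(amount)


if __name__ == "__main__":
    print(normalize_zip("12345-6789"))
    print(nullify_zero(0.0))
    # The base URL below is a hypothetical placeholder.
    print(make_link("https://example.com/records/", "42"))
    print(repr(trim("  Acme Corp  ")))
    print(split_currency("USD 1000"))
```

Each function operates on a single value, which mirrors the idea of applying one cleaning step to one column at a time.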
Can I change the input dataset in Preparation?
Yes. In the Change Log panel on the right side, scroll to the very bottom and edit the starting dataset. If you want to apply the same logic to a different dataset while keeping the original, duplicate your preparation beforehand by selecting the small dropdown menu next to its name.