Supported languages
Before getting started with your data transformation, consider the benefits and limitations of each language. The following table summarizes the key differences between the supported languages:
| Description | SQL | Python | Java |
|---|---|---|---|
| Non-proprietary language: documentation available online | ✓ | ✓ | ✓ |
| Support for file access: read and write files in Foundry datasets, which means your data transformation can operate on unstructured data | | ✓ | ✓ |
| Transform Level Logic Versioning (TLLV): see the TLLV section for details | | ✓ | ✓ |
| Incremental computation: see the incremental computation section for details | | ✓ | ✓ |
| Support for removing inherited markings | ✓ | ✓ | ✓ |
| Multiple output datasets allowed per file | | ✓ | ✓ |
| Support for dataset previews | ✓ | ✓ | ✓ |
| Custom Transforms profiles | ✓ | ✓ | ✓ |
SQL
SQL is a language that has plenty of external documentation available online. Here are some key benefits of writing data transformations in SQL:
- SQL is the most performant language, since SQL transforms benefit from most of Spark's optimizations.
- Transforms SQL gives you access to a SQL scratchpad where you can run sample queries to check your syntax.
Learn more about SQL Transforms.
Python
Python is a language with plenty of external documentation available online. You may want to write data transformations in Python to take advantage of its language-specific capabilities and libraries. The Python API is lower-level than SQL. Here are some key benefits of using Python:
- The `transforms` Python library is an API that exposes functionality such as file reads and writes. File-based data transformations can be useful early in a pipeline, when you need to parse and clean data.
- There is first-class support for external libraries such as pandas, NumPy, and machine learning libraries.
- You get access to the full Spark Python (PySpark) API, which includes additional features of Spark that aren’t supported in other languages.
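The parse-and-clean step that file-based transformations are useful for can be sketched in plain Python. This is a minimal, self-contained sketch, not the `transforms` API: the helper name `parse_and_clean`, the column names `id`, `name`, and `amount`, and the sample input are all hypothetical, and the raw file contents are passed in as a string instead of being read from a file in a Foundry dataset.

```python
import csv
import io


def parse_and_clean(raw_text: str) -> list[dict[str, object]]:
    """Parse raw CSV text and return cleaned records.

    Hypothetical helper: in a real pipeline the text would come from a file
    in an input dataset; here it is passed in directly so the sketch runs
    anywhere with only the standard library.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = []
    for row in reader:
        # Normalize header names and strip stray whitespace from values.
        # Extra, unnamed fields (key None) are dropped; missing fields become "".
        record = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Skip rows without an id, such as rows made only of empty fields.
        if not record.get("id"):
            continue
        # Cast typed columns; an empty amount is treated as missing.
        record["id"] = int(record["id"])
        record["amount"] = float(record["amount"]) if record.get("amount") else None
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    raw = " ID , Name , Amount\n1, Alice , 10.5\n,  , \n2,Bob,\n"
    print(parse_and_clean(raw))
    # prints [{'id': 1, 'name': 'Alice', 'amount': 10.5}, {'id': 2, 'name': 'Bob', 'amount': None}]
```

Keeping the parsing logic in a plain function like this, separate from any dataset I/O, makes it easy to unit test before wiring it into a transform.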
Learn more about Python Transforms.
Java
Java is a language with plenty of external documentation available online. You may want to write data transformations in Java to take advantage of its language-specific capabilities. The Java API is lower-level than SQL. Here are some key benefits of using Java:
- The `transforms` Java library is an API that exposes functionality such as file reads and writes. File-based data transformations can be useful early in a pipeline, when you need to parse and clean data.
Learn more about Java Transforms.