Supported languages

Before you start writing a data transformation, consider the benefits as well as the limitations of each language. The following table summarizes the key differences between the supported languages:

Description	SQL	Python	Java
Non-proprietary language: documentation available online	✓	✓	✓
Support for file access: read and write files in Foundry datasets, so your data transformation can operate on unstructured data	✗	✓	✓
Transform Level Logic Versioning (TLLV): more info in the TLLV section	✓	✓	✗
Incremental computation: more info in the incremental computation section	✗	✓	✓
Support for removing inherited markings	✓	✓	✓
Multiple output datasets allowed per file	✗	✓	✓
Support for dataset previews	✓	✓	✓
Custom Transforms profiles	✓	✓	✓
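When a pipeline needs several of these features at once, it can help to treat the table as data and intersect the rows. The following is a minimal sketch of that idea. `SUPPORT`, `LANGUAGES`, `languages_supporting`, and the feature keys are hypothetical names invented for this illustration and are not part of any Foundry API. The only values encoded are the check marks from the table above.

```python
# Hypothetical encoding of the support matrix above; not a Foundry API.
# Each feature key maps to the set of languages marked with a check.
SUPPORT = {
    "non_proprietary": {"SQL", "Python", "Java"},
    "file_access": {"Python", "Java"},
    "tllv": {"SQL", "Python"},
    "incremental": {"Python", "Java"},
    "remove_inherited_markings": {"SQL", "Python", "Java"},
    "multiple_outputs_per_file": {"Python", "Java"},
    "dataset_previews": {"SQL", "Python", "Java"},
    "custom_profiles": {"SQL", "Python", "Java"},
}

# Column order of the table, used to keep results stable.
LANGUAGES = ("SQL", "Python", "Java")


def languages_supporting(*features: str) -> list[str]:
    """Return the languages that support every requested feature, in table order."""
    unknown = [f for f in features if f not in SUPPORT]
    if unknown:
        raise KeyError(f"unknown feature(s): {unknown}")
    return [lang for lang in LANGUAGES if all(lang in SUPPORT[f] for f in features)]


if __name__ == "__main__":
    # A pipeline that needs both incremental computation and TLLV.
    print(languages_supporting("incremental", "tllv"))  # -> ['Python']
    print(languages_supporting("file_access"))          # -> ['Python', 'Java']
```

Because each row is stored as a set, a combined requirement is the intersection of the relevant rows. This makes it clear, for example, that only Python offers both incremental computation and TLLV.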

SQL

SQL is a language that has plenty of external documentation available online. Here are some key benefits of writing data transformations in SQL:

SQL is generally the most performant of the three languages, because SQL transforms benefit from most of Spark's built-in optimizations.
Transforms SQL gives you access to a SQL scratchpad where you can run sample queries to check your SQL syntax.

Learn more about SQL Transforms.

Python

Python is a language with plenty of external documentation available online. You may want to write data transformations in Python to take advantage of Python's language-specific capabilities and libraries. The Python API is lower-level than SQL. Here are some key benefits of using Python:

The transforms Python library is an API that exposes functionality such as file reads and writes. File-based data transformations are useful early in a pipeline, when you need to parse and clean raw data.
There is first-class support for using external libraries such as pandas, NumPy, and other machine learning libraries.
You get access to the full Spark Python (PySpark) API, which includes additional features of Spark that aren’t supported in other languages.
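File-based transforms are typically used to parse and clean raw, unstructured input before it becomes a tabular dataset. The following is a minimal sketch of only the parse-and-clean step, written in plain Python so it runs on its own. The semicolon-delimited `id;name;amount` format and the function name `parse_raw_lines` are hypothetical and exist only for this illustration. In an actual file-based Python transform, you would read the lines through the transforms Python library's file read API and write the resulting records to an output dataset. This sketch does neither.

```python
import csv
import io
from typing import Iterable


def parse_raw_lines(lines: Iterable[str]) -> list[dict[str, object]]:
    """Parse hypothetical 'id;name;amount' lines into cleaned records.

    Blank lines and '#' comments are skipped, names are trimmed and
    title-cased, and rows with a missing field or a non-numeric id or
    amount are dropped.
    """
    records = []
    for row in csv.reader(lines, delimiter=";"):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            continue  # skip rows that do not have exactly three fields
        raw_id, raw_name, raw_amount = (field.strip() for field in row)
        try:
            record_id = int(raw_id)
            amount = float(raw_amount)
        except ValueError:
            continue  # drop rows whose id or amount cannot be parsed
        records.append({"id": record_id, "name": raw_name.title(), "amount": amount})
    return records


if __name__ == "__main__":
    # Stand-in for the contents of a raw file in an input dataset.
    raw = io.StringIO("# id;name;amount\n1; alice ;10.5\n\n2;BOB;n/a\n3;carol;7\n")
    print(parse_raw_lines(raw))
    # -> [{'id': 1, 'name': 'Alice', 'amount': 10.5}, {'id': 3, 'name': 'Carol', 'amount': 7.0}]
```

The parsing logic is kept separate from any dataset I/O. This lets you unit-test it locally before you wire it into a transform.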

Learn more about Python Transforms.

Java

Java is a language with plenty of external documentation available online. You may want to write data transformations in Java to take advantage of Java's language-specific capabilities. The Java API is lower-level than SQL. Here are some key benefits of using Java:

The transforms Java library is an API that exposes functionality such as file reads and writes. File-based data transformations are useful early in a pipeline, when you need to parse and clean raw data.

Learn more about Java Transforms.
