transforms.api

The Transforms Python API provides classes and decorators for constructing a Pipeline.

Functions

Name	Description
`configure`([profile, allowed_run_duration, ...])	A decorator that modifies the configuration of a Spark transform.
`incremental`([require_incremental, ...])	A decorator to convert inputs and outputs into their `transforms.api.incremental` counterparts.
`lightweight`([_maybe_function, cpu_cores, ...])
`transform_df`(output, **inputs)	Register the wrapped compute function as a DataFrame transform.
`transform_pandas`(output, **inputs)	Register the wrapped compute function as a pandas transform.
`transform_polars`(output, **inputs)	Register the wrapped compute function as a Polars transform.
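
As a rough illustration of the decorator style listed above, the sketch below registers a compute function with `transform_df`. The dataset paths and the function name are hypothetical placeholders. It also assumes, as the `(output, **inputs)` signature suggests, that each keyword input reaches the compute function as a DataFrame under the same name. The `except ImportError` branch defines minimal stand-ins that are not part of the library; they only let the snippet run where `transforms.api` is not installed.

```python
# Sketch only: dataset paths and names are hypothetical placeholders.
try:
    from transforms.api import Input, Output, transform_df
except ImportError:
    # Stand-ins (NOT the real implementation) so the sketch can run
    # in an environment where the library is unavailable.
    class _Spec:
        def __init__(self, alias):
            self.alias = alias

    Input = Output = _Spec

    def transform_df(output, **inputs):
        def wrap(compute_func):
            # Remember what was declared, then hand the function back.
            compute_func.output = output
            compute_func.inputs = inputs
            return compute_func
        return wrap


@transform_df(
    Output("/examples/clean_orders"),
    orders=Input("/examples/raw_orders"),
)
def clean_orders(orders):
    # Assumption: `orders` is a Spark DataFrame and the returned
    # DataFrame becomes the content of the declared output.
    return orders.dropDuplicates()
```

The same shape should carry over to `transform_pandas` and `transform_polars`, with the compute function presumably receiving and returning pandas or Polars objects instead.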

Classes

Name	Description
`BooleanParam`(default, *[, description])	Specification for the `ParameterSpec` definition used as an input to a transform.
`Check`(expectation, name[, on_error, description])	Wraps up an expectation such that it can be registered with Data Health.
`ComputeBackend`(*values)	Enum class for representing the different compute backends for use in `configure()`.
`ContainerTransform`(transform, *[, ...])	A callable object that describes a single step of a lightweight, single-node computation.
`ContainerTransformsConfiguration`(transform, *)	A callable object that describes a single step of a lightweight, single-node computation.
`Dataset`(alias)	A class representing the files backing a Foundry dataset view.
`FileStatus`(path, size, modified)	A `collections.namedtuple` capturing details about a `FoundryFS` file in Spark transforms.
`FileSystem`(foundry_fs[, read_only])	A filesystem object for reading and writing raw dataset files in Spark transforms.
`FloatParam`(default, *[, description])	Specification for the `ParameterSpec` definition used as an input to a transform.
`FoundryDataSidecarFile`(param, path, ...)	A file object for reading and writing raw dataset files in lightweight, single-node transforms.
`FoundryDataSidecarFileSystem`(param[, ...])	A file system for reading and writing raw dataset files in lightweight, single-node transforms.
`FoundryInputParam`(aliases[, branch, type, ...])	A base class for transforms input parameters.
`FoundryOutputParam`(aliases[, type, ...])	A base class for transforms output parameters.
`IncrementalLightweightInput`(alias, rid[, branch])	The input object passed into incremental `ContainerTransform` objects at runtime.
`IncrementalLightweightOutput`(alias, rid[, ...])	The output object passed into user code at runtime for incremental `ContainerTransform` objects.
`IncrementalTableTransformInput`(table_tinput, ...)	`TableTransformInput` with added functionality for incremental computation.
`IncrementalTransformContext`(is_incremental, ...)	`TransformContext` with added functionality for incremental computation.
`IncrementalTransformInput`(tinput[, ...])	`TransformInput` with added functionality for incremental computation.
`IncrementalTransformOutput`(toutput[, ...])	`TransformOutput` with added functionality for incremental computation.
`Input`([alias, branch, description, ...])	Specification for a transform dataset input.
`InputSet`([aliases, description])	Specification for a list of transform inputs.
`IntegerParam`(default, *[, description])	Specification for the `ParameterSpec` definition used as an input to a transform.
`LightweightContext`()	A context object that can optionally be injected into the compute function of a lightweight transform.
`LightweightInput`(alias, rid[, branch])	The input object passed into `ContainerTransform` objects at runtime.
`LightweightInputParam`()	Base type for input parameters compatible with lightweight, single node transforms.
`LightweightOutput`(alias, rid[, branch])	The output object passed to user code at runtime.
`LightweightOutputParam`()	Base type for output parameters compatible with lightweight, single node transforms.
`Markings`(marking_ids, on_branches)	Specification for a marking that stops propagating from input.
`OrgMarkings`(marking_ids, on_branches)	Specification for a marking that is no longer required on the output.
`Output`([alias, sever_permissions, ...])	Specification for a transform output.
`OutputSet`([aliases, sever_permissions, ...])	Specification for a list of transform outputs.
`Param`([description])	Base class for any parameter taken by the transform compute function.
`ParamContext`(foundry_connector, input_specs, ...)	A context object injected in the `instance` method of a parameter.
`ParamValueInput`(value)	A wrapper around the value of a parameter spec.
`Pipeline`()	An object for grouping a collection of `Transform` objects.
`StringParam`(default, *[, description, ...])	Specification for the `ParameterSpec` definition used as an input to a transform.
`TableTransformInput`(rid, branch, table_dfreader)	The input object passed into transform objects at runtime for virtual table inputs.
`Transform`(compute_func[, inputs, outputs, ...])	A callable object that describes a single step of a Spark computation.
`TransformContext`(foundry_connector[, ...])	A context object that can optionally be injected into the compute function of a transform.
`TransformInput`(rid, branch, txrange, ...[, ...])	The input object passed into `Transform` objects at runtime.
`TransformOutput`(rid, branch, txrid, ...[, mode])	The output object passed into `Transform` objects at runtime.
`transform`(**kwargs)	Wrap a compute function as a `Transform` object.
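
To show how several of the classes above might fit together, here is a minimal sketch that uses `transform` with explicit `Input` and `Output` specifications. The paths, parameter names, and the `active` column are hypothetical. The calls `dataframe()` on the runtime input and `write_dataframe()` on the runtime output are assumptions about `TransformInput` and `TransformOutput` that this page does not itself document. The `except ImportError` branch holds stand-ins that are not the real implementation.

```python
# Sketch only: paths, parameter names, and the column are hypothetical.
try:
    from transforms.api import Input, Output, transform
except ImportError:
    # Stand-ins (NOT the real implementation) so the sketch can run
    # in an environment where the library is unavailable.
    class _Spec:
        def __init__(self, alias):
            self.alias = alias

    Input = Output = _Spec

    def transform(**kwargs):
        def wrap(compute_func):
            compute_func.specs = kwargs  # keep the declared inputs and outputs
            return compute_func
        return wrap


@transform(
    result=Output("/examples/active_users"),
    users=Input("/examples/users"),
)
def active_users(users, result):
    # Assumption: the runtime input exposes dataframe() and the runtime
    # output exposes write_dataframe().
    df = users.dataframe()
    result.write_dataframe(df.filter(df["active"]))
```

Unlike the `transform_df` style, the compute function here receives the input and output objects themselves, which is useful when the function needs more than a single returned DataFrame.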

Exceptions

Name	Description
`LightweightException`	Base exception for lightweight compatibility checks.
`LightweightNotImplementedError`(message)	Lightweight-specific `NotImplementedError` for unsupported features.
`LightweightTypeError`(message)	Exception for type errors in lightweight compatibility checks.
`LightweightValueError`(message)	Exception for value errors in lightweight compatibility checks.
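
The sketch below illustrates one way these exceptions could be handled together. It assumes that the specific errors derive from `LightweightException`, which the "base exception" wording suggests but this page does not state outright. The helper `describe_failure` is hypothetical, and the classes in the `except ImportError` branch are stand-ins, not the library's definitions.

```python
# Sketch only: the class hierarchy shown in the fallback is an assumption.
try:
    from transforms.api import LightweightException, LightweightTypeError
except ImportError:
    # Stand-ins (NOT the real implementation) so the sketch can run
    # in an environment where the library is unavailable.
    class LightweightException(Exception):
        pass

    class LightweightTypeError(LightweightException):
        pass


def describe_failure(action):
    """Run `action` and return a short label for any lightweight error (hypothetical helper)."""
    try:
        action()
    except LightweightTypeError as exc:
        # Most specific handler first.
        return f"type error: {exc}"
    except LightweightException as exc:
        # Catches the remaining lightweight compatibility errors.
        return f"lightweight error: {exc}"
    return "ok"
```

Ordering the handlers from most to least specific keeps the base-class handler as a catch-all for the other lightweight errors.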
