transforms.api

The Transforms Python API provides classes and decorators for constructing a Pipeline.

Functions

Name	Description
`configure`([profile, allowed_run_duration, ...])	A decorator that modifies the configuration of a Spark transform.
`incremental`([require_incremental, ...])	A decorator to convert inputs and outputs into their `transforms.api.incremental` counterparts.
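`lightweight`([_maybe_function, cpu_cores, ...])	A decorator that configures a transform to run as a lightweight, single-node computation rather than on Spark.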
`transform_df`(output, **inputs)	Register the wrapped compute function as a PySpark DataFrame transform.
`transform_pandas`(output, **inputs)	Register the wrapped compute function as a pandas transform.
`transform_polars`(output, **inputs)	Register the wrapped compute function as a Polars transform.
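The decorators above turn a plain compute function into a transform object. Below is a minimal, self-contained toy model of that pattern. `ToyInput`, `ToyOutput`, `ToyTransform`, `toy_transform_df`, `toy_configure`, and `run` are hypothetical stand-ins written for illustration. They are not part of `transforms.api`, and lists of dicts take the place of DataFrames. The toy mirrors the `transform_df(output, **inputs)` shape: the output spec is positional, and each keyword input becomes a keyword argument of the compute function. In this sketch the configure-style decorator sits outermost because it operates on the transform object produced by the inner decorator, not on the raw function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]  # stand-in for a DataFrame


@dataclass(frozen=True)
class ToyInput:
    alias: str  # hypothetical dataset path or alias to read from


@dataclass(frozen=True)
class ToyOutput:
    alias: str  # hypothetical dataset path or alias to write to


@dataclass
class ToyTransform:
    compute_func: Callable[..., Rows]
    inputs: Dict[str, ToyInput]
    output: ToyOutput
    config: Dict[str, Any] = field(default_factory=dict)


def toy_transform_df(output: ToyOutput, **inputs: ToyInput):
    """Wrap a compute function into a ToyTransform, mimicking transform_df(output, **inputs)."""
    def decorator(func: Callable[..., Rows]) -> ToyTransform:
        return ToyTransform(compute_func=func, inputs=inputs, output=output)
    return decorator


def toy_configure(**options: Any):
    """Attach settings to an existing ToyTransform (hypothetical stand-in for configure)."""
    def decorator(t: ToyTransform) -> ToyTransform:
        t.config.update(options)
        return t
    return decorator


def run(t: ToyTransform, datasets: Dict[str, Rows]) -> None:
    """Resolve each input by alias, call the compute function, store the result under the output alias."""
    kwargs = {name: datasets[spec.alias] for name, spec in t.inputs.items()}
    datasets[t.output.alias] = t.compute_func(**kwargs)


@toy_configure(profile=["HYPOTHETICAL_PROFILE"])
@toy_transform_df(ToyOutput("/example/clean"), raw=ToyInput("/example/raw"))
def clean(raw: Rows) -> Rows:
    # Keep only rows whose "value" is present.
    return [row for row in raw if row["value"] is not None]


datasets = {"/example/raw": [{"value": 1}, {"value": None}, {"value": 3}]}
run(clean, datasets)
print(datasets["/example/clean"])  # [{'value': 1}, {'value': 3}]
print(clean.config)                # {'profile': ['HYPOTHETICAL_PROFILE']}
```

In Foundry, the same shape would read `@transform_df(Output(...), raw=Input(...))`, and `raw` would arrive as a DataFrame. The toy models only the wiring of keyword names to datasets.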

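`incremental` (and the `Incremental*` classes listed under Classes) support incremental computation: processing only what is new since the previous run instead of recomputing everything. The toy model below illustrates that idea with an append-only list and a saved row offset. `ToyIncrementalState` and `run_incremental` are hypothetical. The `is_incremental` flag echoes the argument of `IncrementalTransformContext`, but the real API's bookkeeping is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyIncrementalState:
    has_run: bool = False   # whether a previous run completed
    rows_consumed: int = 0  # how many input rows earlier runs already processed


def run_incremental(source: List[int], output: List[int], state: ToyIncrementalState) -> bool:
    """Append doubled values for rows added since the last run (hypothetical toy).

    Returns True when the run was incremental, i.e. a previous run existed.
    """
    is_incremental = state.has_run
    new_rows = source[state.rows_consumed:]  # only rows not processed before
    output.extend(x * 2 for x in new_rows)   # append rather than rewrite the output
    state.rows_consumed = len(source)
    state.has_run = True
    return is_incremental


source: List[int] = [1, 2, 3]
output: List[int] = []
state = ToyIncrementalState()

first = run_incremental(source, output, state)   # full run: processes 1, 2, 3
source.extend([4, 5])                            # new data arrives upstream
second = run_incremental(source, output, state)  # incremental run: processes only 4, 5
print(first, second, output)  # False True [2, 4, 6, 8, 10]
```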
Classes

Name	Description
`BooleanParam`(default, *[, description])	Specification for a boolean `ParameterSpec` definition used as an input to a transform.
`Check`(expectation, name[, on_error, description])	Wraps an expectation so that it can be registered with Data Health.
`ComputeBackend`(*values)	Enum class for representing the different compute backends for use in `configure()`.
`ContainerTransform`(transform, *[, ...])	A callable object that describes a single step of a lightweight, single-node computation.
`ContainerTransformsConfiguration`(transform, *)	A callable object that describes a single step of a lightweight, single-node computation.
`Dataset`(alias)	A class representing the files backing a Foundry dataset view.
`FileStatus`(path, size, modified)	A `collections.namedtuple` capturing details about a `FoundryFS` file in Spark transforms.
`FileSystem`(foundry_fs[, read_only])	A filesystem object for reading and writing raw dataset files in Spark transforms.
`FloatParam`(default, *[, description])	Specification for a floating-point `ParameterSpec` definition used as an input to a transform.
`FoundryDataSidecarFile`(param, path, ...)	A file object for reading and writing raw dataset files in lightweight, single-node transforms.
`FoundryDataSidecarFileSystem`(param[, ...])	A file system for reading and writing raw dataset files in lightweight, single-node transforms.
`FoundryInputParam`(aliases[, branch, type, ...])	A base class for transform input parameters.
`FoundryOutputParam`(aliases[, type, ...])	A base class for transform output parameters.
`IncrementalLightweightInput`(alias, rid[, branch])	The input object passed into incremental `ContainerTransform` objects at runtime.
`IncrementalLightweightOutput`(alias, rid[, ...])	The output object passed into user code at runtime for incremental `ContainerTransform` objects.
`IncrementalTableTransformInput`(table_tinput, ...)	`TableTransformInput` with added functionality for incremental computation.
`IncrementalTransformContext`(is_incremental, ...)	`TransformContext` with added functionality for incremental computation.
`IncrementalTransformInput`(tinput[, ...])	`TransformInput` with added functionality for incremental computation.
`IncrementalTransformOutput`(toutput[, ...])	`TransformOutput` with added functionality for incremental computation.
`Input`([alias, branch, description, ...])	Specification for a transform dataset input.
`InputSet`([aliases, description])	Specification for a list of transform inputs.
`IntegerParam`(default, *[, description])	Specification for an integer `ParameterSpec` definition used as an input to a transform.
`LightweightContext`()	A context object that can optionally be injected into the compute function of a lightweight transform.
`LightweightInput`(alias, rid[, branch])	The input object passed into `ContainerTransform` objects at runtime.
`LightweightInputParam`()	Base type for input parameters compatible with lightweight, single-node transforms.
`LightweightOutput`(alias, rid[, branch])	The output object passed into user code at runtime for `ContainerTransform` objects.
`LightweightOutputParam`()	Base type for output parameters compatible with lightweight, single-node transforms.
`Markings`(marking_ids, on_branches)	Specification for a marking that stops propagating from input.
`OrgMarkings`(marking_ids, on_branches)	Specification for a marking that is no longer required on the output.
`Output`([alias, sever_permissions, ...])	Specification for a transform output.
`OutputSet`([aliases, sever_permissions, ...])	Specification for a list of transform outputs.
`Param`([description])	Base class for any parameter taken by the transform compute function.
`ParamContext`(foundry_connector, input_specs, ...)	A context object injected into the `instance` method of a parameter.
`ParamValueInput`(value)	A wrapper around the value of a parameter spec.
`Pipeline`()	An object for grouping a collection of `Transform` objects.
`StringParam`(default, *[, description, ...])	Specification for a string `ParameterSpec` definition used as an input to a transform.
`TableTransformInput`(rid, branch, table_dfreader)	The input object passed into transform objects at runtime for virtual table inputs.
`Transform`(compute_func[, inputs, outputs, ...])	A callable object that describes a single step of a Spark computation.
`TransformContext`(foundry_connector[, ...])	A context object that can optionally be injected into the compute function of a transform.
`TransformInput`(rid, branch, txrange, ...[, ...])	The input object passed into `Transform` objects at runtime.
`TransformOutput`(rid, branch, txrid, ...[, mode])	The output object passed into `Transform` objects at runtime.
`transform`(**kwargs)	Wrap a compute function as a `Transform` object.
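In signatures such as `BooleanParam(default, *[, description])`, the bare `*` marks the end of positional parameters. `default` may be passed positionally, while the optional `description` must be passed by keyword. The hypothetical `ToyParam` and `ToyIntegerParam` classes below reproduce only that calling convention. This assumes the usual Python reading of `*` and `[...]` in reference signatures; the classes are not the real parameter types.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ToyParam(Generic[T]):
    """Hypothetical stand-in: `default` may be positional, `description` is keyword-only."""

    def __init__(self, default: T, *, description: Optional[str] = None) -> None:
        self.default = default
        self.description = description


class ToyIntegerParam(ToyParam[int]):
    """Hypothetical mirror of the IntegerParam(default, *[, description]) call shape."""


limit = ToyIntegerParam(100, description="Maximum number of rows to keep")
print(limit.default, limit.description)  # 100 Maximum number of rows to keep

try:
    # Passing description positionally is rejected because of the bare `*`.
    ToyIntegerParam(100, "Maximum number of rows to keep")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```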

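`FileStatus` is documented as a `collections.namedtuple` with the fields `path`, `size`, and `modified`, so its entries support both attribute access and tuple unpacking. The sketch below defines a local stand-in with the same field names and filters a hard-coded listing. The sample entries are invented. Treating `size` as bytes and `modified` as a comparable timestamp are assumptions.

```python
from collections import namedtuple

# Local stand-in with the same fields as transforms.api.FileStatus(path, size, modified).
FileStatus = namedtuple("FileStatus", ["path", "size", "modified"])

# Hypothetical listing; in a Spark transform, entries like these would describe dataset files.
listing = [
    FileStatus("raw/2024-01.csv", 1200, 1700000000),
    FileStatus("raw/2024-02.csv", 800, 1700000500),
    FileStatus("raw/readme.txt", 50, 1700000900),
]

# Attribute access: select CSV files and total their sizes (assumed to be bytes).
csv_files = [f for f in listing if f.path.endswith(".csv")]
total_csv_size = sum(f.size for f in csv_files)

# Tuple unpacking also works, because FileStatus is a namedtuple.
path, size, modified = max(listing, key=lambda f: f.modified)

print(total_csv_size, path)  # 2000 raw/readme.txt
```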
Exceptions

Name	Description
`LightweightException`	Base exception for lightweight compatibility checks.
`LightweightNotImplementedError`(message)	Lightweight-specific `NotImplementedError` for unsupported features.
`LightweightTypeError`(message)	Exception for type errors in lightweight compatibility checks.
`LightweightValueError`(message)	Exception for value errors in lightweight compatibility checks.
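`LightweightException` is described as the base exception for lightweight compatibility checks. Assuming the other three exceptions derive from it, a single `except` clause on the base class handles all of them. The sketch below mirrors that arrangement with hypothetical stand-in classes. The table does not state the exact base classes, so having each subclass also inherit from its built-in counterpart (`NotImplementedError`, `TypeError`, `ValueError`) is an assumption suggested by the names. `check_cpu_cores` is likewise invented to trigger the errors.

```python
class ToyLightweightException(Exception):
    """Stand-in for LightweightException, assumed base of the lightweight compatibility errors."""


class ToyLightweightNotImplementedError(ToyLightweightException, NotImplementedError):
    """Assumed to derive from both the base class and NotImplementedError."""


class ToyLightweightTypeError(ToyLightweightException, TypeError):
    """Assumed to derive from both the base class and TypeError."""


class ToyLightweightValueError(ToyLightweightException, ValueError):
    """Assumed to derive from both the base class and ValueError."""


def check_cpu_cores(cpu_cores: object) -> None:
    # Hypothetical compatibility check, used only to raise the stand-in errors.
    if not isinstance(cpu_cores, (int, float)):
        raise ToyLightweightTypeError(f"cpu_cores must be a number, got {type(cpu_cores).__name__}")
    if cpu_cores <= 0:
        raise ToyLightweightValueError("cpu_cores must be positive")


caught = []
for value in ["4", -1, 2]:
    try:
        check_cpu_cores(value)
    except ToyLightweightException as exc:  # one handler covers every subclass
        caught.append(type(exc).__name__)
print(caught)  # ['ToyLightweightTypeError', 'ToyLightweightValueError']
```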
