transforms.api

The Transforms Python API provides classes and decorators for constructing a Pipeline.

Functions

| Name | Description |
| --- | --- |
| configure([profile, allowed_run_duration, ...]) | A decorator that modifies the configuration of a Spark transform. |
| incremental([require_incremental, ...]) | A decorator that converts inputs and outputs into their transforms.api.incremental counterparts. |
| lightweight([_maybe_function, cpu_cores, ...]) | A decorator that turns the wrapped transform into a lightweight, single-node transform (a ContainerTransform). |
| transform_df(output, **inputs) | Register the wrapped compute function as a DataFrame transform. |
| transform_pandas(output, **inputs) | Register the wrapped compute function as a pandas transform. |
| transform_polars(output, **inputs) | Register the wrapped compute function as a Polars transform. |
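The registration step that transform_df performs can be pictured with a toy model. The sketch below does not use transforms.api, which is only available inside Foundry; ToyInput, ToyOutput, ToyTransform and toy_transform_df are hypothetical stand-ins that only mirror the transform_df(output, **inputs) call shape, and plain lists of dicts stand in for Spark DataFrames. It illustrates one point: each keyword passed for an input is matched to the compute function parameter of the same name, and the decorator returns a transform object rather than the bare function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


# Hypothetical stand-ins for Input / Output: they only carry a dataset alias.
@dataclass(frozen=True)
class ToyInput:
    alias: str


@dataclass(frozen=True)
class ToyOutput:
    alias: str


@dataclass
class ToyTransform:
    """Hypothetical object holding a compute function plus its input/output specs."""
    compute_func: Callable[..., Any]
    inputs: Dict[str, ToyInput]
    output: ToyOutput

    def run(self, store: Dict[str, Any]) -> None:
        # Look up each input by alias and pass it under the matching parameter name.
        kwargs = {name: store[spec.alias] for name, spec in self.inputs.items()}
        # Write the returned value to the output alias.
        store[self.output.alias] = self.compute_func(**kwargs)


def toy_transform_df(output: ToyOutput, **inputs: ToyInput) -> Callable[[Callable[..., Any]], ToyTransform]:
    """Hypothetical decorator mirroring the transform_df(output, **inputs) call shape."""
    def decorator(func: Callable[..., Any]) -> ToyTransform:
        return ToyTransform(func, inputs, output)
    return decorator


# The keyword `orders` must match the compute function's parameter name.
@toy_transform_df(
    ToyOutput("/examples/clean_orders"),
    orders=ToyInput("/examples/raw_orders"),
)
def clean_orders(orders):
    # Drop rows without an order_id (lists of dicts stand in for DataFrames).
    return [row for row in orders if row.get("order_id") is not None]


store = {"/examples/raw_orders": [{"order_id": 1}, {"order_id": None}, {"order_id": 3}]}
clean_orders.run(store)
print(store["/examples/clean_orders"])  # [{'order_id': 1}, {'order_id': 3}]
```

In a real repository the decorator, Input and Output would be imported from transforms.api and given dataset paths; reading inputs and writing the output is handled by the platform, not by a run() method like the one in this toy.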

Classes

| Name | Description |
| --- | --- |
| BooleanParam(default, *[, description]) | Specification for a ParameterSpec definition used as an input to a transform. |
| Check(expectation, name[, on_error, description]) | Wraps an expectation so that it can be registered with Data Health. |
| ComputeBackend(*values) | Enum class representing the different compute backends for use in configure(). |
| ContainerTransform(transform, *[, ...]) | A callable object that describes a single step of a lightweight, single-node computation. |
| ContainerTransformsConfiguration(transform, *) | A callable object that describes a single step of a lightweight, single-node computation. |
| Dataset(alias) | A class representing the files backing a Foundry dataset view. |
| FileStatus(path, size, modified) | A collections.namedtuple capturing details about a FoundryFS file in Spark transforms. |
| FileSystem(foundry_fs[, read_only]) | A filesystem object for reading and writing raw dataset files in Spark transforms. |
| FloatParam(default, *[, description]) | Specification for a ParameterSpec definition used as an input to a transform. |
| FoundryDataSidecarFile(param, path, ...) | A file object for reading and writing raw dataset files in lightweight, single-node transforms. |
| FoundryDataSidecarFileSystem(param[, ...]) | A file system for reading and writing raw dataset files in lightweight, single-node transforms. |
| FoundryInputParam(aliases[, branch, type, ...]) | A base class for transform input parameters. |
| FoundryOutputParam(aliases[, type, ...]) | A base class for transform output parameters. |
| IncrementalLightweightInput(alias, rid[, branch]) | The input object passed into incremental ContainerTransform objects at runtime. |
| IncrementalLightweightOutput(alias, rid[, ...]) | The output object passed into user code at runtime for incremental ContainerTransform objects. |
| IncrementalTableTransformInput(table_tinput, ...) | TableTransformInput with added functionality for incremental computation. |
| IncrementalTransformContext(is_incremental, ...) | TransformContext with added functionality for incremental computation. |
| IncrementalTransformInput(tinput[, ...]) | TransformInput with added functionality for incremental computation. |
| IncrementalTransformOutput(toutput[, ...]) | TransformOutput with added functionality for incremental computation. |
| Input([alias, branch, description, ...]) | Specification for a transform dataset input. |
| InputSet([aliases, description]) | Specification for a list of transform inputs. |
| IntegerParam(default, *[, description]) | Specification for a ParameterSpec definition used as an input to a transform. |
| LightweightContext() | A context object that can optionally be injected into the compute function of a lightweight transform. |
| LightweightInput(alias, rid[, branch]) | The input object passed into ContainerTransform objects at runtime. |
| LightweightInputParam() | Base type for input parameters compatible with lightweight, single-node transforms. |
| LightweightOutput(alias, rid[, branch]) | The output object passed into user code at runtime for ContainerTransform objects. |
| LightweightOutputParam() | Base type for output parameters compatible with lightweight, single-node transforms. |
| Markings(marking_ids, on_branches) | Specification for a marking that stops propagating from an input. |
| OrgMarkings(marking_ids, on_branches) | Specification for a marking that is no longer required on the output. |
| Output([alias, sever_permissions, ...]) | Specification for a transform output. |
| OutputSet([aliases, sever_permissions, ...]) | Specification for a list of transform outputs. |
| Param([description]) | Base class for any parameter taken by the transform compute function. |
| ParamContext(foundry_connector, input_specs, ...) | A context object injected into the instance method of a parameter. |
| ParamValueInput(value) | A wrapper around the value of a parameter spec. |
| Pipeline() | An object for grouping a collection of Transform objects. |
| StringParam(default, *[, description, ...]) | Specification for a ParameterSpec definition used as an input to a transform. |
| TableTransformInput(rid, branch, table_dfreader) | The input object passed into transform objects at runtime for virtual table inputs. |
| Transform(compute_func[, inputs, outputs, ...]) | A callable object that describes a single step of a Spark computation. |
| TransformContext(foundry_connector[, ...]) | A context object that can optionally be injected into the compute function of a transform. |
| TransformInput(rid, branch, txrange, ...[, ...]) | The input object passed into Transform objects at runtime. |
| TransformOutput(rid, branch, txrid, ...[, mode]) | The output object passed into Transform objects at runtime. |
| transform(**kwargs) | Wrap a compute function as a Transform object. |
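FileStatus is described above as a collections.namedtuple with the fields path, size and modified, so ordinary tuple tools work on it. The sketch below assumes a list of such records has already been obtained (for example from a directory listing on a FileSystem) and defines a local namedtuple with the same three fields as a stand-in, since transforms.api is not importable outside Foundry. The helper newest_nonempty and the sample paths and timestamps are hypothetical; the code only relies on the modified values being comparable.

```python
from collections import namedtuple
from typing import Iterable, Optional

# Local stand-in with the same fields as FileStatus(path, size, modified).
FileStatus = namedtuple("FileStatus", ["path", "size", "modified"])


def newest_nonempty(statuses: Iterable[FileStatus], suffix: str = ".csv") -> Optional[FileStatus]:
    """Hypothetical helper: most recently modified non-empty file ending in `suffix`, or None."""
    candidates = [s for s in statuses if s.path.endswith(suffix) and s.size > 0]
    # default=None avoids a ValueError when nothing matches.
    return max(candidates, key=lambda s: s.modified, default=None)


# Made-up listing; `modified` values are arbitrary comparable timestamps.
listing = [
    FileStatus("raw/2024-01-01.csv", 1024, 1704067200000),
    FileStatus("raw/2024-01-02.csv", 0, 1704153600000),
    FileStatus("raw/2024-01-03.csv", 2048, 1704240000000),
    FileStatus("raw/notes.txt", 10, 1704326400000),
]
print(newest_nonempty(listing).path)  # raw/2024-01-03.csv
```

Because the records are tuples with named fields, filtering and sorting need no special API; the empty file and the non-CSV file are skipped by plain attribute access.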

Exceptions

| Name | Description |
| --- | --- |
| LightweightException | Base exception for lightweight compatibility checks. |
| LightweightNotImplementedError(message) | Lightweight-specific NotImplementedError for unsupported features. |
| LightweightTypeError(message) | Exception for type errors in lightweight compatibility checks. |
| LightweightValueError(message) | Exception for value errors in lightweight compatibility checks. |
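The exception names above suggest a small hierarchy rooted at LightweightException. The sketch below defines hypothetical local stand-ins with the same names to show why that matters: catching the base class handles every lightweight compatibility failure in one place. The inheritance from the built-in NotImplementedError, TypeError and ValueError is an assumption based on the descriptions, and check_cpu_cores is a hypothetical validation function, not part of the library.

```python
# Hypothetical stand-ins; the inheritance shown here is assumed, not taken from the library.
class LightweightException(Exception):
    """Base exception for lightweight compatibility checks."""


class LightweightNotImplementedError(LightweightException, NotImplementedError):
    """Raised for unsupported features (assumed to also be a NotImplementedError)."""


class LightweightTypeError(LightweightException, TypeError):
    """Raised for type errors (assumed to also be a TypeError)."""


class LightweightValueError(LightweightException, ValueError):
    """Raised for value errors (assumed to also be a ValueError)."""


def check_cpu_cores(cpu_cores):
    """Hypothetical compatibility check on a lightweight resource setting."""
    if not isinstance(cpu_cores, (int, float)):
        raise LightweightTypeError(f"cpu_cores must be a number, got {type(cpu_cores).__name__}")
    if cpu_cores <= 0:
        raise LightweightValueError(f"cpu_cores must be positive, got {cpu_cores}")
    return cpu_cores


# One except clause on the base class covers every specific failure.
for value in (2, "four", -1):
    try:
        check_cpu_cores(value)
        print(value, "ok")
    except LightweightException as exc:
        print(value, "rejected:", type(exc).__name__)
```

Under the assumed multiple inheritance, callers that already catch ValueError or TypeError would also see these errors, while code that wants only lightweight failures can catch LightweightException.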
