transforms.api

The Transforms Python API provides classes and decorators for constructing a Pipeline.

Functions

Name Description
configure([profile, allowed_run_duration, ...]) A decorator that modifies the configuration of a Spark transform.
incremental([require_incremental, ...]) A decorator to convert inputs and outputs into their transforms.api.incremental counterparts.
lightweight([_maybe_function, cpu_cores, ...])
transform_df(output, **inputs) Register the wrapped compute function as a DataFrame transform.
transform_pandas(output, **inputs) Register the wrapped compute function as a pandas transform.
transform_polars(output, **inputs) Register the wrapped compute function as a Polars transform.
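
As a rough illustration of how the decorators above are applied, the sketch below registers a compute function with transform_df, following the transform_df(output, **inputs) signature from the table. The dataset paths and the function name clean_orders are hypothetical. The fallback classes under ImportError are stand-ins that only let the snippet run outside Foundry; they are not the library's implementation.

```python
# Sketch only: dataset paths and the function name are hypothetical.
try:
    from transforms.api import Input, Output, transform_df
except ImportError:
    # Fallback stand-ins so the sketch can run outside Foundry.
    # They only record their arguments and are NOT the real implementation.
    class _Spec:
        def __init__(self, alias=None):
            self.alias = alias

    class Input(_Spec):
        pass

    class Output(_Spec):
        pass

    def transform_df(output, **inputs):
        def decorator(compute_func):
            compute_func.output = output
            compute_func.inputs = inputs
            return compute_func
        return decorator


@transform_df(
    Output("/examples/clean_orders"),      # hypothetical output dataset path
    orders=Input("/examples/raw_orders"),  # keyword name matches the argument below
)
def clean_orders(orders):
    # In Foundry, `orders` would be a Spark DataFrame, and the returned
    # DataFrame would be written to the output dataset.
    return orders.dropDuplicates()
```

Each keyword passed to transform_df names one input, and the compute function takes an argument of the same name.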

Classes

Name Description
BooleanParam(default, *[, description]) Specification for the ParameterSpec definition used as an input to a transform.
Check(expectation, name[, on_error, description]) Wraps up an expectation such that it can be registered with Data Health.
ComputeBackend(*values) Enum class for representing the different compute backends for use in configure().
ContainerTransform(transform, *[, ...]) A callable object that describes a single step of a lightweight, single-node computation.
ContainerTransformsConfiguration(transform, *) A callable object that describes a single step of a lightweight, single-node computation.
Dataset(alias) A class representing the files backing a Foundry dataset view.
FileStatus(path, size, modified) A collections.namedtuple capturing details about a FoundryFS file in Spark transforms.
FileSystem(foundry_fs[, read_only]) A filesystem object for reading and writing raw dataset files in Spark transforms.
FloatParam(default, *[, description]) Specification for the ParameterSpec definition used as an input to a transform.
FoundryDataSidecarFile(param, path, ...) A file object for reading and writing raw dataset files in lightweight, single-node transforms.
FoundryDataSidecarFileSystem(param[, ...]) A file system for reading and writing raw dataset files in lightweight, single-node transforms.
FoundryInputParam(aliases[, branch, type, ...]) A base class for transforms input parameters.
FoundryOutputParam(aliases[, type, ...]) A base class for transforms output parameters.
IncrementalLightweightInput(alias, rid[, branch]) The input object passed into incremental ContainerTransform objects at runtime.
IncrementalLightweightOutput(alias, rid[, ...]) The output object passed into user code at runtime for incremental ContainerTransform objects.
IncrementalTableTransformInput(table_tinput, ...) TableTransformInput with added functionality for incremental computation.
IncrementalTransformContext(is_incremental, ...) TransformContext with added functionality for incremental computation.
IncrementalTransformInput(tinput[, ...]) TransformInput with added functionality for incremental computation.
IncrementalTransformOutput(toutput[, ...]) TransformOutput with added functionality for incremental computation.
Input([alias, branch, description, ...]) Specification for a transform dataset input.
InputSet([aliases, description]) Specification for a list of transform inputs.
IntegerParam(default, *[, description]) Specification for the ParameterSpec definition used as an input to a transform.
LightweightContext() A context object that can optionally be injected into the compute function of a lightweight transform.
LightweightInput(alias, rid[, branch]) The input object passed into ContainerTransform objects at runtime.
LightweightInputParam() Base type for input parameters compatible with lightweight, single node transforms.
LightweightOutput(alias, rid[, branch]) The output object passed to user code at runtime.
LightweightOutputParam() Base type for output parameters compatible with lightweight, single node transforms.
Markings(marking_ids, on_branches) Specification for a marking that stops propagating from input.
OrgMarkings(marking_ids, on_branches) Specification for a marking that is no longer required on the output.
Output([alias, sever_permissions, ...]) Specification for a transform output.
OutputSet([aliases, sever_permissions, ...]) Specification for a list of transform outputs.
Param([description]) Base class for any parameter taken by the transform compute function.
ParamContext(foundry_connector, input_specs, ...) A context object injected in the instance method of a parameter.
ParamValueInput(value) A wrapper around the value of a parameter spec.
Pipeline() An object for grouping a collection of Transform objects.
StringParam(default, *[, description, ...]) Specification for the ParameterSpec definition used as an input to a transform.
TableTransformInput(rid, branch, table_dfreader) The input object passed into transform objects at runtime for virtual table inputs.
Transform(compute_func[, inputs, outputs, ...]) A callable object that describes a single step of a Spark computation.
TransformContext(foundry_connector[, ...]) A context object that can optionally be injected into the compute function of a transform.
TransformInput(rid, branch, txrange, ...[, ...]) The input object passed into Transform objects at runtime.
TransformOutput(rid, branch, txrid, ...[, mode]) The output object passed into Transform objects at runtime.
transform(**kwargs) Wrap a compute function as a Transform object.
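
To show how Pipeline groups transforms, here is a minimal sketch that builds two DataFrame transforms and collects them in one Pipeline. It assumes Pipeline exposes an add_transforms method and a transforms list, neither of which appears in the table above. The dataset paths, column names, and function names are hypothetical, and the classes under ImportError are stand-ins for running the snippet outside Foundry, not the library's implementation.

```python
# Sketch only: paths, column names, and function names are hypothetical.
try:
    from transforms.api import Input, Output, Pipeline, transform_df
except ImportError:
    # Fallback stand-ins so the sketch can run outside Foundry.
    # They only record their arguments and are NOT the real implementation.
    class _Spec:
        def __init__(self, alias=None):
            self.alias = alias

    class Input(_Spec):
        pass

    class Output(_Spec):
        pass

    def transform_df(output, **inputs):
        def decorator(compute_func):
            compute_func.output = output
            compute_func.inputs = inputs
            return compute_func
        return decorator

    class Pipeline:
        def __init__(self):
            self.transforms = []

        # Assumed method name; check it against the library documentation.
        def add_transforms(self, *transforms):
            self.transforms.extend(transforms)


@transform_df(Output("/examples/daily_totals"), sales=Input("/examples/sales"))
def daily_totals(sales):
    # Sum the hypothetical `amount` column per hypothetical `day` column.
    return sales.groupBy("day").sum("amount")


@transform_df(Output("/examples/top_days"), totals=Input("/examples/daily_totals"))
def top_days(totals):
    # Keep at most ten rows of the upstream result.
    return totals.limit(10)


# Group both transforms into a single pipeline object.
my_pipeline = Pipeline()
my_pipeline.add_transforms(daily_totals, top_days)
```

The second transform reads the dataset path that the first one writes, which is how a dependency between two steps would typically be expressed in this style.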

Exceptions

Name Description
LightweightException Base exception for lightweight compatibility checks.
LightweightNotImplementedError(message) Lightweight-specific NotImplementedError for unsupported features.
LightweightTypeError(message) Exception for type errors in lightweight compatibility checks.
LightweightValueError(message) Exception for value errors in lightweight compatibility checks.
