transforms.api
The Transforms Python API provides classes and decorators for constructing a Pipeline.
Functions
| Name | Description |
|---|---|
| `configure([profile, allowed_run_duration, ...])` | A decorator that modifies the configuration of a Spark transform. |
| `incremental([require_incremental, ...])` | A decorator to convert inputs and outputs into their transforms.api.incremental counterparts. |
| `lightweight([_maybe_function, cpu_cores, ...])` | |
| `transform_df(output, **inputs)` | Register the wrapped compute function as a DataFrame transform. |
| `transform_pandas(output, **inputs)` | Register the wrapped compute function as a pandas transform. |
| `transform_polars(output, **inputs)` | Register the wrapped compute function as a Polars transform. |
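
The signature `transform_df(output, **inputs)` has the shape of a decorator factory: one output specification, plus any number of named inputs whose keyword names become the compute function's argument names. The sketch below is a standard-library toy model of that pattern, not the `transforms.api` implementation. Every `Toy*` and `toy_*` name and both dataset paths are hypothetical, and rows are plain dicts standing in for a DataFrame.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToyInput:
    """Hypothetical stand-in for an input specification (alias only)."""
    alias: str


@dataclass(frozen=True)
class ToyOutput:
    """Hypothetical stand-in for an output specification (alias only)."""
    alias: str


@dataclass
class ToyTransform:
    """Toy callable bundling a compute function with its declared specs."""
    compute_func: Callable[..., Any]
    output: ToyOutput
    inputs: Dict[str, ToyInput]

    def __call__(self, **resolved_inputs: Any) -> Any:
        # Reject calls whose keyword names differ from the declared inputs.
        if set(resolved_inputs) != set(self.inputs):
            raise TypeError(
                f"expected inputs {sorted(self.inputs)}, "
                f"got {sorted(resolved_inputs)}"
            )
        return self.compute_func(**resolved_inputs)


def toy_transform_df(output: ToyOutput, **inputs: ToyInput):
    """Decorator factory shaped like transform_df(output, **inputs)."""
    def decorator(compute_func: Callable[..., Any]) -> ToyTransform:
        return ToyTransform(compute_func, output, dict(inputs))
    return decorator


@toy_transform_df(
    ToyOutput("/examples/clean_orders"),       # made-up path
    orders=ToyInput("/examples/raw_orders"),   # made-up path
)
def clean_orders(orders):
    # Keep only rows with a positive amount.
    return [row for row in orders if row["amount"] > 0]


result = clean_orders(orders=[{"id": 1, "amount": 5}, {"id": 2, "amount": 0}])
```

In this toy model the decorated name `clean_orders` is no longer a plain function. It is an object that carries the declared input and output specifications alongside the compute function, so a registry can inspect it without running it.
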
Classes
| Name | Description |
|---|---|
| `BooleanParam(default, *[, description])` | Specification for the ParameterSpec definition used as an input to a transform. |
| `Check(expectation, name[, on_error, description])` | Wraps an expectation so that it can be registered with Data Health. |
| `ComputeBackend(*values)` | Enum class for representing the different compute backends for use in configure(). |
| `ContainerTransform(transform, *[, ...])` | A callable object that describes a single step of a lightweight, single-node computation. |
| `ContainerTransformsConfiguration(transform, *)` | A callable object that describes a single step of a lightweight, single-node computation. |
| `Dataset(alias)` | A class representing the files backing a Foundry dataset view. |
| `FileStatus(path, size, modified)` | A collections.namedtuple capturing details about a FoundryFS file in Spark transforms. |
| `FileSystem(foundry_fs[, read_only])` | A filesystem object for reading and writing raw dataset files in Spark transforms. |
| `FloatParam(default, *[, description])` | Specification for the ParameterSpec definition used as an input to a transform. |
| `FoundryDataSidecarFile(param, path, ...)` | A file object for reading and writing raw dataset files in lightweight, single-node transforms. |
| `FoundryDataSidecarFileSystem(param[, ...])` | A file system for reading and writing raw dataset files in lightweight, single-node transforms. |
| `FoundryInputParam(aliases[, branch, type, ...])` | A base class for transforms input parameters. |
| `FoundryOutputParam(aliases[, type, ...])` | A base class for transforms output parameters. |
| `IncrementalLightweightInput(alias, rid[, branch])` | The input object passed into incremental ContainerTransform objects at runtime. |
| `IncrementalLightweightOutput(alias, rid[, ...])` | The output object passed into user code at runtime for incremental ContainerTransform objects. |
| `IncrementalTableTransformInput(table_tinput, ...)` | TableTransformInput with added functionality for incremental computation. |
| `IncrementalTransformContext(is_incremental, ...)` | TransformContext with added functionality for incremental computation. |
| `IncrementalTransformInput(tinput[, ...])` | TransformInput with added functionality for incremental computation. |
| `IncrementalTransformOutput(toutput[, ...])` | TransformOutput with added functionality for incremental computation. |
| `Input([alias, branch, description, ...])` | Specification for a transform dataset input. |
| `InputSet([aliases, description])` | Specification for a list of transform inputs. |
| `IntegerParam(default, *[, description])` | Specification for a ParameterSpec definition used as an input to a transform. |
| `LightweightContext()` | A context object that can optionally be injected into the compute function of a lightweight transform. |
| `LightweightInput(alias, rid[, branch])` | The input object passed into ContainerTransform objects at runtime. |
| `LightweightInputParam()` | Base type for input parameters compatible with lightweight, single-node transforms. |
| `LightweightOutput(alias, rid[, branch])` | The output object passed to user code at runtime. |
| `LightweightOutputParam()` | Base type for output parameters compatible with lightweight, single-node transforms. |
| `Markings(marking_ids, on_branches)` | Specification for a marking that stops propagating from input. |
| `OrgMarkings(marking_ids, on_branches)` | Specification for a marking that is no longer required on the output. |
| `Output([alias, sever_permissions, ...])` | Specification for a transform output. |
| `OutputSet([aliases, sever_permissions, ...])` | Specification for a list of transform outputs. |
| `Param([description])` | Base class for any parameter taken by the transform compute function. |
| `ParamContext(foundry_connector, input_specs, ...)` | A context object injected in the instance method of a parameter. |
| `ParamValueInput(value)` | A wrapper around the value of a parameter spec. |
| `Pipeline()` | An object for grouping a collection of Transform objects. |
| `StringParam(default, *[, description, ...])` | Specification for the ParameterSpec definition used as an input to a transform. |
| `TableTransformInput(rid, branch, table_dfreader)` | The input object passed into transform objects at runtime for virtual table inputs. |
| `Transform(compute_func[, inputs, outputs, ...])` | A callable object that describes a single step of a Spark computation. |
| `TransformContext(foundry_connector[, ...])` | A context object that can optionally be injected into the compute function of a transform. |
| `TransformInput(rid, branch, txrange, ...[, ...])` | The input object passed into Transform objects at runtime. |
| `TransformOutput(rid, branch, txrid, ...[, mode])` | The output object passed into Transform objects at runtime. |
| `transform(**kwargs)` | Wrap a compute function as a Transform object. |
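
The abbreviated signatures in this table use a compact notation: square brackets mark optional arguments, and a bare `*` marks the point after which arguments must be passed by keyword. Read that way, `IntegerParam(default, *[, description])` would correspond to a constructor like the one sketched below. `ToyIntegerParam` is a hypothetical class written only to illustrate the notation. It assumes `description` defaults to `None`, which the table does not state.

```python
import inspect
from typing import Optional


class ToyIntegerParam:
    """Hypothetical class mirroring `IntegerParam(default, *[, description])`."""

    def __init__(self, default: int, *, description: Optional[str] = None) -> None:
        self.default = default
        self.description = description


# `default` may be positional; `description` must be given by keyword.
batch_size = ToyIntegerParam(100, description="Rows per batch")

# Passing `description` positionally fails because it follows the bare `*`.
try:
    ToyIntegerParam(100, "Rows per batch")
    positional_description_allowed = True
except TypeError:
    positional_description_allowed = False

# inspect reports the parameter kinds that the bracket notation encodes.
kinds = {
    name: param.kind.name
    for name, param in inspect.signature(ToyIntegerParam).parameters.items()
}
```

Running `inspect.signature` against the real classes in an environment where `transforms.api` is installed is a quick way to confirm the full argument list behind any `...` in the table.
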
Exceptions
| Name | Description |
|---|---|
| `LightweightException` | Base exception for lightweight compatibility checks. |
| `LightweightNotImplementedError(message)` | Lightweight-specific NotImplementedError for unsupported features. |
| `LightweightTypeError(message)` | Exception for type errors in lightweight compatibility checks. |
| `LightweightValueError(message)` | Exception for value errors in lightweight compatibility checks. |
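
The table describes one base exception and three more specific errors. A common way to arrange such a family is for each specific error to inherit from both the shared base and the matching built-in, so that callers can catch either one. The sketch below illustrates that arrangement with hypothetical `Toy*` classes. Whether the real `transforms.api` exceptions inherit this way is an assumption that the table does not confirm, and `check_cpu_cores` is an invented helper.

```python
class ToyLightweightException(Exception):
    """Hypothetical base for all compatibility-check failures."""


class ToyLightweightTypeError(ToyLightweightException, TypeError):
    """Hypothetical: wrong argument type."""


class ToyLightweightValueError(ToyLightweightException, ValueError):
    """Hypothetical: right type, unacceptable value."""


class ToyLightweightNotImplementedError(ToyLightweightException, NotImplementedError):
    """Hypothetical: feature not supported."""


def check_cpu_cores(cpu_cores):
    """Invented validator; raises a toy error when cpu_cores is unusable."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(cpu_cores, bool) or not isinstance(cpu_cores, (int, float)):
        raise ToyLightweightTypeError(
            f"cpu_cores must be a number, got {type(cpu_cores).__name__}"
        )
    if cpu_cores <= 0:
        raise ToyLightweightValueError(
            f"cpu_cores must be positive, got {cpu_cores}"
        )
    return cpu_cores


def classify(cpu_cores):
    """Return "ok" or the name of the toy error that was raised."""
    try:
        check_cpu_cores(cpu_cores)
    except ToyLightweightException as exc:
        # One handler on the base class covers every specific error.
        return type(exc).__name__
    return "ok"
```

With this layout, code that only cares that a compatibility check failed catches the base class, while code that already handles `ValueError` or `TypeError` keeps working unchanged.
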