Output column metadata

You can read and write the column descriptions and column typeclasses for your output datasets in Code Repository Transforms.

Updating column descriptions in Code Repository Transforms

To add column descriptions to an output dataset, pass the optional column_descriptions argument to the write_dataframe() method of TransformOutput.

This argument should be a dict whose keys are column names and whose values are column descriptions. Each column description is limited to 800 characters.
Only the keys that match a column name on your DataFrame are used: the code computes the intersection of the DataFrame's column names and the keys of the dict you provide, so it never tries to attach a description to a column that does not exist.
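The intersection rule and the length limit can be illustrated in plain Python. This is only a sketch of the behavior described above, not the library's implementation: the helper names applicable_descriptions and overlong_descriptions are hypothetical, and what the platform does with a description longer than 800 characters is not stated here, so the second helper only reports such entries.

```python
from typing import Dict, Iterable, List

MAX_DESCRIPTION_LENGTH = 800  # limit stated in the text above


def applicable_descriptions(
    columns: Iterable[str], descriptions: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: keep only descriptions whose key is a real column."""
    present = set(columns)
    return {name: text for name, text in descriptions.items() if name in present}


def overlong_descriptions(descriptions: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list columns whose description exceeds the limit."""
    return sorted(
        name
        for name, text in descriptions.items()
        if len(text) > MAX_DESCRIPTION_LENGTH
    )


# Stand-in for the column names of a DataFrame (for example, df.columns).
columns = ["col_1", "col_2"]
descriptions = {
    "col_1": "col 1 description",
    "col_3": "describes a column that is not in the DataFrame",
}

# "col_3" is dropped because it is not one of the columns.
kept = applicable_descriptions(columns, descriptions)
print(kept)
```

Running a check like overlong_descriptions before writing is one way to catch descriptions that would break the 800-character limit.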

Example: Write column descriptions in Code Repository Transforms

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions={
            "col_1": "col 1 description"
        }
    )

Column typeclasses

The column_typeclasses property returns a structured Dict[str, List[Dict[str, str]]] that maps each column name to a list of its column typeclasses.

Each typeclass in the List is a Dict[str, str] object.
This Dict may use only the keys "name" and "kind", and each key maps to the string value you want for that typeclass.

An example column_typeclasses value would be {"my_column": [{"name": "my_typeclass_name", "kind": "my_typeclass_kind"}]}.
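The expected shape can be checked with a few lines of plain Python before the value is passed on. The sketch below uses a hypothetical helper, typeclass_problems, that is not part of the transforms API. It rejects keys other than "name" and "kind", as described above. It also assumes that both keys are required, which the text does not state explicitly.

```python
from typing import Any, List

ALLOWED_KEYS = {"name", "kind"}


def typeclass_problems(column_typeclasses: Any) -> List[str]:
    """Hypothetical helper: describe every deviation from the expected shape."""
    if not isinstance(column_typeclasses, dict):
        return ["top-level value must be a dict"]

    problems: List[str] = []
    for column, typeclasses in column_typeclasses.items():
        if not isinstance(column, str):
            problems.append(f"column name {column!r} is not a string")
        if not isinstance(typeclasses, list):
            problems.append(f"{column!r}: value must be a list of typeclasses")
            continue
        for index, typeclass in enumerate(typeclasses):
            where = f"{column!r}[{index}]"
            if not isinstance(typeclass, dict):
                problems.append(f"{where}: typeclass must be a dict")
                continue
            extra = set(typeclass) - ALLOWED_KEYS
            missing = ALLOWED_KEYS - set(typeclass)  # assumption: both keys required
            if extra:
                problems.append(f"{where}: unexpected keys {sorted(extra)}")
            if missing:
                problems.append(f"{where}: missing keys {sorted(missing)}")
            if not all(isinstance(value, str) for value in typeclass.values()):
                problems.append(f"{where}: all values must be strings")
    return problems


valid = {"my_column": [{"name": "my_typeclass_name", "kind": "my_typeclass_kind"}]}
invalid = {"my_column": [{"name": "my_typeclass_name", "type": "oops"}]}

print(typeclass_problems(valid))    # no problems reported
print(typeclass_problems(invalid))  # reports the unexpected and missing keys
```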

Example: Read and write column descriptions and typeclasses in Code Repository Transforms

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("ri.foundry.main.dataset.my-output-dataset"),
    my_input=Input("ri.foundry.main.dataset.my-input-dataset"),
)
def my_compute_function(my_input, my_output):
    # Keep only 10 rows of the input
    recent = my_input.dataframe().limit(10)

    # Read the metadata already attached to the input dataset
    existing_typeclasses = my_input.column_typeclasses
    existing_descriptions = my_input.column_descriptions

    my_output.write_dataframe(
        recent,
        column_descriptions=existing_descriptions,
        column_typeclasses=existing_typeclasses
    )
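When an output adds or renames columns, you may want to carry over the input's metadata and layer new entries on top before calling write_dataframe(). The following sketch does this with ordinary dicts. The helpers merge_descriptions and merge_typeclasses are hypothetical, and the sketch assumes that the values read from the input behave like the plain dict structures described above.

```python
from typing import Dict, List

Typeclass = Dict[str, str]


def merge_descriptions(
    existing: Dict[str, str], overrides: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: new descriptions win over inherited ones."""
    merged = dict(existing)
    merged.update(overrides)
    return merged


def merge_typeclasses(
    existing: Dict[str, List[Typeclass]], additions: Dict[str, List[Typeclass]]
) -> Dict[str, List[Typeclass]]:
    """Hypothetical helper: append new typeclasses, skipping exact duplicates."""
    merged = {column: list(typeclasses) for column, typeclasses in existing.items()}
    for column, typeclasses in additions.items():
        target = merged.setdefault(column, [])
        for typeclass in typeclasses:
            if typeclass not in target:
                target.append(typeclass)
    return merged


# Stand-ins for my_input.column_descriptions and my_input.column_typeclasses.
existing_descriptions = {"col_1": "col 1 description"}
existing_typeclasses = {
    "col_1": [{"name": "my_typeclass_name", "kind": "my_typeclass_kind"}]
}

descriptions = merge_descriptions(
    existing_descriptions, {"col_2": "col 2 description"}
)
typeclasses = merge_typeclasses(
    existing_typeclasses,
    {"col_1": [{"name": "my_typeclass_name", "kind": "my_typeclass_kind"}],
     "col_2": [{"name": "other_name", "kind": "other_kind"}]},
)
print(descriptions)
print(typeclasses)
```

The merged values would then be passed as column_descriptions and column_typeclasses in the same way as in the example above.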
