Other
Collections
- array(*cols)
- array_contains(col, value)
- size(col)
- sort_array(col, asc=True)
- struct(*cols)
Sorting
- asc(col)
- desc(col)
Binary
- bitwiseNOT(col)
- shiftLeft(col, numBits)
- shiftRight(col, numBits)
- shiftRightUnsigned(col, numBits)
Dealing with null values
- coalesce(*cols)
- isnan(col)
- isnull(col)
Columns
- col(col) or column(col)
- create_map(*cols)
- explode(col)
- expr(str)
- hash(*cols)
- input_file_name()
- posexplode(col)
- sha1(col)
- sha2(col, numBits)
- soundex(col)
- spark_partition_id()
JSON
- from_json(col, schema, options={})
- get_json_object(col, path)
- json_tuple(col, *fields)
- to_json(col, options={})
Checkpoints
- checkpoint(eager=True)
- localCheckpoint(eager=True)
:::callout{theme="neutral"}
checkpoint() temporarily stores a DataFrame on disk, whereas localCheckpoint() stores it in executor memory. Use the eager parameter to control whether the DataFrame is checkpointed immediately (the default is True).
:::