Other

Collections

  • array(*cols)
  • array_contains(col, value)
  • size(col)
  • sort_array(col, asc=True)
  • struct(*cols)

Sorting

  • asc(col)
  • desc(col)

Binary

  • bitwiseNOT(col)
  • shiftLeft(col, numBits)
  • shiftRight(col, numBits)
  • shiftRightUnsigned(col, numBits)
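
The difference between `shiftRight` and `shiftRightUnsigned` only shows up for negative numbers. The plain-Python model below is a sketch of that behavior, assuming a 32-bit integer column, JVM-style shift semantics, and a shift count between 0 and 31. The helper names are hypothetical and are not Spark APIs.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shift_left(x, n):
    # Bits shifted past position 31 are discarded.
    return to_signed32(x << n)

def shift_right(x, n):
    # Arithmetic shift: the sign bit is copied into the vacated positions.
    return x >> n

def shift_right_unsigned(x, n):
    # Logical shift: vacated positions are filled with zeros.
    return to_signed32((x & MASK32) >> n)

def bitwise_not(x):
    # Flips every bit; for two's complement this equals -x - 1.
    return ~x

print(shift_right(-8, 1), shift_right_unsigned(-8, 1))
```

For non-negative inputs the two right shifts agree; for `-8` shifted by one bit the arithmetic shift gives `-4`, while the unsigned shift gives a large positive number. In PySpark 3.2 and later, the camel-case names in this list are deprecated in favor of `bitwise_not`, `shiftleft`, `shiftright`, and `shiftrightunsigned`.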

Dealing with null values

  • coalesce(*cols)
  • isnan(col)
  • isnull(col)

Columns

  • col(col) or column(col)
  • create_map(*cols)
  • explode(col)
  • expr(str)
  • hash(*cols)
  • input_file_name()
  • posexplode(col)
  • sha1(col)
  • sha2(col, numBits)
  • soundex(col)
  • spark_partition_id()

JSON

  • from_json(col, schema, options={})
  • get_json_object(col, path)
  • json_tuple(col, *fields)
  • to_json(col, options={})

Checkpoints

  • checkpoint(eager=True)
  • localCheckpoint(eager=True)

:::callout{theme="neutral"} checkpoint() temporarily stores a DataFrame on disk, whereas localCheckpoint() stores it in executor memory. The eager parameter controls whether the DataFrame is checkpointed immediately (the default, True) or only when the first action runs on it (False). :::

