Strings
Strings are columns of text data. The functions listed here come from pyspark.sql.functions, imported as F:
from pyspark.sql import functions as F
Converting between cases
F.initcap(col): capitalizes the first letter of each word
F.lower(col): converts to lowercase
F.upper(col): converts to uppercase
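Concatenating, splitting
F.concat(*cols): concatenates columns end to end
F.concat_ws(sep, *cols): concatenates columns with a separator between them
F.split(str, pattern): splits a string around matches of a regex pattern, returning an array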
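Substrings
F.instr(str, substr): 1-based position of the first occurrence of substr (0 if not found)
F.locate(substr, str, pos=1): position of the first occurrence of substr at or after position pos
F.substring(str, pos, len): substring of length len starting at 1-based position pos
F.substring_index(str, delim, count): text before the count-th delimiter (counted from the right if count is negative)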
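Trimming, padding
F.lpad(col, len, pad): left-pads the string with pad to a width of len
F.ltrim(col): removes leading spaces
F.rpad(col, len, pad): right-pads the string with pad to a width of len
F.rtrim(col): removes trailing spaces
F.trim(col): removes spaces from both ends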
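Regex
F.regexp_extract(str, pattern, idx): extracts capture group idx of the first match of pattern (empty string if there is no match)
F.regexp_replace(str, pattern, replacement): replaces every match of pattern with replacement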
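Misc
F.ascii(col): numeric code of the first character
F.base64(col): encodes a binary column as a Base64 string
F.bin(col): binary representation of an integer, as a string
F.conv(col, fromBase, toBase): converts a number in a string column from one base to another
F.decode(col, charset): decodes binary into a string using the given charset
F.encode(col, charset): encodes a string into binary using the given charset
F.format_number(col, d): formats a number with thousands separators, rounded to d decimal places
F.format_string(format, *cols): printf-style string formatting
F.hex(col): hexadecimal representation
F.length(col): length of the string in characters
F.levenshtein(left, right): Levenshtein edit distance between two strings
F.repeat(col, n): repeats the string n times
F.reverse(col): reverses the string
F.translate(srcCol, matching, replace): replaces each character in matching with the corresponding character in replace
F.unbase64(col): decodes a Base64 string into binary
F.unhex(col): inverse of hex, decoding a hexadecimal string into binary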