
Edit distance

Supported in: Batch, Faster, Streaming

Compute the edit distance between two strings. Supports Levenshtein, indel, and Damerau-Levenshtein distance.

Expression categories: Distance measurement, String

Declared arguments

  • Distance function: Distance function used to calculate the edit distance between the two strings.
    Enum
  • Ignore case: Whether to ignore case when comparing the left and right strings.
    Literal
  • Left: Left string to compare.
    Expression
  • Right: Right string to compare.
    Expression
  • optional Normalize distance: Whether to normalize the distance to a value between 0 and 1, where 0 means the strings are identical and 1 means they have no similarity.
    Literal

Output type: Double | Integer

Examples

Example 1: Base case

Description: String edit distance calculated using Levenshtein distance

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right         | Output
hello | hello         | 0
hallo | hello         | 1
hlelo | hello         | 2
hello | hEllO         | 2
hello | hello, world! | 8
hello | farewell      | 6
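
The values in this example can be checked with a small reference implementation. The following is a minimal Python sketch of the classic dynamic-programming Levenshtein distance, not the expression's actual implementation; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(left: str, right: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # previous[j] is the distance between the already processed prefix of
    # `left` and right[:j]; it starts as the distance from the empty string.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]  # distance from left[:i] to the empty string
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,                      # delete left_char
                current[j - 1] + 1,                   # insert right_char
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


pairs = [("hello", "hello"), ("hallo", "hello"), ("hlelo", "hello"),
         ("hello", "hEllO"), ("hello", "hello, world!"), ("hello", "farewell")]
for left, right in pairs:
    print(left, "|", right, "|", levenshtein(left, right))
```

Note that the swapped letters in `hlelo` cost 2 here, because plain Levenshtein distance has no transposition operation and counts the swap as two substitutions.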

Example 2: Base case

Description: By setting ignore case to true, letters of different case are treated as equal. Here calculated using Damerau-Levenshtein distance.

Argument values:

  • Distance function: damerau_levenshtein
  • Ignore case: true
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right         | Output
hello | hello         | 0
hallo | hello         | 1
hlelo | hello         | 1
hello | hEllO         | 0
hello | hello, world! | 8
hello | farewell      | 6
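
To illustrate why `hlelo` now scores 1 and `hEllO` scores 0, here is a minimal Python sketch. Two assumptions are made that this page does not confirm: the sketch uses the optimal string alignment variant of Damerau-Levenshtein (adjacent transpositions only, each substring edited at most once), and it ignores case by applying `str.casefold()` to both inputs. The name `osa_distance` is hypothetical.

```python
def osa_distance(left: str, right: str, ignore_case: bool = False) -> int:
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    if ignore_case:
        # Assumption: case-insensitivity is modeled by case-folding both inputs.
        left, right = left.casefold(), right.casefold()

    rows, cols = len(left) + 1, len(right) + 1
    # dist[i][j] is the distance between left[:i] and right[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if left[i - 1] == right[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: "le" <-> "el" counts as a single edit.
            if (i > 1 and j > 1
                    and left[i - 1] == right[j - 2]
                    and left[i - 2] == right[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(osa_distance("hlelo", "hello", ignore_case=True))
print(osa_distance("hello", "hEllO", ignore_case=True))
```

For the six pairs in this example, this variant gives the same results as the table; the variants can differ on inputs where characters are inserted between two transposed letters.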

Example 3: Base case

Description: By setting normalize to true, the edit distance is normalized to a value between 0 and 1. Here calculated using indel distance.

Argument values:

  • Distance function: indel
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: true

left  | right         | Output
hello | hello         | 0.0
hallo | hello         | 0.2
hlelo | hello         | 0.2
hello | hEllO         | 0.4
hello | hello, world! | 0.4444444444444444
hello | farewell      | 0.5384615384615384
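
The outputs in this example are consistent with dividing the indel distance by the combined length of both strings (for instance 8 / 18 and 7 / 13 in the last two rows). The Python sketch below works under that assumption, which is inferred from the example values rather than stated on this page. Indel distance allows only insertions and deletions, so it equals the combined length minus twice the length of the longest common subsequence. All function names are illustrative, and returning 0.0 for two empty strings is a choice made here, not documented behavior.

```python
def lcs_length(left: str, right: str) -> int:
    """Length of the longest common subsequence of the two strings."""
    previous = [0] * (len(right) + 1)
    for left_char in left:
        current = [0]
        for j, right_char in enumerate(right, start=1):
            if left_char == right_char:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_distance(left: str, right: str) -> int:
    """Number of insertions and deletions needed to turn left into right."""
    return len(left) + len(right) - 2 * lcs_length(left, right)


def normalized_indel(left: str, right: str) -> float:
    """Indel distance scaled to [0, 1] by the combined length (assumed)."""
    total = len(left) + len(right)
    if total == 0:
        return 0.0  # assumption: two empty strings count as identical
    return indel_distance(left, right) / total


for left, right in [("hallo", "hello"), ("hello", "hEllO"), ("hello", "farewell")]:
    print(left, "|", right, "|", normalized_indel(left, right))
```

Because a substitution costs two indel operations (one delete plus one insert), `hallo` versus `hello` has an indel distance of 2 rather than 1, which gives 2 / 10 = 0.2.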

Example 4: Null case

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right | Output
hello | null  | null
null  | hello | null
null  | null  | null
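
The null behavior shown in this example can be described as null propagation: if either input is null, the output is null. A minimal Python sketch of that rule follows, with `None` standing in for null. The wrapper `null_safe` and the stand-in metric `length_difference` are hypothetical names introduced for illustration; the stand-in is not an edit distance and only keeps the example short.

```python
from typing import Callable, Optional


def null_safe(
    distance: Callable[[str, str], float],
) -> Callable[[Optional[str], Optional[str]], Optional[float]]:
    """Wrap a two-string distance so that a null (None) input yields None."""
    def wrapper(left: Optional[str], right: Optional[str]) -> Optional[float]:
        if left is None or right is None:
            return None  # null in, null out
        return distance(left, right)
    return wrapper


# Hypothetical stand-in metric, used only to keep the sketch short.
# It measures the length difference and is NOT an edit distance.
def length_difference(left: str, right: str) -> int:
    return abs(len(left) - len(right))


safe_distance = null_safe(length_difference)
for left, right in [("hello", None), (None, "hello"), (None, None)]:
    print(left, "|", right, "|", safe_distance(left, right))
```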

