Edit distance
Supported in: Batch, Faster, Streaming
Compute the edit distance between two strings. Supports Levenshtein, indel, and Damerau-Levenshtein distance.
Expression categories: Distance measurement, String
Declared arguments
- Distance function: Distance function used to calculate the edit distance between the two strings.
  Type: Enum
- Ignore case: Whether to ignore case when comparing the left and right strings.
  Type: Literal
- Left: Left string to compare.
  Type: Expression
- Right: Right string to compare.
  Type: Expression
- optional Normalize distance: Whether to normalize the distance to a value between 0 and 1, where 0 means the strings are identical and 1 means they share no similarity.
  Type: Literal
Output type: Double | Integer
Examples
Example 1: Base case
Description: String edit distance calculated using Levenshtein distance
Argument values:
- Distance function: levenshtein
- Ignore case: false
- Left: left
- Right: right
- Normalize distance: false
| left | right | Output |
|---|---|---|
| hello | hello | 0 |
| hallo | hello | 1 |
| hlelo | hello | 2 |
| hello | hEllO | 2 |
| hello | hello, world! | 8 |
| hello | farewell | 6 |
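The rows above can be reproduced with the textbook dynamic-programming definition of Levenshtein distance. The following is a minimal sketch under that assumption, not the expression's actual implementation; the function name `levenshtein` is illustrative only.

```python
def levenshtein(left: str, right: str) -> int:
    """Textbook Levenshtein distance: insert, delete, and substitute each cost 1."""
    # previous[j] is the distance between the processed prefix of `left` and right[:j].
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]  # turning left[:i] into the empty string takes i deletions
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(
                previous[j] + 1,         # delete lch
                current[j - 1] + 1,      # insert rch
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


rows = [
    ("hello", "hello"), ("hallo", "hello"), ("hlelo", "hello"),
    ("hello", "hEllO"), ("hello", "hello, world!"), ("hello", "farewell"),
]
for left, right in rows:
    print(left, right, levenshtein(left, right))
```

With Ignore case set to false, `hEllO` differs from `hello` in two characters, which is why that row yields 2. The swapped letters in `hlelo` also cost 2, because plain Levenshtein distance has no transposition operation.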
Example 2: Base case
Description: By setting ignore case to true, letters of different case are treated as equal. Here calculated using Damerau-Levenshtein distance.
Argument values:
- Distance function: damerau_levenshtein
- Ignore case: true
- Left: left
- Right: right
- Normalize distance: false
| left | right | Output |
|---|---|---|
| hello | hello | 0 |
| hallo | hello | 1 |
| hlelo | hello | 1 |
| hello | hEllO | 0 |
| hello | hello, world! | 8 |
| hello | farewell | 6 |
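As a rough illustration of how these values arise, here is a minimal sketch of the optimal string alignment (restricted) variant of Damerau-Levenshtein distance. Two things are assumptions: this page does not say which Damerau-Levenshtein variant the expression uses, and Python's `str.casefold` is only a stand-in for whatever case handling is applied when Ignore case is true. The function name is illustrative only.

```python
def damerau_levenshtein(left: str, right: str, ignore_case: bool = False) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    if ignore_case:
        # Assumption: case folding stands in for the expression's case handling.
        left, right = left.casefold(), right.casefold()

    rows, cols = len(left) + 1, len(right) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of left[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of right[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if left[i - 1] == right[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: "le" <-> "el" counts as a single edit.
            if (i > 1 and j > 1
                    and left[i - 1] == right[j - 2]
                    and left[i - 2] == right[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(damerau_levenshtein("hlelo", "hello", ignore_case=True))
print(damerau_levenshtein("hello", "hEllO", ignore_case=True))
```

The transposition step is what lowers `hlelo` versus `hello` from 2 to 1, and folding case before comparing is what lowers `hello` versus `hEllO` to 0.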
Example 3: Base case
Description: By setting normalize to true, the edit distance is normalized to a value between 0 and 1. Here calculated using indel distance.
Argument values:
- Distance function: indel
- Ignore case: false
- Left: left
- Right: right
- Normalize distance: true
| left | right | Output |
|---|---|---|
| hello | hello | 0.0 |
| hallo | hello | 0.2 |
| hlelo | hello | 0.2 |
| hello | hEllO | 0.4 |
| hello | hello, world! | 0.4444444444444444 |
| hello | farewell | 0.5384615384615384 |
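The normalized values in this table are consistent with dividing the indel distance by the combined length of both strings (for example, 8 / 18 for `hello` versus `hello, world!`). The sketch below assumes that denominator, which is inferred from the table rather than from a documented formula. It computes the indel distance from the longest common subsequence (LCS). The function names and the handling of two empty strings are illustrative assumptions.

```python
def lcs_length(left: str, right: str) -> int:
    """Length of the longest common subsequence of the two strings."""
    previous = [0] * (len(right) + 1)
    for lch in left:
        current = [0]
        for j, rch in enumerate(right, start=1):
            if lch == rch:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel(left: str, right: str, normalize: bool = False):
    """Indel distance: only insertions and deletions are allowed, each costing 1."""
    # Every character outside the LCS must be deleted or inserted.
    distance = len(left) + len(right) - 2 * lcs_length(left, right)
    if not normalize:
        return distance
    total = len(left) + len(right)
    # Assumption: two empty strings are treated as identical (0.0).
    return distance / total if total else 0.0


for left, right in [("hallo", "hello"), ("hello", "hEllO"), ("hello", "farewell")]:
    print(left, right, indel(left, right, normalize=True))
```

Because indel distance has no substitution, replacing one character costs 2 (one deletion plus one insertion), so `hallo` versus `hello` gives 2 / 10 = 0.2. Results agree with the Output column up to floating-point rounding.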
Example 4: Null case
Argument values:
- Distance function: levenshtein
- Ignore case: false
- Left: left
- Right: right
- Normalize distance: false
| left | right | Output |
|---|---|---|
| hello | null | null |
| null | hello | null |
| null | null | null |
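The null behavior above can be mimicked by checking both inputs before any distance is computed. The following is a minimal sketch in which Python's `None` stands in for null. Both `null_propagating` and `toy_distance` are hypothetical names introduced for illustration, and `toy_distance` is a placeholder rather than one of the supported distance functions.

```python
from typing import Callable, Optional


def null_propagating(
    distance: Callable[[str, str], int],
) -> Callable[[Optional[str], Optional[str]], Optional[int]]:
    """Wrap a distance function so that a null on either side yields null."""
    def wrapped(left: Optional[str], right: Optional[str]) -> Optional[int]:
        if left is None or right is None:
            return None
        return distance(left, right)
    return wrapped


def toy_distance(left: str, right: str) -> int:
    # Placeholder only: counts differing positions plus the length gap.
    return sum(a != b for a, b in zip(left, right)) + abs(len(left) - len(right))


safe_distance = null_propagating(toy_distance)
print(safe_distance("hello", None))
print(safe_distance(None, None))
print(safe_distance("hello", "hello"))
```

Keeping the null check in a wrapper means the same rule applies whichever distance function is plugged in.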