Token set ratio¶
Supported in: Batch, Streaming
Computes the token set ratio between two strings. The token set ratio is a similarity metric that compares the sets of words (tokens) in each string, so word order and repeated words do not affect the result. It returns a value between 0 and 1, where 0 means the two strings have nothing in common and 1 means they are the same, or that all the words of one string also appear in the other.
Expression categories: Distance measurement, String
Declared arguments¶
- Ignore case: Whether to ignore case when comparing the left and right strings.
  Literal
- Left: Left string to compare.
  Expression
- Right: Right string to compare.
  Expression
Output type: Double
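The description does not spell out how the score is computed. The sketch below follows one common formulation of token set ratio, written with Python's standard `difflib`. It is an assumption about the approach, not this expression's actual implementation. The function name `token_set_ratio`, the whitespace tokenization, the lowercasing used for `ignore_case`, and the null handling are all illustrative choices.

```python
from difflib import SequenceMatcher
from typing import Optional


def _ratio(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]: 2 * matched chars / total chars.
    # difflib returns 1.0 when both strings are empty.
    return SequenceMatcher(None, a, b).ratio()


def token_set_ratio(left: Optional[str], right: Optional[str],
                    ignore_case: bool = False) -> Optional[float]:
    # Assumed null handling: a null input yields a null output.
    if left is None or right is None:
        return None
    if ignore_case:
        left, right = left.lower(), right.lower()

    # Assumed tokenization: split on whitespace and drop duplicates.
    left_tokens, right_tokens = set(left.split()), set(right.split())

    common = " ".join(sorted(left_tokens & right_tokens))
    left_only = " ".join(sorted(left_tokens - right_tokens))
    right_only = " ".join(sorted(right_tokens - left_tokens))

    # Shared tokens first, followed by each side's leftover tokens.
    combined_left = f"{common} {left_only}".strip()
    combined_right = f"{common} {right_only}".strip()

    # The score is the best of the three pairwise comparisons.
    return max(
        _ratio(common, combined_left),
        _ratio(common, combined_right),
        _ratio(combined_left, combined_right),
    )
```

Under these assumptions, `token_set_ratio("Hello", "hello world")` gives 0.5 and `token_set_ratio("Hello", "hello world", ignore_case=True)` gives 1.0, in line with the example tables on this page. The last decimal digit of other values may differ because of floating-point rounding.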
Examples¶
Example 1: Base case¶
Argument values:
- Ignore case: false
- Left: left
- Right: right
| left | right | Output |
|---|---|---|
| hello world | world hello | 1.0 |
| Hello | hello world | 0.5 |
| hello hello WorlD | hello world | 0.8181818181818181 |
| hello | farewell | 0.46153846153846156 |
| empty string | empty string | 1.0 |
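To see where a value such as 0.8181818181818181 can come from, the third row can be worked through step by step. The sketch assumes a common formulation in which the score is the best character-level ratio among three strings: the shared tokens, and the shared tokens followed by each side's leftover tokens. That formulation, the helper `ratio`, and the use of `difflib` are assumptions for illustration, not documented behavior.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    # Character-level similarity: 2 * matched characters / total characters.
    return SequenceMatcher(None, a, b).ratio()


left, right = "hello hello WorlD", "hello world"

# Case-sensitive token sets; the repeated "hello" collapses to one token.
left_tokens = set(left.split())    # {"hello", "WorlD"}
right_tokens = set(right.split())  # {"hello", "world"}

common = " ".join(sorted(left_tokens & right_tokens))      # "hello"
left_only = " ".join(sorted(left_tokens - right_tokens))   # "WorlD"
right_only = " ".join(sorted(right_tokens - left_tokens))  # "world"

combined_left = f"{common} {left_only}".strip()    # "hello WorlD"
combined_right = f"{common} {right_only}".strip()  # "hello world"

candidates = [
    ratio(common, combined_left),          # 2 * 5 / (5 + 11) = 0.625
    ratio(common, combined_right),         # 2 * 5 / (5 + 11) = 0.625
    ratio(combined_left, combined_right),  # 2 * 9 / (11 + 11), about 0.818
]
score = max(candidates)
```

The winning comparison matches 9 of the 11 characters on each side ("hello " and "orl"), because `W` and `D` differ from `w` and `d` when case is not ignored.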
Example 2: Ignore case¶
Description: By setting ignore case to true, letters of different case are treated as equal.
Argument values:
- Ignore case: true
- Left: left
- Right: right
| left | right | Output |
|---|---|---|
| Hello | hello world | 1.0 |
| hello hello WorlD | hello world | 1.0 |
| hello | FAREWELL | 0.46153846153846156 |
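One way to picture the effect of Ignore case is as case folding applied before the strings are split into tokens. The snippet below is a sketch under that assumption. The helper `token_sets`, lowercasing with `str.lower`, and whitespace tokenization are illustrative choices, not documented behavior.

```python
def token_sets(left: str, right: str, ignore_case: bool):
    # Hypothetical helper: optionally fold case, then split on whitespace.
    if ignore_case:
        left, right = left.lower(), right.lower()
    return set(left.split()), set(right.split())


sensitive = token_sets("Hello", "hello world", ignore_case=False)
insensitive = token_sets("Hello", "hello world", ignore_case=True)

# Case-sensitive: "Hello" and "hello" are different tokens, so none are shared.
shares_tokens_sensitive = bool(sensitive[0] & sensitive[1])

# Case-insensitive: {"hello"} is a subset of {"hello", "world"}.
is_subset_insensitive = insensitive[0] <= insensitive[1]
```

With case ignored, every token of `Hello` also appears in `hello world`, which is consistent with the 1.0 in the first row. With case respected, no token is shared and the score falls back to a character-level comparison.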
Example 3: Null case¶
Argument values:
- Ignore case: false
- Left: left
- Right: right
| left | right | Output |
|---|---|---|
| hello | null | null |
| null | hello | null |
| null | null | null |