
Token set ratio

Supported in: Batch, Streaming

Computes the token set ratio between two strings. The token set ratio is a metric describing how similar two strings are. It returns a value between 0 and 1, where 0 means that the two strings have nothing in common and 1 means that, once each string is split into a set of tokens, the two sets are the same or one is contained in the other. Word order and repeated words therefore do not affect the result.
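To make the metric concrete, here is a minimal Python sketch of one common way to compute a token set ratio. It is an illustration only, not this expression's actual implementation: the function name `token_set_ratio`, whitespace tokenization, and the use of `difflib.SequenceMatcher` as the underlying string ratio are all assumptions. It does agree with the example outputs on this page to within floating-point rounding.

```python
from difflib import SequenceMatcher
from typing import Optional


def _ratio(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; difflib returns 1.0 for two empty strings.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def token_set_ratio(left: Optional[str], right: Optional[str],
                    ignore_case: bool = False) -> Optional[float]:
    """Hypothetical helper: token set ratio of two strings, or None if either is null."""
    if left is None or right is None:
        return None
    if ignore_case:
        left, right = left.lower(), right.lower()

    # Assumption: tokens are whitespace-separated words; duplicates collapse in the set.
    tokens_left, tokens_right = set(left.split()), set(right.split())

    common = " ".join(sorted(tokens_left & tokens_right))
    only_left = " ".join(sorted(tokens_left - tokens_right))
    only_right = " ".join(sorted(tokens_right - tokens_left))

    combined_left = f"{common} {only_left}".strip()
    combined_right = f"{common} {only_right}".strip()

    # The score is the best of the three pairwise comparisons.
    return max(
        _ratio(common, combined_left),
        _ratio(common, combined_right),
        _ratio(combined_left, combined_right),
    )


print(token_set_ratio("hello world", "world hello"))             # 1.0
print(token_set_ratio("Hello", "hello world"))                   # 0.5
print(token_set_ratio("Hello", "hello world", ignore_case=True)) # 1.0
print(token_set_ratio("hello", None))                            # None
```

Taking the maximum over the three comparisons is what makes the score reach 1.0 whenever one token set is contained in the other, because the shared tokens then equal one of the combined strings.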

Expression categories: Distance measurement, String

Declared arguments

  • Ignore case: Whether to ignore case when comparing the left and right strings.
    Literal
  • Left: Left string to compare.
    Expression
  • Right: Right string to compare.
    Expression

Output type: Double

Examples

Example 1: Base case

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right

| left | right | Output |
| --- | --- | --- |
| hello world | world hello | 1.0 |
| Hello | hello world | 0.5 |
| hello hello WorlD | hello world | 0.8181818181818181 |
| hello | farewell | 0.46153846153846156 |
| (empty string) | (empty string) | 1.0 |
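
The fractional scores in this example are consistent with a character-level ratio of the form 2M / (|a| + |b|), where M is the number of matched characters and |a| and |b| are the lengths of the two strings being compared. That formula is an assumption inferred from the outputs; this page does not state it. Under that assumption, and assuming the repeated `hello` in the third row is collapsed before comparison, the arithmetic works out as:

```latex
\begin{align*}
\mathrm{ratio}(\texttt{Hello},\ \texttt{hello world})
  &= \frac{2 \cdot 4}{5 + 11} = 0.5
  && \text{(matched: \texttt{ello})} \\
\mathrm{ratio}(\texttt{hello WorlD},\ \texttt{hello world})
  &= \frac{2 \cdot 9}{11 + 11} = \frac{18}{22} \approx 0.8182
  && \text{(matched: \texttt{hello\textvisiblespace} and \texttt{orl})} \\
\mathrm{ratio}(\texttt{hello},\ \texttt{farewell})
  &= \frac{2 \cdot 3}{5 + 8} = \frac{6}{13} \approx 0.4615
  && \text{(matched: \texttt{ell})}
\end{align*}
```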

Example 2: Ignore case

Description: When Ignore case is set to true, letters that differ only in case are treated as equal.

Argument values:

  • Ignore case: true
  • Left: left
  • Right: right

| left | right | Output |
| --- | --- | --- |
| Hello | hello world | 1.0 |
| hello hello WorlD | hello world | 1.0 |
| hello | FAREWELL | 0.46153846153846156 |

Example 3: Null case

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right

| left | right | Output |
| --- | --- | --- |
| hello | null | null |
| null | hello | null |
| null | null | null |

