Text segmentation

Supported in: Batch, Faster, Streaming

Extract a series of text segments using sliding window segmentation.

Expression categories: String

Declared arguments

  • Expression: The body of text to segment.
    Expression
  • Length: The number of words in each segment.
    Expression
  • optional Overflow: The number of words that consecutive segments share. A negative value skips words between segments (see Example 2).
    Expression

Output type: Array
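
The sliding window behavior described above can be sketched in Python. This is a minimal illustration inferred from the examples on this page, not the actual implementation: the function name `segment_text`, splitting on whitespace, joining words with a single space, treating a null overflow as 0, and rejecting an overflow that is not smaller than the length are all assumptions.

```python
from typing import List, Optional


def segment_text(text: Optional[str], length: Optional[int],
                 overflow: Optional[int] = None) -> Optional[List[str]]:
    """Hypothetical sketch of sliding window word segmentation."""
    # Assumption: a null text or a null length yields a null result,
    # matching the null-case examples on this page.
    if text is None or length is None:
        return None
    # Assumption: a null overflow behaves like an overflow of 0.
    shared = 0 if overflow is None else overflow
    # Each window starts (length - overflow) words after the previous one.
    step = length - shared
    if length <= 0 or step <= 0:
        # The page does not document this case; the sketch refuses it.
        raise ValueError("length must be positive and greater than overflow")
    # Assumption: words are delimited by whitespace.
    words = text.split()
    return [" ".join(words[i:i + length]) for i in range(0, len(words), step)]


print(segment_text("hello world this is a test string", 3, 1))
# ['hello world this', 'this is a', 'a test string', 'string']
```

The final window is kept even when it holds fewer than `length` words, which is why the last segment in the printed result is the single word `string`.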

Examples

Example 1: Base case

Description: This test shows the ability of the transform to properly segment a small set of text where the end will be its own segment as well.

Argument values:

  • Expression: string
  • Length: 3
  • Overflow: 1
string | Output
hello world this is a test string | [ hello world this, this is a, a test string, string ]

Example 2: Base case

Description: Test with negative overflow.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
She sells sea shells by | 2 | -1 | [ She sells, shells by ]
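
One way to read the negative overflow result is through the positions where each window starts. The sketch below is an illustration only: the helper name `window_starts` and the rule that each window begins `length - overflow` words after the previous one are assumptions inferred from the examples on this page.

```python
from typing import List


def window_starts(word_count: int, length: int, overflow: int) -> List[int]:
    """Hypothetical helper: word indices at which each segment begins."""
    # Assumed rule: the window advances by (length - overflow) words.
    step = length - overflow
    if step <= 0:
        # Not covered by the examples; the sketch refuses it.
        raise ValueError("overflow must be smaller than length")
    return list(range(0, word_count, step))


# "She sells sea shells by" has 5 words; length 2, overflow -1.
print(window_starts(5, 2, -1))  # [0, 3]
```

With a step of 3 and a window of 2 words, the word at index 2 ("sea") falls between the two windows, which matches its absence from the output.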

Example 3: Base case

Description: A larger test with overflow and a smaller segment at the end.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
hello world this is a larger test with overlap, the nature of the human spirit is strange as such i ... | 10 | 3 | [ hello world this is a larger test with overlap, the, with overlap, the nature of the human spirit ...

Example 4: Base case

Description: Test a string where overflow is null, which here gives the same result as an overflow of 0, and the last segment is smaller than a full length.

Argument values:

  • Expression: string
  • Length: 3
  • Overflow: null
string | Output
hello world this is a test string | [ hello world this, is a test, string ]

Example 5: Base case

Description: Test with no overflow where the segments are perfectly divided by length.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
hello world this is a test string without overlap | 3 | 0 | [ hello world this, is a test, string without overlap ]

Example 6: Null case

Description: Test where every argument is null; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
null | null | null | null

Example 7: Null case

Description: Test where the string is null and a length is given; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
null | 1 | null | null

Example 8: Null case

Description: Test where a string is given but the length is null; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow
string | length | overflow | Output
Hello world | null | null | null

