Text segmentation
Supported in: Batch, Faster, Streaming
Extract a series of text segments using sliding window segmentation.
Expression categories: String
Declared arguments
- Expression: The body of text to be segmented. Type: Expression
- Length: The length, in words, of each segment the text is broken into. Type: Expression
- optional Overflow: The number of words a segment can share with an adjacent segment. Type: Expression
Output type: Array
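The behavior described by these arguments can be sketched in Python. This is a minimal illustration inferred from the examples on this page, not the actual implementation: the function name `segment_text`, splitting words on whitespace, and raising an error when the window cannot advance are all assumptions.

```python
from typing import List, Optional


def segment_text(text: Optional[str], length: Optional[int],
                 overflow: Optional[int] = None) -> Optional[List[str]]:
    """Sliding-window word segmentation (hypothetical sketch)."""
    # A null text or a null length yields null, matching the null-case examples.
    if text is None or length is None:
        return None
    # A null overflow produces the same output as 0 in the examples.
    if overflow is None:
        overflow = 0
    # Each window starts `step` words after the previous one.
    step = length - overflow
    # Assumption: this page does not document what happens when the
    # window cannot advance, so this sketch rejects such input.
    if length <= 0 or step <= 0:
        raise ValueError("length must be positive and greater than overflow")
    words = text.split()  # assumption: words are separated by whitespace
    return [" ".join(words[start:start + length])
            for start in range(0, len(words), step)]


print(segment_text("hello world this is a test string", 3, 1))
# ['hello world this', 'this is a', 'a test string', 'string']
```

A window is emitted for every start position that falls inside the text, which is why the last word can appear as a segment of its own.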
Examples
Example 1: Base case
Description: This test shows the ability of the transform to properly segment a small set of text where the end will be its own segment as well.
Argument values:
- Expression: string
- Length: 3
- Overflow: 1
| string | Output |
|---|---|
| hello world this is a test string | [ hello world this, this is a, a test string, string ] |
Example 2: Base case
Description: Test with negative overflow.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| She sells sea shells by | 2 | -1 | [ She sells, shells by ] |
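The output above is consistent with each window starting `length - overflow` words after the previous one, so a negative overflow widens the gap and skips words. The sketch below only illustrates that arithmetic; the helper name `window_starts` is hypothetical and the stride rule is inferred from the examples on this page rather than taken from the implementation.

```python
from typing import List


def window_starts(word_count: int, length: int, overflow: int) -> List[int]:
    """Return the word index at which each segment would begin."""
    step = length - overflow  # 2 - (-1) = 3 for this example
    return list(range(0, word_count, step))


# "She sells sea shells by" has 5 words, indexed 0 to 4.
print(window_starts(5, 2, -1))
# [0, 3]
```

With starts at 0 and 3 and a length of 2, the segments cover words 0 to 1 and 3 to 4, so the word at index 2 ("sea") belongs to no segment.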
Example 3: Base case
Description: A larger test with overflow and a smaller segment at the end.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| hello world this is a larger test with overlap, the nature of the human spirit is strange as such i ... | 10 | 3 | [ hello world this is a larger test with overlap, the, with overlap, the nature of the human spirit ... |
Example 4: Base case
Description: Test a string where overflow is null (which behaves like 0 here) and the last segment is shorter than a full length.
Argument values:
- Expression: string
- Length: 3
- Overflow: null
| string | Output |
|---|---|
| hello world this is a test string | [ hello world this, is a test, string ] |
Example 5: Base case
Description: Test with no overflow where the segments are perfectly divided by length.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| hello world this is a test string without overlap | 3 | 0 | [ hello world this, is a test, string without overlap ] |
Example 6: Null case
Description: Test with all arguments null; the output is null.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| null | null | null | null |
Example 7: Null case
Description: Test with a null expression and a non-null length; the output is null.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| null | 1 | null | null |
Example 8: Null case
Description: Test with a non-null expression and a null length; the output is null.
Argument values:
- Expression: string
- Length: length
- Overflow: overflow
| string | length | overflow | Output |
|---|---|---|---|
| Hello world | null | null | null |