Understanding text search¶
Text search across the Foundry platform, including in Object Explorer, Workshop, and the Functions API, relies on an underlying search engine that processes text through a mechanism called an analyzer. Understanding how analyzers work helps you write more effective search queries and build better search experiences in your applications.
How text analysis works¶
When a string property is indexed for search, the platform processes its value through an analyzer. The default analyzer, called the standard analyzer (see Lucene StandardAnalyzer), performs the following steps:
- Splits the text into individual words, called tokens, at whitespace and punctuation boundaries.
- Converts all tokens to lowercase.
For example, a property value of The Quick Brown Fox is stored as the tokens the, quick, brown, and fox. When you search for quick, the search engine matches this token against the stored tokens and finds a match.
This token-based approach has important implications:
- Searches are case-insensitive by default because all tokens are converted to lowercase during indexing.
- Non-phrase queries match tokens independently. For example, a search for `brown fox` evaluates the `brown` and `fox` tokens separately, so it matches any property containing both tokens regardless of order or proximity. To require the tokens to appear together in order, use a phrase search (`"brown fox"`).
- A search for `rown` does not match `brown` because `rown` is not a complete token.
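The behavior described above can be sketched in a few lines of Python. This is only an illustrative model: the regular expression is an assumed, ASCII-only approximation of the standard analyzer, and the function names `standard_analyze`, `matches_all_tokens`, and `matches_phrase` are hypothetical, not part of any Foundry API.

```python
import re

# Rough stand-in for the standard analyzer: split at whitespace and punctuation,
# keep underscores and inner periods, then lowercase. ASCII-only assumption.
_TOKEN = re.compile(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*")


def standard_analyze(text):
    """Return the lowercase tokens produced for a property value."""
    return [m.group(0).lower() for m in _TOKEN.finditer(text)]


def matches_all_tokens(value, query):
    """Non-phrase match: each query token must equal some stored token."""
    stored = set(standard_analyze(value))
    wanted = standard_analyze(query)
    return bool(wanted) and all(tok in stored for tok in wanted)


def matches_phrase(value, query):
    """Phrase match: the query tokens must appear consecutively, in order."""
    stored, wanted = standard_analyze(value), standard_analyze(query)
    n = len(wanted)
    return n > 0 and any(stored[i:i + n] == wanted for i in range(len(stored) - n + 1))


tokens = standard_analyze("The Quick Brown Fox")
print(tokens)                                               # ['the', 'quick', 'brown', 'fox']
print(matches_all_tokens("The Fox Is Brown", "brown fox"))  # True: order is ignored
print(matches_phrase("The Fox Is Brown", "brown fox"))      # False: not adjacent in order
print(matches_all_tokens("The Quick Brown Fox", "rown"))    # False: not a whole token
```

Because both the stored value and the query pass through the same analyzer in this model, a query such as `QUICK` matches the stored token `quick`.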
Analyzer types¶
The analyzer used for a string property is configured in the Ontology. The following analyzer types are available:
| Analyzer | Behavior | Example value | Resulting tokens |
|---|---|---|---|
| Standard (default) | Splits on whitespace and punctuation, converts to lowercase | `The Quick-Brown Fox` | `the`, `quick`, `brown`, `fox` |
| Simple | Splits on any non-letter character, converts to lowercase (digits and punctuation are dropped) | `The Quick-Brown Fox 42` | `the`, `quick`, `brown`, `fox` |
| Not analyzed | Stores the entire value as a single token | `The Quick-Brown Fox` | `The Quick-Brown Fox` |
| Whitespace | Splits on whitespace only, preserves case | `The Quick-Brown Fox` | `The`, `Quick-Brown`, `Fox` |
| Language | Applies language-specific tokenization, stemming, and stopword removal | `Running quickly` (English) | `run`, `quick` |
The Language analyzer supports the following languages: english, french, german, japanese, korean, arabic, and combined_arabic_english.
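To compare the analyzers in the table side by side, the following sketch runs four of them over the same value. The functions are simplified, ASCII-only approximations written for illustration (the names are hypothetical), and the Language analyzer is left out because stemming and stopword removal need a real language library.

```python
import re


def standard(text):
    # Approximation: tokens may contain underscores and inner periods.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def simple(text):
    # Split on any non-letter character, so digits and punctuation are dropped.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def not_analyzed(text):
    # The whole value is kept as one token, case preserved.
    return [text]


def whitespace(text):
    # Split on whitespace only, case preserved.
    return text.split()


value = "The Quick-Brown Fox 42"
for analyzer in (standard, simple, not_analyzed, whitespace):
    print(f"{analyzer.__name__:>12}: {analyzer(value)}")
```

Note how only the simple analyzer drops `42`, and only the whitespace and not analyzed variants keep the original capitalization and the hyphen.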
:::callout{theme="neutral"}
You can check or change the analyzer for a property in the Ontology Manager under the property's search configuration. Properties must have the Searchable render hint enabled to be searchable.
:::
Special character handling in tokens¶
The standard analyzer treats underscores and periods as part of a token rather than as separators. This means:
- `banana_pudding` is stored as a single token `banana_pudding`, not as `banana` and `pudding`.
- `user.name` is stored as a single token `user.name`, not as `user` and `name`.
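A search for `banana` will not match the value `banana_pudding` when using token-based search methods. If you need to match such values, use a wildcard search like `banana*`, or consider an analyzer that splits on these characters, such as the simple analyzer. The whitespace analyzer splits on whitespace only, so it would still keep `banana_pudding` as one token.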
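A small sketch makes the effect of underscores and periods concrete. It assumes the same simplified, ASCII-only approximation of the standard analyzer, and uses Python's `fnmatch.fnmatchcase` as a stand-in for wildcard matching against tokens. The helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def standard_analyze(text):
    # Approximation: underscores and inner periods stay inside a token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def token_match(value, term):
    """True if the term equals one of the stored tokens."""
    return term.lower() in standard_analyze(value)


def wildcard_match(value, pattern):
    """True if the pattern (with * or ?) matches one whole stored token."""
    return any(fnmatchcase(tok, pattern) for tok in standard_analyze(value))


print(standard_analyze("banana_pudding"))            # ['banana_pudding']
print(standard_analyze("banana-pudding"))            # ['banana', 'pudding']
print(token_match("banana_pudding", "banana"))       # False
print(wildcard_match("banana_pudding", "banana*"))   # True
```

The hyphenated value splits into two tokens, while the underscored value does not, which is why only the wildcard form finds it.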
Search in Object Explorer¶
:::callout{theme="neutral"}
Object Explorer remains the primary interface for Ontology discovery, global search, and viewing individual objects. Insight builds on Object Explorer to provide expanded analysis features and is available alongside it.
:::
Object Explorer provides two search interfaces, each with different matching behavior.
Global search bar¶
The global search bar on the Object Explorer home page and results page performs a token-based search across all searchable properties of all object types. By default, each word in your query is searched independently using OR logic. For example, searching for yellow cab returns objects matching either yellow or cab.
You can modify this behavior using the search syntax features described in Search syntax:
- Phrase search: Wrap terms in quotation marks (`"yellow cab"`) to require an exact phrase match.
- Logical operators: Use AND, OR, and NOT to combine search criteria.
- Wildcards: Use `*` to match zero or more characters, or `?` to match a single character. Both trailing wildcards (`term*`) and leading wildcards (`*term`) are supported, subject to the restrictions in the callouts below. Combined leading-and-trailing wildcards (such as `*row*`) are not supported.
- Fuzzy search: Append `~` to a term to find approximate matches.
:::callout{theme="warning"}
Leading wildcard search (*term) is only available for string properties that have the Enable leading wildcards render hint enabled in the Ontology Manager. The Searchable render hint must also be selected. For more information on configuring render hints, see Render hints.
:::
:::callout{theme="neutral"}
Leading wildcard search is not supported in the global search bar. It is available only when filtering on individual string properties in an exploration.
:::
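The difference between the default OR behavior and a quoted phrase can be modeled as follows. This is a sketch under stated assumptions: the analyzer is a simplified approximation, `global_search` is a hypothetical helper, and only bare words and quoted phrases are modeled (the AND, OR, and NOT operators, wildcards, and fuzzy search are out of scope).

```python
import re


def analyze(text):
    # Simplified, ASCII-only approximation of the standard analyzer.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def contains_run(stored, wanted):
    """True if the wanted tokens occur consecutively, in order, in stored."""
    n = len(wanted)
    return n > 0 and any(stored[i:i + n] == wanted for i in range(len(stored) - n + 1))


def global_search(value, query):
    """Each bare word is its own clause, each quoted phrase is one clause,
    and a value matches if any clause matches (OR logic)."""
    stored = analyze(value)
    phrases = re.findall(r'"([^"]*)"', query)
    bare = re.sub(r'"[^"]*"', " ", query)
    clauses = [analyze(p) for p in phrases] + [[w] for w in analyze(bare)]
    return any(contains_run(stored, clause) for clause in clauses)


values = ["Yellow Cab Company", "Yellow submarine", "Cab driver", "Cab, yellow"]
print([v for v in values if global_search(v, "yellow cab")])    # all four values
print([v for v in values if global_search(v, '"yellow cab"')])  # ['Yellow Cab Company']
```

Quoting the query turns two independent clauses into a single ordered clause, which is why the result set shrinks from four values to one.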
Property filters in explorations¶
When exploring a specific object type, you can add keyword filters on individual string properties. These filters offer the following match modes:
| Mode | Behavior |
|---|---|
| Contains (default) | Matches objects where the property contains all search tokens |
| Starts with | Matches objects where the property contains a token starting with the search term (equivalent to term*) |
| Exact | Matches objects where the property value exactly matches the search term |
| Is not | Excludes objects where the property contains the search tokens |
:::callout{theme="neutral"}
Property keyword filters match against individual tokens produced by the property's analyzer. If a property uses the not analyzed analyzer, the entire value is treated as a single token and must be matched accordingly.
:::
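The four match modes can be approximated with the sketch below. Several details are assumptions made for illustration rather than documented behavior: Exact is modeled as plain string equality on the raw value, Is not as the negation of Contains, and Starts with compares the term as typed against the lowercased tokens. All function names are hypothetical.

```python
import re


def analyze(text):
    # Simplified, ASCII-only approximation of the standard analyzer.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def contains(value, query):
    """Contains: every query token is one of the stored tokens."""
    stored, wanted = set(analyze(value)), analyze(query)
    return bool(wanted) and all(tok in stored for tok in wanted)


def starts_with(value, term):
    """Starts with: some stored token begins with the term (like term*)."""
    return any(tok.startswith(term) for tok in analyze(value))


def exact(value, term):
    """Exact: the whole property value equals the term (assumed case-sensitive)."""
    return value == term


def is_not(value, query):
    """Is not: modeled here as the negation of Contains."""
    return not contains(value, query)


value = "The Quick-Brown Fox"
print(contains(value, "fox quick"))   # True
print(starts_with(value, "bro"))      # True
print(starts_with(value, "own"))      # False
print(exact(value, "quick"))          # False
print(is_not(value, "red"))           # True
```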
Leading wildcard search¶
Leading wildcard search allows you to find objects where a property value ends with a specific term. For example, searching for *smith matches values such as Goldsmith, Blacksmith, and smith.
Enabling leading wildcard search¶
To use leading wildcard search, you must enable the Enable leading wildcards render hint on the relevant string properties:
- In the Ontology Manager, navigate to the object type and select the string property you want to configure.
- In the property editor, select the Enable leading wildcards render hint.
- Ensure the Searchable render hint is also selected, as it is required for leading wildcards to function.
- Save your changes and reindex the object type's backing data sources into Object Storage V1 (Phonograph). You can wait for the next triggered reindex or manually start one from the Data sources tab.
For the full list of available render hints, see Render hints.
Using leading wildcard search¶
Once the render hint is enabled, you can use leading wildcards in Object Explorer property filters by prefixing your search term with *. For example:
- `*smith` matches `Goldsmith`, `Blacksmith`, and `smith`.
- `*ing` matches `running`, `swimming`, and `ing`.
:::callout{theme="warning"}
Combined leading-and-trailing wildcards (*term*) are not supported. You can use either a leading wildcard (*term) or a trailing wildcard (term*), but not both at the same time. If you need partial string matching, consider using Contour or the Regex mode in a Workshop Filter List.
:::
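As a rough model of the rules in this section, the sketch below treats the render hint as a boolean flag and rejects patterns with a wildcard at both ends. The helper names, the flag, and the choice to raise `ValueError` for `*term*` are illustrative assumptions. The source only states that such patterns are not supported, not how the platform reports them.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    # Simplified, ASCII-only approximation of the standard analyzer.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def is_supported(pattern):
    """Reject patterns with a wildcard at both ends, such as *row*."""
    return not (len(pattern) > 1 and pattern.startswith("*") and pattern.endswith("*"))


def property_filter(value, pattern, leading_wildcards_enabled):
    """Match a wildcard pattern against the tokens of one property value."""
    if not is_supported(pattern):
        raise ValueError("combined leading-and-trailing wildcards are not supported")
    if pattern.startswith("*") and not leading_wildcards_enabled:
        return False  # without the render hint, leading wildcards return nothing
    return any(fnmatchcase(tok, pattern) for tok in analyze(value))


print(property_filter("Goldsmith", "*smith", True))    # True
print(property_filter("Smithson", "*smith", True))     # False
print(property_filter("Goldsmith", "*smith", False))   # False
print(is_supported("*row*"))                           # False
```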
Search in Workshop¶
Workshop provides several ways to search and filter objects, each with different capabilities.
Filter list keyword search¶
The Filter List keyword search component supports five search modes, selectable by the user at query time:
| Mode | Behavior |
|---|---|
| All | Broadest search. Combines token matching, wildcard matching, and prefix matching to return any results that partially or fully match the query. Wildcard sub-matches compare the query directly against indexed tokens without applying the analyzer to the query, so wildcard queries work most predictably on properties using the Not analyzed or Whitespace analyzer. |
| Any | Matches objects where the property contains any search tokens |
| Exact | Matches objects where the property contains all search tokens as an exact phrase, in order |
| Advanced | Supports Boolean syntax with AND, OR, NOT, quotation marks, and parentheses for complex queries |
| Regex | Matches objects using a regular expression pattern against the property tokens |
For more details on advanced filtering, see the Filter List advanced filtering documentation.
Exploration search bar widget¶
The Exploration Search Bar widget in Workshop uses the same search infrastructure as the Object Explorer global search bar. It supports the same syntax including quotation marks for phrase search, logical operators, wildcards, and fuzzy search.
Object set filter variables¶
Object set filter variables support a CONTAIN filter that performs prefix-only matching. For example, if a property value is id000123 and the filter query is id0001, this is considered a match. However, the query d0001 would not match because it does not start at the beginning of the value.
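Despite its name, the filter described above behaves like a prefix test rather than a substring test. A minimal sketch, assuming a case-sensitive comparison on the raw value (the source does not state case handling) and a hypothetical helper name:

```python
def contain_filter(value, query):
    """Prefix-only match: the query must start at the beginning of the value."""
    return value.startswith(query)


print(contain_filter("id000123", "id0001"))  # True
print(contain_filter("id000123", "d0001"))   # False
print("d0001" in "id000123")                 # True: a substring test would match
```

The last line shows what a true substring check would return, which is the behavior the filter does not provide.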
Search in the Functions API¶
The Functions API provides the most granular control over text search behavior through its string filter methods:
| Method | Behavior |
|---|---|
| `.exactMatch()` | Matches objects where the property value exactly matches the query string |
| `.phrase()` | Splits the query into tokens and matches values containing all tokens in order with no other tokens between them |
| `.phrasePrefix()` | Same as `.phrase()`, but the last token also matches tokens that start with it |
| `.prefixOnLastToken()` | Splits the query into tokens and matches values containing all tokens in any order, where the last token also matches tokens starting with it |
| `.matchAnyToken()` | Splits the query into tokens and matches values containing any of the tokens |
| `.matchAllTokens()` | Splits the query into tokens and matches values containing all tokens in any order |
| `.fuzzyMatchAnyToken()` | Same as `.matchAnyToken()` but allows approximate matches within an edit distance |
| `.fuzzyMatchAllTokens()` | Same as `.matchAllTokens()` but allows approximate matches within an edit distance |
:::callout{theme="neutral"}
The .phrase() and .phrasePrefix() methods do not match across token boundaries created by underscores or periods. For example, .phrase("banana") does not match the value banana_pudding because banana_pudding is a single token.
:::
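The semantics in the table can be modeled on plain token lists. The sketch below is not the Functions API: the snake_case functions are hypothetical stand-ins that mirror the documented behavior of `.phrase()`, `.phrasePrefix()`, `.prefixOnLastToken()`, `.matchAnyToken()`, and `.matchAllTokens()`, using an assumed, simplified analyzer. The fuzzy variants are omitted because the source does not state the edit distance.

```python
import re


def analyze(text):
    # Simplified, ASCII-only approximation of the standard analyzer.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def _has_run(stored, wanted, prefix_last=False):
    """True if wanted occurs consecutively in stored; optionally the last
    wanted token only needs to be a prefix of the stored token."""
    n = len(wanted)
    for i in range(len(stored) - n + 1) if n else ():
        window = stored[i:i + n]
        last_ok = window[-1].startswith(wanted[-1]) if prefix_last else window[-1] == wanted[-1]
        if window[:-1] == wanted[:-1] and last_ok:
            return True
    return False


def phrase(value, query):
    return _has_run(analyze(value), analyze(query))


def phrase_prefix(value, query):
    return _has_run(analyze(value), analyze(query), prefix_last=True)


def prefix_on_last_token(value, query):
    stored, wanted = analyze(value), analyze(query)
    if not wanted:
        return False
    *head, last = wanted
    return all(t in stored for t in head) and any(s.startswith(last) for s in stored)


def match_any_token(value, query):
    stored = set(analyze(value))
    return any(t in stored for t in analyze(query))


def match_all_tokens(value, query):
    stored, wanted = set(analyze(value)), analyze(query)
    return bool(wanted) and all(t in stored for t in wanted)


value = "The Quick Brown Fox"
print(phrase(value, "quick brown"))             # True
print(phrase(value, "quick fox"))               # False: 'brown' sits between them
print(phrase_prefix(value, "quick bro"))        # True
print(prefix_on_last_token(value, "fox qui"))   # True: any order, last token is a prefix
print(match_any_token(value, "red fox"))        # True
print(match_all_tokens(value, "red fox"))       # False
print(phrase("banana_pudding", "banana"))       # False: single token
```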
Comparison of search capabilities¶
The table below summarizes which search capabilities are available in each context:
| Capability | Object Explorer (global) | Object Explorer (property filter) | Workshop Filter List | Functions API |
|---|---|---|---|---|
| Token search | Yes | Yes | Yes | Yes |
| Phrase search | Yes (quotation marks) | No | Yes (Exact mode) | Yes (`.phrase()`) |
| Prefix search | Yes (`term*`) | Yes (Starts with) | Yes (All mode) | Yes (`.phrasePrefix()`, `.prefixOnLastToken()`) |
| Leading wildcard search | No | Yes (requires render hint) | No | No |
| Wildcard search | Yes (`*`, `?`) | No | Yes (All mode) | No |
| Fuzzy search | Yes (`~`) | No | No | Yes (`.fuzzyMatchAnyToken()`, `.fuzzyMatchAllTokens()`) |
| Boolean operators | Yes (AND, OR, NOT) | No | Yes (Advanced mode) | Yes (`Filters.and()`, `Filters.or()`, `Filters.not()`) |
| Regular expressions | No | No | Yes (Regex mode) | No |
Common pitfalls¶
- Partial word searches do not match tokens: Searching for `app` does not match `application` unless you use a wildcard (`app*`) or a prefix search method. This is because the standard analyzer produces the token `application`, and `app` is not an exact token match.
- Underscores and periods prevent token splitting: The standard analyzer treats `first_name` and `user.name` as single tokens. If you need to search for `first` within `first_name`, use a wildcard or consider changing the property's analyzer.
- Multi-word queries without quotation marks use OR logic in Object Explorer: Searching for `New York` in the Object Explorer global search bar returns objects matching `new` OR `york`, which may include results you did not expect. Use `"New York"` for an exact phrase match.
- Wildcard filters do not analyze the query: When using wildcard search (such as `Quick*`), the query string is not run through the analyzer; it is compared character-for-character against indexed tokens. This has two consequences:
  - Case must match the indexed form. On a property using the standard analyzer (which lowercases all tokens), a wildcard query containing uppercase letters will not match, even though the original text contained those letters before analysis lowercased them.
  - Multi-word wildcard queries do not match. Because each indexed token is a single word, a wildcard query containing whitespace (such as `a search term*`) is compared against individual tokens and cannot match. Wildcard searches are therefore effectively limited to single-word queries. For partial matching across words, consider Contour or the Regex mode in a Workshop Filter List.
- Leading wildcards require a render hint: Leading wildcard search (`*term`) is only available on string properties that have the Enable leading wildcards render hint enabled in the Ontology Manager. Without this render hint, leading wildcard queries do not return results.
- Combined leading-and-trailing wildcards are not supported: You cannot search for `*term*` in Object Explorer or Workshop. If you need partial string matching, consider using Contour or the Regex mode in a Workshop Filter List.
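The wildcard pitfall in particular is easy to reproduce in a small model. The sketch below assumes a simplified standard analyzer and uses `fnmatch.fnmatchcase` for case-sensitive wildcard matching. The key point is that the pattern is passed through untouched while the stored tokens are lowercased. The helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    # Simplified, ASCII-only approximation of the standard analyzer.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)]


def wildcard_search(value, pattern):
    """Compare the raw, unanalyzed pattern against each indexed token."""
    return any(fnmatchcase(tok, pattern) for tok in analyze(value))


value = "The Quick Brown Fox"
print(wildcard_search(value, "quick*"))       # True
print(wildcard_search(value, "Quick*"))       # False: indexed token is lowercase
print(wildcard_search(value, "quick bro*"))   # False: no single token contains a space
print(wildcard_search(value, "qu?ck"))        # True: ? matches one character
```

Lowercasing the pattern before submitting it is one way to sidestep the case mismatch on properties that use the standard analyzer.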