Ontology-augmented generation¶

LLMs are most powerful when applied to business-specific context. For almost any task, the first step is to find the relevant context to give to the LLM, and finding it is often the most challenging part of designing a retrieval-augmented generation system. This section outlines some common approaches for context retrieval. There is no single "best approach"; the best solution depends heavily on the specifics of the data. However, the themes outlined here are a good starting point and can be modified and combined as appropriate.

With new model generations' increased context lengths, you may not need to use semantic search at all and can instead pass the full context in the prompt. For example, GPT-4o's 128k context window corresponds to 300+ pages of text. If your application's full context is within this limit, we recommend you start without search.
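As a rough way to decide whether search can be skipped, the check below estimates whether a set of documents fits in a context window. It is a minimal sketch: the figure of about four characters per token is a common rule of thumb for English text rather than a real tokenizer, and `estimateTokens`, `fitsInContextWindow`, and the `reservedTokens` default are hypothetical names and values chosen for illustration.

```typescript
// Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
    return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Returns true when all documents, plus a reserved budget for the
// instructions and the model's answer, fit in the context window.
function fitsInContextWindow(
    documents: string[],
    contextWindowTokens: number,
    reservedTokens: number = 4000,
): boolean {
    const total = documents.reduce((sum, doc) => sum + estimateTokens(doc), 0);
    return total + reservedTokens <= contextWindowTokens;
}

// Two short notes easily fit in a 128k-token window.
console.log(fitsInContextWindow(["short note", "another short note"], 128000)); // true
```

If the check fails, or only barely passes, the retrieval approaches in the rest of this section apply.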

Basic semantic search¶

To create a basic semantic search, do the following:

Follow a chunking strategy.
Create the chunk objects with a media reference property.
Search for the chunk as part of a semantic search workflow.
Use the PDF Viewer widget in Workshop, noting the configuration options.
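The first step, choosing a chunking strategy, can start from something as simple as fixed-size word windows with overlap. The sketch below is one such strategy under stated assumptions: `TextChunk` and `chunkByWords` are hypothetical names, the chunk identifier format is invented for illustration, and the media reference property from the second step is not modeled.

```typescript
interface TextChunk {
    id: string;         // hypothetical identifier: document id plus chunk index
    documentId: string;
    text: string;
}

// Split a document into windows of `chunkSize` words, where consecutive
// windows share `overlap` words so that sentences cut at a boundary
// still appear whole in at least one chunk.
function chunkByWords(documentId: string, text: string, chunkSize: number, overlap: number): TextChunk[] {
    if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
        throw new Error("chunkSize must be positive and overlap must be in [0, chunkSize)");
    }
    const words = text.split(/\s+/).filter((w) => w.length > 0);
    const chunks: TextChunk[] = [];
    const step = chunkSize - overlap;
    for (let start = 0; start < words.length; start += step) {
        const slice = words.slice(start, start + chunkSize);
        chunks.push({ id: `${documentId}-${chunks.length}`, documentId, text: slice.join(" ") });
        if (start + chunkSize >= words.length) break; // this window reached the end of the text
    }
    return chunks;
}

// Windows of 3 words with 1 word of overlap.
console.log(chunkByWords("doc1", "a b c d e f g", 3, 1).map((c) => c.text));
```

Larger overlaps reduce the chance of splitting an answer across chunks at the cost of more chunks to embed and store.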

Embeddings¶

For more information on embeddings, review the documentation on using a Palantir-provided model to create a semantic search workflow.

Retrieval¶

For more information on retrieval, review the documentation on using a Palantir-provided model to create a semantic search workflow.

Advanced retrieval¶

If your AIP-associated tool fails to answer questions whose answers should be found in the document corpus, first investigate whether the relevant context was retrieved and passed into the prompt. Often it is the retrieval step that fails to surface the most relevant context, which prevents the LLM from responding appropriately in the subsequent step.
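One way to check whether retrieval is the failing step is to collect a few questions with known answer chunks and measure how often the expected chunk appears in the top results. The sketch below assumes you have recorded such cases yourself; `RetrievalCase` and `hitRateAtK` are hypothetical names, not part of any platform API.

```typescript
interface RetrievalCase {
    question: string;
    expectedChunkId: string;     // the chunk a human judged to contain the answer
    retrievedChunkIds: string[]; // ids returned by the retrieval step, best first
}

// Fraction of cases whose expected chunk appears within the top k retrieved results.
function hitRateAtK(cases: RetrievalCase[], k: number): number {
    if (cases.length === 0) return 0;
    const hits = cases.filter(
        (c) => c.retrievedChunkIds.slice(0, k).indexOf(c.expectedChunkId) !== -1,
    ).length;
    return hits / cases.length;
}

const recordedCases: RetrievalCase[] = [
    { question: "q1", expectedChunkId: "c1", retrievedChunkIds: ["c1", "c7", "c3"] },
    { question: "q2", expectedChunkId: "c2", retrievedChunkIds: ["c9", "c4", "c2"] },
];
console.log(hitRateAtK(recordedCases, 1)); // 0.5
console.log(hitRateAtK(recordedCases, 3)); // 1
```

A low hit rate points at retrieval; a high hit rate with poor answers points at the prompt or the generation step instead.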

Many approaches exist to improve the retrieval depending on the content of the data and queries, some of which are outlined below:

HyDE (Hypothetical Document Embeddings)
Ranked keyword-based search
Query augmentation
Hybrid search

Technique consideration¶

Which techniques to adopt will ultimately depend on your specific use case requirements and how much time you are willing to invest. For some use cases, the original simple setup works well enough. For others, you may only need to add HyDE and semantic chunking and leave the rest as it is. We recommend starting with the basic implementation and adding features as they become necessary.

HyDE (Hypothetical document embeddings)¶

One approach to improve retrieval performance is HyDE (hypothetical document embeddings). The core idea is that instead of embedding the query directly, you first ask an LLM to produce a hypothetical chunk that answers the question, and then embed that chunk. Intuitively, this helps balance out the asymmetry between a query and its answer. You can also review the related paper, "Precise Zero-Shot Dense Retrieval without Relevance Labels".

This can be particularly helpful in specific cases where chunks are formatted in a particular way to encode the origin document and chapter.

As an example, consider the following chunk as an appropriate answer to the question: “How do we deal with animal collisions?”:

Claim Management - Motor: Animal Collision:
Animal collision claims are generally covered in type A, B, D policies. However,
exclusions apply...

We would first prompt an LLM to generate a hypothetical chunk like so:

You are an insurance specialist assistant tasked to assist your colleagues with
finding relevant documents for their queries.

Given the following user query:
{query}

Produce a hypothetical paragraph answering it. Give your response in the following
format:
{Document Name}: {Chapter name}: {Section} > ...
{Content}

where {Document Name} is the name of the document that contains the passage,
{Chapter name} is the name of the chapter, {Section} is the name of the section
and {Content} is the content of the section.

This prompt would return us a response such as the following:

Animal Claims Management: General Terms:
Animal collision is commonly insured in fully comprehensive packages...

Because the LLM's response is already structurally “closer” to the real answer, its embedding is also closer to that of the chunk containing the real answer. Our semantic search in a function would then look like the following:

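async searchChunksByEmbedding(query: string, k: Integer): Promise<Chunk[]> {
   // create the full prompt for the hypothetical
   const prompt = `...`

   // generate hypothetical chunk
   const response = await GPT_4o.createChatCompletion({messages: [{"role": "user", "contents": [{"text": prompt}]}]})
   // assumption: the generated text is exposed as choices[0].message.content;
   // adjust this to the response shape of your model API version
   const hypothetical = response.choices[0].message.content ?? ""
   // embed the result
   const embeddingResponse = await TextEmbeddingAda_002.createEmbeddings({inputs: [hypothetical]})
   // assumption: one vector is returned per input under `embeddings`
   const embedding = embeddingResponse.embeddings[0]
   // use the embedding in the nearest neighbor search
   const docs = await Objects.search()
                    .chunks()
                    .nearestNeighbors(chunk => chunk.vectorProperty.near(embedding, {kValue: k}))
                    .orderByRelevance()
                    .takeAsync(k)
   return docs
}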
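The `prompt` placeholder in the function above could be produced by a small helper that fills the user query into the template shown earlier. This is a minimal sketch: `HYDE_TEMPLATE` and `buildHydePrompt` are hypothetical names, and the template text is abbreviated from the prompt above.

```typescript
// Template abbreviated from the prompt shown above; {query} is the only slot filled in code.
const HYDE_TEMPLATE = [
    "You are an insurance specialist assistant tasked to assist your colleagues with",
    "finding relevant documents for their queries.",
    "",
    "Given the following user query:",
    "{query}",
    "",
    "Produce a hypothetical paragraph answering it. Give your response in the following",
    "format:",
    "{Document Name}: {Chapter name}: {Section} > ...",
    "{Content}",
].join("\n");

function buildHydePrompt(query: string): string {
    const trimmed = query.trim();
    if (trimmed.length === 0) {
        throw new Error("query must not be empty");
    }
    // Replace only the {query} slot; the other braces are instructions for the LLM.
    // A replacer function keeps "$" sequences in the query from being interpreted.
    return HYDE_TEMPLATE.replace("{query}", () => trimmed);
}

console.log(buildHydePrompt("How do we deal with animal collisions?"));
```

Keeping the format placeholders such as `{Document Name}` untouched matters here, because they tell the LLM how to shape the hypothetical chunk so that it resembles real chunks.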

Ranked keyword-based search¶

Generic embedding models such as OpenAI Ada are trained on a large corpus of diverse data. If your use case requires search on a domain-specific corpus (for example, manufacturing), you may find that the retrieval does not work as well as expected. This is due to the generic embedding model only using a small part of the embedding space for a specific domain.

Fine-tuning a custom model is one approach to improve retrieval in these cases. However, a much simpler out-of-the-box solution is to use a ranked keyword search, potentially with some LLM preprocessing.

This is because the index that Object Storage v2 runs on already comes with a notion of “relevance” when given a query. This relevance is relative to other chunks, meaning it automatically considers the domain-specific context of a chunk.

Functions on Objects support ordering the results of Object queries by said relevance, so you can write a function like the following:

async searchChunksByKeywords(query: string, k: Integer): Promise<Chunk[]> {
    const chunks = Objects.search()
                    .chunks()
                    .filter(chunk => chunk.text.matchAnyToken(query))
                    .orderByRelevance()
                    .takeAsync(k)
    return chunks
}

However, this method loses the semantic element of a semantic search. For example, if a user asks “How do we deal with deer collisions?” and we pass that directly into the function, we would not find chunks that talk about animal collisions in general. An LLM can bring the semantic element back through query augmentation, described below.
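The limitation can be illustrated with a deliberately simplified stand-in for token matching. The sketch below is not how the platform's index tokenizes or scores text; `tokenize` and `matchesAnyToken` are hypothetical helpers that only compare exact lowercase tokens.

```typescript
// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// True when the chunk shares at least one exact token with the query.
function matchesAnyToken(chunkText: string, query: string): boolean {
    const chunkTokens = tokenize(chunkText);
    return tokenize(query).some((t) => chunkTokens.indexOf(t) !== -1);
}

const exampleChunk =
    "Claim Management - Motor: Animal Collision: Animal collision claims are generally covered in type A, B, D policies.";

// The raw question shares no exact token with the chunk ("collisions" is not "collision").
console.log(matchesAnyToken(exampleChunk, "How do we deal with deer collisions?")); // false

// A query with related terms added does match.
console.log(matchesAnyToken(exampleChunk, "deer animal collision wildlife")); // true
```

A real index may apply stemming or other analysis that softens this effect, but the gap between the user's wording and the corpus wording remains the motivation for augmenting the query.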

Query augmentation¶

Query pre-processing is an important step to maximize the relevance of returned results. In essence, you want to distill the user query into its core components, depending on the type of search. Two approaches to consider are query enriching and query extraction.

Query enriching¶

Injecting an LLM step between the user query and what is passed to the keyword search allows the possibility of distilling the query to make it more relevant: the LLM can be prompted to remove stopwords and irrelevant filler phrases (“Help me find ...”), and add other related words or synonyms.

You can set up a prompt like the following:

You are an insurance AI assistant tasked to help users find relevant documents.
To do so, you can use keyword-search in the company's internal database.

Given the following user query: {query}

Give a list of search terms that would find relevant results. Be sure to remove stop words,
and add synonyms and related terms to the most important terms.
Give your answer as a list of comma-separated values.

For our example question of "How do we deal with deer collisions?", the LLM's response would be:

deer, animal, collision, claims, wildlife, accidents, vehicle, damage, car, insurance, policy, coverage, comprehensive, reimbursement

This would allow users to find documents that never mention "deer collisions" but that talk about "animal", "wildlife", and "accidents" in general.

In code:

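async searchChunksByAugmentedKeywords(query: string, k: Integer): Promise<Chunk[]> {
    // create the full prompt for the query augmentation
    const prompt = `...`

    const response = await GPT_4o.createChatCompletion({messages: [{"role": "user", "contents": [{"text": prompt}]}]})
    // assumption: the generated text is exposed as choices[0].message.content;
    // adjust this to the response shape of your model API version
    const augmentedQuery = response.choices[0].message.content ?? query

    // search with the LLM-generated terms, ordered by keyword relevance
    const chunks = await Objects.search()
                    .chunks()
                    .filter(chunk => chunk.text.matchAnyToken(augmentedQuery))
                    .orderByRelevance()
                    .takeAsync(k)
    return chunks
}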
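LLM output does not always follow the requested format exactly, so it can help to normalize the comma-separated response before passing it to the keyword search. The sketch below assumes the search accepts a single space-separated string of terms; `parseSearchTerms` and its `maxTerms` default are hypothetical.

```typescript
// Turn a comma-separated (or newline-separated) LLM response into a clean,
// de-duplicated, space-separated list of lowercase search terms.
function parseSearchTerms(llmResponse: string, maxTerms: number = 20): string {
    const terms: string[] = [];
    for (const raw of llmResponse.split(/[,\n]/)) {
        const term = raw.trim().toLowerCase();
        // skip empty entries and terms that were already collected
        if (term.length === 0 || terms.indexOf(term) !== -1) continue;
        terms.push(term);
        if (terms.length >= maxTerms) break;
    }
    return terms.join(" ");
}

console.log(parseSearchTerms("deer, animal, Deer, , collision"));
// "deer animal collision"
```

Capping the number of terms keeps a verbose response from diluting the query with loosely related words.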

Query extraction¶

Query enriching works well for relevance-ordered keyword search. For semantic search, however, you need to extract the core ask of the user query. Doing so removes extra terms that carry no semantic meaning, such as stop words, and may also involve lemmatizing or stemming the query terms.

To do so, conduct query extraction to convert a question to the key ask of the user.

An example prompt could be:

You are preparing a user-given query in order to perform a semantic search.
Extract the key user actions from the given query, removing unnecessary stop words
in the process.

Given the following user query: {query}

Return concatenated actions delimited by full stops.

For our example question of “How do we deal with animal collisions?”, the LLM would return:

Deal with animal collisions.

This response maximizes the semantic content of our query and increases the likelihood of stronger matches downstream once we run a semantic search. The above example could also be solved by removing stop words only.

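async searchChunksByExtractedQuery(query: string, k: Integer): Promise<Chunk[]> {
    // create the full prompt for the query extraction
    const prompt = `...`

    const response = await GPT_4o.createChatCompletion({messages: [{"role": "user", "contents": [{"text": prompt}]}]})
    // assumption: the generated text is exposed as choices[0].message.content;
    // adjust this to the response shape of your model API version
    const extractedQuery = response.choices[0].message.content ?? query
    const embeddingResponse = await TextEmbeddingAda_002.createEmbeddings({inputs: [extractedQuery]})
    // assumption: one vector is returned per input under `embeddings`
    const embedding = embeddingResponse.embeddings[0]

    // nearest neighbor search on the embedding of the extracted query
    const chunks = await Objects.search()
                    .chunks()
                    .nearestNeighbors(obj => obj.embeddingProperty.near(embedding, { kValue: k }))
                    .allAsync()
    return chunks
}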
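For simple questions, the stop-word-only alternative mentioned above needs no LLM call. The sketch below uses a tiny illustrative stop-word list; `STOP_WORDS` and `removeStopWords` are hypothetical, and a production list would be much longer and language specific.

```typescript
// A tiny illustrative stop-word list; a real list would be much longer.
const STOP_WORDS = ["a", "an", "the", "how", "do", "does", "we", "i", "you", "to", "of", "is", "are", "can", "please"];

// Lowercase the query, split it into tokens, and drop the stop words.
function removeStopWords(query: string): string {
    return query
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter((token) => token.length > 0 && STOP_WORDS.indexOf(token) === -1)
        .join(" ");
}

console.log(removeStopWords("How do we deal with animal collisions?"));
// "deal with animal collisions"
```

The result matches the LLM's extraction for this example, which is why a cheap rule-based step can be a reasonable first attempt before adding another model call.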

Hybrid search: Combining semantic and keyword search with reciprocal rank fusion (RRF)¶

Reciprocal rank fusion (RRF) is a simple algorithm to combine results from multiple search types into a single list. In essence, it gives a document a higher score the higher it is ranked in a given list. The total score is the sum of scores across lists.

k acts as a regularizer: the higher k is, the less it matters where a document appears in a list, and the more it matters that it appears in the list at all.

`RRFscore(d ∈ D) = Σ [1 / (k + r(d))]`

`# the sum runs over every result list that contains the document d`
`# k is a constant that helps to balance between high and low ranking.`
`# r(d) is the rank/position of the document in that list`
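The effect of k can be seen with a small worked calculation. The sketch below compares a document ranked first in one list only with a document ranked fifth in both lists, assuming ranks start at 1; `rrfScore` is a hypothetical helper that applies the formula to a document's ranks.

```typescript
// Score of one document: sum of 1 / (k + rank) over the lists that contain it.
// Ranks are assumed to start at 1.
function rrfScore(ranks: number[], k: number): number {
    return ranks.reduce((sum, rank) => sum + 1 / (k + rank), 0);
}

const topOfOneList = [1];      // ranked 1st in one list, absent from the other
const midOfBothLists = [5, 5]; // ranked 5th in both lists

for (const k of [1, 60]) {
    const a = rrfScore(topOfOneList, k);
    const b = rrfScore(midOfBothLists, k);
    console.log(`k=${k}: top-of-one=${a.toFixed(4)}, mid-of-both=${b.toFixed(4)}`);
}
// k=1: top-of-one=0.5000, mid-of-both=0.3333
// k=60: top-of-one=0.0164, mid-of-both=0.0308
```

With a small k, the single first-place result wins; with k = 60, appearing in both lists outweighs the exact position.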

public combineResultsWithRRF(vectorSearchResults: Chunk[], keywordSearchResults: Chunk[], k: Integer = 60): Chunk[] {

    // define the RRF scoring function; r is the 1-based rank of a chunk in a list
    const RRF = (r: number, k: number) => 1 / (r + k);

    // initialize a map to keep track of the scores of each chunk
    // note we assume that each Chunk has a string primary key property "id"
    const resultMap: Map<string, {chunk: Chunk, score: number}> = new Map();

    const searchResultsList = [vectorSearchResults, keywordSearchResults];

    searchResultsList.forEach((searchResults) => {
        searchResults.forEach((chunk, index) => {
            // forEach indices start at 0, so add 1 to get the rank used in the formula,
            // then add the score to the Chunk's total in the map
            const rrfScore = RRF(index + 1, k);
            const chunkData = resultMap.get(chunk.id) || {chunk: chunk, score: 0};

            chunkData.score += rrfScore;
            resultMap.set(chunk.id, chunkData);
        });
    });

    // sort the entries by their total score, in descending order, and return the Chunks
    return Array.from(resultMap.values())
        .sort((a, b) => b.score - a.score)
        .map((chunkData) => chunkData.chunk);
}

A full hybrid search implementation would then look like the following:

async hybridSearch(query: string, k: Integer, n1: Integer, n2: Integer): Promise<Chunk[]> {

    // Start the keyword and vector searches in parallel
    // (this assumes the search functions are methods on the same class)
    const keywordSearchResultsPromise = this.searchChunksByKeywords(query, n1)
    const vectorSearchResultsPromise = this.searchChunksByEmbedding(query, n2)

    const [keywordSearchResults, vectorSearchResults] = await Promise.all([keywordSearchResultsPromise, vectorSearchResultsPromise])

    // Fuse both ranked lists into one, then keep the top k results
    const rerankedResults = this.combineResultsWithRRF(vectorSearchResults, keywordSearchResults)

    return rerankedResults.slice(0, k)
}
