API: Object sets

An object set represents an unordered collection of objects of a single type. You can use the functions APIs to filter object sets, perform Search Arounds to other object types based on defined link types, and compute aggregated values or retrieve the concrete objects. In addition to passing individual objects as inputs into a function, you can search for object sets at any time using the object search APIs.

:::callout{theme="neutral"} Filtering, ordering, and aggregations only work on properties that have the Searchable render hint enabled in the Ontology app. These properties have been indexed for search. Learn how to enable the Searchable render hint. :::

:::callout{theme="info"} Object sets are more efficient than object arrays for function inputs because they defer loading until needed. For best practices on using object sets efficiently, see Optimize performance. :::

Object search

The Objects.search() interface allows you to initiate a search for any of the object types imported into your project. In this example, the function uses the given airportCode to find all flights that departed from that airport. Then, it finds all the distinct destinations of those flights and returns them.

import { Function } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

export class FlightFunctions {
    @Function()
    public currentFlightDestinations(airportCode: string): Set<string> {
        const flightsFromAirport = Objects.search()
            .flights()
            .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
            .all();

        const destinations = flightsFromAirport.map(flight => flight.arrivalAirportCode!);

        return new Set(destinations);
    }
}

Object sets can also be created from a list of objects, a list of object resource identifiers, or an object set resource identifier by passing it as an argument to the searched object type. For example: Objects.search().flights([flight]).

Once you have an object set of a given type, you can perform various operations on the set as documented below.

Filtering

The .filter() method on an object set allows you to filter the object set based on the searchable properties of the objects. The filter method takes a filter definition, which is based on the type of the property you are filtering on.

  • All property types support the .exactMatch() filter, which filters to objects with an exact match on that property value. This is useful to filter for exact matches on strings (as in the example above), or to filter on the primary key of an object (for example, .filter(object => object.primaryKey.exactMatch(PrimaryKey))).
  • To check whether a property is null or undefined, use the .hasProperty() method.
  • To pass multiple values, use the spread operator .exactMatch(...listVariable). If an empty array is passed in, the filter will be ignored.
  • String properties support several keyword filters. See the documentation on each method in Code Assist for full details.
    • .phrase() splits the search query into tokens (usually individual words) and then filters values based on whether they contain all of the given tokens in order with no other tokens in between. Note that string values that are separated by underscores or periods will be treated as one token. For example, when searching for "banana", an object with the property value "banana_pudding" or "banana.pudding" will not be returned.
    • .phrasePrefix() is almost identical to .phrase(), but the last token will also match tokens starting with that token. For example, a .phrasePrefix() search for fresh banana would match the property value fresh banana_pudding, but not the property value banana_pudding fresh. A .phrasePrefix() search for pudding would not match the property value banana_pudding.
    • .prefixOnLastToken() splits the search query into tokens and then filters values based on whether they contain all of the given tokens, where the last token will also match tokens starting with that token. For example, big app would match big apples as well as apples from the big tree, though it would not match apples from the biggest tree.
    • .matchAnyToken() and .fuzzyMatchAnyToken() split the search query into tokens and then filter values based on whether they contain any of the given tokens. The fuzzy version allows approximate values to match.
    • .matchAllTokens() and .fuzzyMatchAllTokens() split the search query into tokens and then filter values based on whether they contain all of the given tokens. The fuzzy version allows approximate values to match.
      • Fuzzy filters can take an optional Fuzziness parameter imported from @foundry/functions-api.
      • Explanations of the available Fuzziness options can be found in the Elasticsearch documentation. More information can also be found below.
  • Numbers, dates, and timestamp properties support .range() filters.
    • Range filters have a set of .lt(), .lte(), .gt(), and .gte() methods for performing less than, less than or equal to, greater than, and greater than or equal to comparisons, respectively.
  • Boolean properties support .isTrue() and .isFalse() filters.
  • Geopoint properties support .withinDistanceOf(), .withinPolygon(), and .withinBoundingBox() filters.
  • GeoShape properties support .withinBoundingBox(), .intersectsBoundingBox(), .doesNotIntersectBoundingBox(), .withinPolygon(), .intersectsPolygon(), and .doesNotIntersectPolygon() filters.
  • Link filters can be used to filter objects that do or do not have any linked objects of a specific type using the .isPresent() method.
  • Array properties support the .contains() filter, which filters to objects whose array property values contain any of the given values.
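The difference between the three phrase-style string filters is easiest to see on the examples from the list above. The following is a minimal sketch in plain TypeScript that models the described behavior locally; it does not call the Foundry API. The tokenizer here (lowercase, split on whitespace only) and the handling of an empty query are assumptions made for illustration, and the real search index may tokenize differently.

```typescript
// Simplified tokenizer (assumption): lowercases and splits on whitespace only,
// so "banana_pudding" and "banana.pudding" each remain a single token.
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/\s+/).filter(token => token.length > 0);
}

// True when the query tokens appear in the value consecutively and in order.
// When prefixLast is set, the final query token only needs to be a prefix.
function matchesInOrder(value: string, query: string, prefixLast: boolean): boolean {
    const valueTokens = tokenize(value);
    const queryTokens = tokenize(query);
    if (queryTokens.length === 0) return false; // assumption: empty query matches nothing
    for (let start = 0; start + queryTokens.length <= valueTokens.length; start++) {
        const hit = queryTokens.every((queryToken, offset) => {
            const valueToken = valueTokens[start + offset] ?? "";
            const isLast = offset === queryTokens.length - 1;
            return prefixLast && isLast
                ? valueToken.startsWith(queryToken)
                : valueToken === queryToken;
        });
        if (hit) return true;
    }
    return false;
}

// Local models of .phrase() and .phrasePrefix()
const phrase = (value: string, query: string): boolean => matchesInOrder(value, query, false);
const phrasePrefix = (value: string, query: string): boolean => matchesInOrder(value, query, true);

// Local model of .prefixOnLastToken(): every query token must appear somewhere
// in the value, in any order; the last query token may match as a prefix.
function prefixOnLastToken(value: string, query: string): boolean {
    const valueTokens = tokenize(value);
    const queryTokens = tokenize(query);
    if (queryTokens.length === 0) return false; // assumption: empty query matches nothing
    return queryTokens.every((queryToken, index) =>
        index === queryTokens.length - 1
            ? valueTokens.some(valueToken => valueToken.startsWith(queryToken))
            : valueTokens.includes(queryToken));
}

console.log(phrase("banana_pudding", "banana"));                            // false
console.log(phrasePrefix("fresh banana_pudding", "fresh banana"));          // true
console.log(prefixOnLastToken("apples from the big tree", "big app"));      // true
console.log(prefixOnLastToken("apples from the biggest tree", "big app"));  // false
```

Because order matters for the phrase filters but not for the last one, prefixOnLastToken is the more forgiving choice for type-ahead style searches.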

Combining filters

You can compose filters together using the Filters API exported from @foundry/functions-api. The available methods are:

  • and() filters the object set to objects that pass all the given filters
  • or() filters the object set to objects that pass any of the given filters
  • not() negates the given filter

In the example below, we filter an object set of flights by destination and passenger count using or() and and():

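import { Filters } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

Objects.search()
    .flights()
    .filter(flight => Filters.or(
        // Flights to SFO with more than 100 passengers
        Filters.and(flight.destination.exactMatch("SFO"), flight.passengerCount.range().gt(100)),
        // Flights to LAX with more than 300 passengers
        Filters.and(flight.destination.exactMatch("LAX"), flight.passengerCount.range().gt(300)),
    ))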

The above code would filter to flights that either arrived at SFO with more than 100 passengers or arrived at LAX with more than 300 passengers.

:::callout{theme="warning" title="Warning"} The .filter() method on an object set does not use the operators && or ||. To apply multiple filters, you must use one of the methods on Filters listed above (or call .filter() multiple times to achieve an and condition). :::
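To make the boolean semantics concrete, here is a small sketch that models and(), or(), and not() as ordinary predicate combinators over an in-memory array. The Flight type, the sample data, and the helper names are hypothetical stand-ins for illustration; they are not the Filters API itself, which builds filter definitions that are evaluated by the search service.

```typescript
// Local stand-ins for illustration only; not the Foundry Filters API.
type Flight = { destination: string; passengerCount: number };
type Predicate<T> = (item: T) => boolean;

// Passes only when every given predicate passes
function and<T>(...filters: Predicate<T>[]): Predicate<T> {
    return item => filters.every(filter => filter(item));
}

// Passes when at least one given predicate passes
function or<T>(...filters: Predicate<T>[]): Predicate<T> {
    return item => filters.some(filter => filter(item));
}

// Negates the given predicate
function not<T>(filter: Predicate<T>): Predicate<T> {
    return item => !filter(item);
}

const flights: Flight[] = [
    { destination: "SFO", passengerCount: 150 },
    { destination: "SFO", passengerCount: 80 },
    { destination: "LAX", passengerCount: 350 },
    { destination: "LAX", passengerCount: 200 },
    { destination: "JFK", passengerCount: 400 },
];

const toSfo: Predicate<Flight> = flight => flight.destination === "SFO";
const toLax: Predicate<Flight> = flight => flight.destination === "LAX";
const moreThan = (count: number): Predicate<Flight> => flight => flight.passengerCount > count;

// Same condition as the SFO/LAX example above
const busyFlights = flights.filter(or(and(toSfo, moreThan(100)), and(toLax, moreThan(300))));

// Chaining two filters selects the same items as a single and()
const chained = flights.filter(toSfo).filter(moreThan(100));
const combined = flights.filter(and(toSfo, moreThan(100)));

const notSfo = flights.filter(not(toSfo));

console.log(busyFlights.length, chained.length, combined.length, notSfo.length);
```

The chained and combined results are identical, which mirrors the note above that calling .filter() repeatedly yields an and condition.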

Filter string properties with fuzzy search

Specifying the optional fuzziness parameter gives you finer control over fuzzy matching behavior. If you do not specify fuzziness, an automatic edit distance is allowed based on the length of the token you are searching for. You need to import Fuzziness from @foundry/functions-api in order to specify an edit distance.

Fuzzy match any token

Objects.search().employee().filter(employee => employee.firstName.fuzzyMatchAnyToken("Michael", { fuzziness: Fuzziness.LEVENSHTEIN_TWO })).all();

The code above returns all employees whose first name is within two edits of the provided search term (a Levenshtein distance of at most two). In this example, that includes Michael, Micheal, Mikhael, Michel, Mikhail, and Mihail, but not Miguel. If you are more confident in the accuracy of your search term, use a smaller edit distance to narrow the results.
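As a rough check of which names fall within two edits, the sketch below computes the classic Levenshtein distance (insertions, deletions, and substitutions) in plain TypeScript. It is a local illustration only, not how the search service is implemented; the service may count edits differently (for example, by treating a swap of adjacent characters as a single edit), so treat the numbers as an approximation of the behavior described above.

```typescript
// Classic Levenshtein distance using two rolling rows of the DP table.
// Local illustration only; not part of any Foundry API.
function levenshtein(a: string, b: string): number {
    // previous[j] = distance between the first i-1 characters of a and the first j characters of b
    let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const current: number[] = [i]; // distance from the first i characters of a to the empty string
        for (let j = 1; j <= b.length; j++) {
            const substitution = previous[j - 1]! + (a[i - 1] === b[j - 1] ? 0 : 1);
            const deletion = previous[j]! + 1;
            const insertion = current[j - 1]! + 1;
            current.push(Math.min(substitution, deletion, insertion));
        }
        previous = current;
    }
    return previous[b.length]!;
}

const candidates = ["Michael", "Micheal", "Mikhael", "Michel", "Mikhail", "Mihail", "Miguel"];

// Keep only names within two edits of the search term
const withinTwoEdits = candidates.filter(name => levenshtein("Michael", name) <= 2);

console.log(withinTwoEdits);
console.log(levenshtein("Michael", "Miguel")); // 3, so Miguel is excluded
```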

Fuzzy match all tokens

Objects.search().employee().filter(employee => employee.fullName.fuzzyMatchAllTokens("Michael Smith", { fuzziness: Fuzziness.LEVENSHTEIN_ONE })).all();

You can also use fuzzy filters on a phrase with multiple tokens. The code above matches employees whose full name contains both Michael and Smith with up to one edit in each token (a Levenshtein distance of at most one per token), for example Mikhael Smitt. The ordering of tokens is not taken into account with a fuzzyMatchAllTokens or fuzzyMatchesAllTokens filter.

Fuzzy match on string array properties

All filters on array-based properties can use the methods available to their underlying type. For example, string array properties can be filtered based on any methods available to string properties, though the naming of the methods may differ slightly. Filtering on array properties requires a single match among the array elements in order for that object to be returned.

Search Around

:::callout{title="Search around limits"} Object sets loaded into memory with .all() or .allAsync() can use a maximum of 3 Search Arounds. If more than 3 Search Arounds are used, an error is thrown. When performing a Search Around from object set A to object set B in Object Storage V2, the resulting object set B cannot have more than 10 million object instances, or an error will be thrown. For Object Storage V1, the limit is 100,000 object instances. :::

Search Around methods are generated from the link types of the object type in your object set, so that you can traverse those links. In the example below, we filter to an object set of Flights based on the departure code, then Search Around to the passengers on those flights. This results in an object set of Passengers, which can be further filtered or searched around on.

const passengersDepartingFromAirport = Objects.search()
    .flights()
    .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
    .searchAroundPassengers();

Search Around methods will only be generated for link types that are imported into your project. Refer to the tutorial for details on how to import link types.

Note that for performance reasons, the number of Search Around operations you can conduct in a single search is currently limited to 3. If you attempt to run a search with more than three levels of Search Around depth, the search will fail at runtime.

K-nearest neighbors (KNN)

:::callout{title="KNN Limits"} KNN is only supported on object types indexed into OSv2. The k value is limited to the range 0 < K <= 100. Also, the search vector must be the same size as the one used for indexing and has a 2048 dimension limit. An error will be thrown if any of these limits are exceeded. :::

Object types with embedding properties are available for KNN searches. A KNN search returns the k objects whose embedding property is nearest to the provided embedding parameter. The following example returns the movies most similar to a provided movie script. Embeddings can be generated in transformation tools such as Pipeline Builder, or at function query time using a Palantir-provided embedding model or your own model in a function.

Make sure that your functions repository's functions.json configuration file has the enableVectorProperties entry set to true.

import { Double } from "@foundry/functions-api";
import { Objects, Movies } from "@foundry/ontology-api";

// Number of nearest neighbors to return (must satisfy 0 < k <= 100)
const kValue: number = 2;
// Vector can be generated from FML Live or come from an existing object
const vector: Double[] = [0.7, 0.1, 0.3];
const movies: Movies[] = Objects.search()
    .movies()
    .nearestNeighbors(obj => obj.vectorProperty.near(vector, { kValue }))
    .orderByRelevance()
    .take(kValue);

For an example of a full semantic search workflow, review the semantic search workflow guide.
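Conceptually, a KNN search ranks objects by how close their stored embedding is to the query vector and keeps the top k. The sketch below shows that idea as a brute-force search in plain TypeScript. The Movie type, the sample embeddings, and the use of cosine similarity are assumptions for illustration; the similarity measure and indexing strategy used by the actual service are not specified here.

```typescript
// Hypothetical in-memory stand-in for objects with an embedding property
type Movie = { title: string; embedding: number[] };

// Cosine similarity between two vectors of the same dimension (assumed metric)
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error("Vectors must have the same number of dimensions");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        const x = a[i] ?? 0;
        const y = b[i] ?? 0;
        dot += x * y;
        normA += x * x;
        normB += y * y;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force k-nearest-neighbors: score every movie, sort by similarity, keep the top k
function nearestNeighbors(movies: Movie[], query: number[], k: number): Movie[] {
    if (k <= 0 || k > 100) {
        throw new Error("k must satisfy 0 < k <= 100"); // mirrors the limit stated above
    }
    return [...movies]
        .sort((left, right) =>
            cosineSimilarity(right.embedding, query) - cosineSimilarity(left.embedding, query))
        .slice(0, k);
}

const catalog: Movie[] = [
    { title: "C", embedding: [0.0, 1.0, 0.0] },
    { title: "B", embedding: [0.6, 0.2, 0.3] },
    { title: "A", embedding: [0.7, 0.1, 0.3] },
];

const closest = nearestNeighbors(catalog, [0.7, 0.1, 0.3], 2);
console.log(closest.map(movie => movie.title)); // ["A", "B"]
```

A brute-force scan like this is only practical for small in-memory collections; it is shown purely to clarify what "nearest" means.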

Set operations

Object sets of the same object type can be combined in various ways using set operations:

  • .union() creates a new object set composed of objects present in any of the given object sets.
  • .intersect() creates a new object set composed of objects present in all of the given object sets.
  • .subtract() removes any objects present in the given object sets.
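The three operations behave like ordinary set algebra keyed on object identity. The following sketch models them over in-memory arrays in plain TypeScript; the Flight type, its flightId key, and the helper functions are hypothetical and only illustrate the semantics, since real object sets are evaluated by the service without loading objects.

```typescript
// Hypothetical stand-in for an object with a primary key
type Flight = { flightId: string };

// Objects present in any of the given sets (deduplicated by primary key)
function union(...sets: Flight[][]): Flight[] {
    const byId = new Map<string, Flight>();
    for (const set of sets) {
        for (const flight of set) {
            byId.set(flight.flightId, flight);
        }
    }
    return Array.from(byId.values());
}

// Objects of the first set that are also present in all other sets
function intersect(first: Flight[], ...others: Flight[][]): Flight[] {
    return first.filter(flight =>
        others.every(other => other.some(candidate => candidate.flightId === flight.flightId)));
}

// Objects of the first set that are not present in any other set
function subtract(first: Flight[], ...others: Flight[][]): Flight[] {
    return first.filter(flight =>
        !others.some(other => other.some(candidate => candidate.flightId === flight.flightId)));
}

const departing: Flight[] = [{ flightId: "A1" }, { flightId: "A2" }, { flightId: "A3" }];
const delayed: Flight[] = [{ flightId: "A2" }, { flightId: "A3" }, { flightId: "B7" }];

const ids = (flights: Flight[]): string[] => flights.map(flight => flight.flightId);

console.log(ids(union(departing, delayed)));     // ["A1", "A2", "A3", "B7"]
console.log(ids(intersect(departing, delayed))); // ["A2", "A3"]
console.log(ids(subtract(departing, delayed)));  // ["A1"]
```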

Retrieving all objects

The .all() and .allAsync() methods retrieve all objects in the object set. Note that if you attempt to load too many objects at once, your function will fail to execute. Currently, the maximum number of objects you can load is 100,000. However, loading more than 10,000 objects may also cause your function execution to time out. Learn more about time and space limits in functions.

You can use the .allAsync() method to retrieve a Promise that resolves to all the objects in the object set. This can be useful for loading data from multiple object sets in parallel.

Ordering and limiting

Instead of retrieving all objects, you can load a limited number by applying an ordering clause to your object set, then specifying a specific number of objects to load. To do this, you can use the following methods:

  • .orderBy() specifies a searchable property to order by, and allows you to specify an ordering direction. Only properties whose types can be ordered (numbers, dates, and strings) are available for selection in this method. You can call .orderBy() multiple times to sort by multiple properties.
  • .orderByRelevance() specifies that the objects should be returned in order of how well they match the provided filters, with the most relevant listed first. Relevance for a query term against a property value on a given object is a complex determination that takes into account the frequency of the term appearing in the property value, the frequency of the term appearing across all objects, and more. Relevance is less appropriate when performing only .exactMatch() filters or filtering on non-string properties. Note that only one of .orderBy() and .orderByRelevance() may be used in a single search.
  • .take() and .takeAsync() enable you to retrieve a specified number of objects from the set. These methods are only available after you have specified an ordering.

For example, the following code would retrieve the ten employees with the earliest start dates:

Objects.search()
    .employees()
    .orderBy(e => e.startDate.asc())
    .take(10)

As another example, imagine an object type claims which contains text of accident claims for an insurance company. We'd like to find a specific claim involving a red car and a deer. Without the .orderByRelevance() line, any results containing any of the words red, car, collision, with, or deer may have been returned in the top 10 results. With the .orderByRelevance() line, the first 10 results will be the claims that contain the most search terms, so that the most relevant claims will appear first.

const results = Objects.search()
    .claims()
    .filter(doc => doc.text.matchAnyToken("red car collision with deer"))
    .orderByRelevance()
    .take(10)
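To illustrate why ordering by relevance changes which claims surface first, the sketch below uses a deliberately simplified score: the number of distinct query tokens that appear in the claim text. This is an assumption for illustration only. As noted above, actual relevance also weighs term frequency within the value and across all objects, so real rankings can differ. The Claim type and sample data are hypothetical.

```typescript
// Hypothetical in-memory stand-in for claim objects
type Claim = { id: string; text: string };

// Simplified tokenizer (assumption): lowercase, split on non-word characters
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/\W+/).filter(token => token.length > 0);
}

// Simplified relevance score: how many distinct query tokens occur in the claim text
function matchedTokenCount(claim: Claim, query: string): number {
    const claimTokens = new Set(tokenize(claim.text));
    const queryTokens = new Set(tokenize(query));
    let count = 0;
    queryTokens.forEach(token => {
        if (claimTokens.has(token)) count++;
    });
    return count;
}

// Model of matchAnyToken + orderByRelevance + take(limit)
function searchByRelevance(claims: Claim[], query: string, limit: number): Claim[] {
    return claims
        .map(claim => ({ claim, score: matchedTokenCount(claim, query) }))
        .filter(entry => entry.score > 0)            // at least one token must match
        .sort((left, right) => right.score - left.score) // highest score first
        .slice(0, limit)
        .map(entry => entry.claim);
}

const claims: Claim[] = [
    { id: "c1", text: "Car parked with a flat tire" },
    { id: "c2", text: "Red car collision with deer on highway" },
    { id: "c3", text: "Water damage in the basement" },
    { id: "c4", text: "Red truck hit a deer" },
];

const results = searchByRelevance(claims, "red car collision with deer", 10);
console.log(results.map(claim => claim.id)); // "c2" comes first; "c3" is not returned
```

Without the sort step, c1 would come back ahead of c2 simply because it appears first, even though it only shares the words car and with.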

Computing aggregations

:::callout{title="Aggregation limits"} Aggregations returned from the Objects API are limited to 10,000 total buckets. An error will be thrown if this limit is exceeded.

When bucketing using .topValues(), results will be approximate if the data has more than 1,000 distinct values. The list of top values may not be accurate in that case. :::

Grouping objects by properties

In many cases, it's unnecessary to load all of the objects in your object set. Instead, you can simply load a bucketed aggregation of values to conduct further analysis.

To begin computing an aggregation, call the .groupBy() method on an object set. This allows you to specify bucketing on one of the searchable properties of the object type in the object set. For example, this code groups employees by their start date:

Objects.search()
    .employees()
    .groupBy(e => e.startDate.byDays())

When specifying which property to bucket by, you will have to provide additional information about how the bucketing should be done depending on the property type:

  • For boolean properties, the only option is .topValues(). This returns two buckets, one for true and one for false.
  • For string properties, there are two options:
    • .topValues(): For rapid response times and properties with a smaller cardinality. This buckets by the top 1,000 values for the string property. This limit is to ensure that the returned aggregation is not excessively large.
    • .exactValues(): For more exact aggregations and the possibility to consider up to 10,000 buckets for high cardinality properties. The number of buckets considered can be specified via .exactValues({"maxBuckets": numBuckets}), where numBuckets must be an integer value between 0 and 10,000. This method can take longer to respond, as more results have to be considered.
  • For numeric properties (e.g. Integer, Long, Float, Double), the two bucketing options are:
    • .byRanges() allows you to specify the exact ranges that should be used. For example, you could use .byRanges({ min: 0, max: 50 }, { min: 50, max: 100 }) to bucket objects into the two ranges [0, 50) and [50, 100). The min of each range is inclusive and the max is exclusive. You may omit min or max to represent a bucket containing values from -∞ to max or from min to ∞, respectively.
    • .byFixedWidth() specifies the width of each bucket. For example, you could use .byFixedWidth(50) to bucket objects into ranges that each have a width of 50.
  • For LocalDate properties, various convenience methods are provided for easy bucketing:
    • .byYear()
    • .byQuarter()
    • .byMonth()
    • .byWeek()
    • .byDays() buckets values into days. You may pass in a number of days to use for bucket widths.
  • For Timestamp properties, the same bucketing options apply as for LocalDate, as well as the following additions:
    • .byHours() buckets values by hours. You may pass in a number of hours to use for bucket widths.
    • .byMinutes() buckets values by minutes. You may pass in a number of minutes to use for bucket widths.
    • .bySeconds() buckets values by seconds. You may pass in a number of seconds to use for bucket widths.
  • For Array properties, the bucketing options are determined by the type of the elements in the array. In particular, you get the same bucketing methods for Array<PropertyType> as you would get for the PropertyType (for example, Array<boolean> gets the same bucketing methods as boolean).
    • For example, suppose an object set called employeeSet contains Alice and Bob, whose Array<string> property pastCountries holds ["US", "UK"] and ["US"] respectively. Then employeeSet.groupBy(e => e.pastCountries.exactValues()).count() will return { "US": 2, "UK": 1 }.
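The inclusive-min, exclusive-max rule for numeric ranges determines which bucket a boundary value lands in. The sketch below models that rule, along with fixed-width bucketing, in plain TypeScript. The Range type and both helper functions are hypothetical, and aligning fixed-width buckets to multiples of the width starting at zero is an assumption for illustration.

```typescript
// Hypothetical range type: min is inclusive, max is exclusive; either may be omitted
interface Range {
    min?: number;
    max?: number;
}

// Returns the first range containing the value, or undefined if none does
function findRange(value: number, ranges: Range[]): Range | undefined {
    return ranges.find(range =>
        (range.min === undefined || value >= range.min) &&
        (range.max === undefined || value < range.max));
}

// Fixed-width bucketing (assumption: buckets are aligned to multiples of the width)
function fixedWidthBucket(value: number, width: number): Range {
    const min = Math.floor(value / width) * width;
    return { min, max: min + width };
}

const ranges: Range[] = [
    { max: 0 },            // (-∞, 0)
    { min: 0, max: 50 },   // [0, 50)
    { min: 50, max: 100 }, // [50, 100)
];

console.log(findRange(49.9, ranges)); // { min: 0, max: 50 }
console.log(findRange(50, ranges));   // { min: 50, max: 100 }, because max is exclusive
console.log(findRange(100, ranges));  // undefined, no range covers 100
console.log(fixedWidthBucket(120, 50)); // { min: 100, max: 150 }
```

Because the max is exclusive, adjacent ranges such as { min: 0, max: 50 } and { min: 50, max: 100 } never count the same object twice.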

After grouping by one property, you may optionally call the .segmentBy() method to perform further bucketing. This allows you to compute a three-dimensional aggregation bucketed by two searchable properties. For example, you could group employees by their start date as well as their role as follows:

Objects.search()
    .employees()
    .groupBy(e => e.startDate.byDays())
    .segmentBy(e => e.role.topValues())

Choosing an aggregation metric

After grouping your object set, you can call various aggregation methods to compute aggregation metrics on each bucket. Methods that require a property only accept properties marked searchable. Possible aggregation methods are:

  • .count() simply returns the number of objects in each bucket
  • .average() returns the average value for the given numeric, timestamp, or date property
  • .max() returns the maximum value for the given numeric, timestamp, or date property
  • .min() returns the minimum value for the given numeric, timestamp, or date property
  • .sum() returns the sum of values for the given numeric property
  • .cardinality() returns the approximate number of distinct values for the given property

Calling one of these methods returns either a TwoDimensionalAggregation or ThreeDimensionalAggregation. A ThreeDimensionalAggregation is returned if you called .segmentBy() before calling one of the final aggregation methods.

Learn more about the structure of these aggregation types, including valid bucketing types.

Note that the resulting aggregations are wrapped in a Promise, as computing the aggregation requires loading data from a remote service. You can use the async/await ↗ syntax to unwrap the Promise result.

Below is a full example of loading an aggregation and returning it as a result.

import { Function, ThreeDimensionalAggregation } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

export class AggregationFunctions {
    @Function()
    public async employeesByRoleAndOffice(): Promise<ThreeDimensionalAggregation<string, string>> {
        return Objects.search()
            .employee()
            .groupBy(e => e.title.topValues())
            .segmentBy(e => e.office.topValues())
            .count();
    }
}
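To picture what grouping by one property and segmenting by another produces, the sketch below computes the same kind of count over an in-memory array in plain TypeScript. The CountBucket and SegmentedBucket interfaces are hypothetical shapes chosen for illustration; they are not imported from @foundry/functions-api, and the actual structure of ThreeDimensionalAggregation should be taken from the aggregation types reference. The Employee type and sample data are likewise assumptions.

```typescript
// Hypothetical result shapes for illustration only
interface CountBucket<K> {
    key: K;
    value: number;
}
interface SegmentedBucket<K, S> {
    key: K;                   // outer bucket key (here: title)
    groups: CountBucket<S>[]; // inner buckets (here: office -> count)
}

type Employee = { title: string; office: string };

// Models groupBy(title).segmentBy(office).count() over in-memory data
function countByTitleAndOffice(employees: Employee[]): SegmentedBucket<string, string>[] {
    const counts = new Map<string, Map<string, number>>();
    for (const employee of employees) {
        const perOffice = counts.get(employee.title) ?? new Map<string, number>();
        perOffice.set(employee.office, (perOffice.get(employee.office) ?? 0) + 1);
        counts.set(employee.title, perOffice);
    }
    return Array.from(counts.entries()).map(([title, perOffice]) => ({
        key: title,
        groups: Array.from(perOffice.entries()).map(([office, value]) => ({ key: office, value })),
    }));
}

const staff: Employee[] = [
    { title: "Engineer", office: "NYC" },
    { title: "Engineer", office: "NYC" },
    { title: "Engineer", office: "London" },
    { title: "Analyst", office: "London" },
];

const aggregation = countByTitleAndOffice(staff);
console.log(JSON.stringify(aggregation));
```

Each outer bucket carries its own list of inner buckets, which is why the result has two key dimensions plus the count.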

Below is a full example of aggregating without groupBy statements:

import { Function, Double } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

export class AggregationFunctions {
    @Function()
    public async employeesStats(): Promise<Double> {
        // Count of all employees; await the Promise, then default to zero if it resolves to undefined
        return (await Objects.search().employee().count()) ?? 0;
    }
}

You can also perform other aggregations without groupBy by replacing the appropriate line in the code example above, such as:

  • Count of all employees: Objects.search().employee().count(); (as seen in example above)
  • Average tenure of employees: Objects.search().employee().average(e => e.tenure);
  • Maximum tenure of employees: Objects.search().employee().max(e => e.tenure);
  • Minimum tenure of employees: Objects.search().employee().min(e => e.tenure);
  • Sum of all employee salaries: Objects.search().employee().sum(e => e.salary);
  • Number of offices: Objects.search().employee().cardinality(e => e.office);

For an example of manipulating aggregation results in memory, try the guide for creating custom aggregations.

