Ontology SQL(Ontology SQL(本体SQL))¶
:::callout{theme="neutral" title="Beta"} Ontology SQL is in the beta phase of development and some features may not be available on your environment. Functionality may change during active development. Contact Palantir support to request access to Beta features. :::
Ontology SQL is Foundry's SQL query engine for ontology objects, purpose-built to bring familiar SQL semantics to your ontology data. Ontology SQL lets you run SQL queries directly against object types, many-to-many links, and interfaces. With Ontology SQL, you can query ontology objects using standard Spark SQL syntax, executing directly against object storage. Ontology SQL is built on Foundry's SQL infrastructure, providing a single SELECT statement interface that supports variables, standard SQL functions, and seamless integration with your existing ontology modeling.
Ontology SQL architecture¶
Ontology SQL provides a unified SQL experience for querying ontology objects, sharing the same SQL dialect and front-end experience as Foundry's other SQL offerings while operating with its own specialized backend.
- Unified SQL dialect: Ontology SQL shares the same SQL parsing layer and ANSI-compliant Spark SQL dialect as Furnace, ensuring consistent syntax and behavior across Foundry's SQL ecosystem. This provides a familiar SQL experience, while the underlying query planning and optimization are tailored specifically for ontology data.
- Ontology data providers: Ontology SQL introduces specialized providers for object types and many-to-many link tables. These providers translate ontology concepts such as object properties, primary keys, and link relationships into SQL-compatible table schemas, enabling seamless querying of ontology data using standard SQL syntax.
- Direct object storage access: Queries execute directly against Foundry's object storage layer, the same underlying storage used by object sets. This direct access ensures that performance and cost characteristics remain consistent with other ontology operations, with no intermediate data movement required.
- Compute engine abstraction: After planning, the object storage service (OSS) determines which compute engine should execute the query. The compute engine is responsible for running the query, reading data, and returning results, with automatic resource management and built-in scale constraints to ensure reliable performance.
- Consistent API experience: Ontology SQL provides a similar API surface to Furnace, delivering a consistent developer experience and enabling future extensibility as the platform evolves.
- Supported compute engines: Ontology SQL is built on Foundry's compute engine abstraction, allowing for future optimization and updates as the platform evolves. Additionally, to leverage larger and more complex queries, some queries in Ontology SQL can be executed in Spark ↗.
Getting started with Ontology SQL¶
Access Ontology SQL¶
You can interact with Ontology SQL in object mode in SQL Studio or SQL Console. Once in object mode, you will be able to query your object types using standard SQL syntax. For a complete reference of available operations, refer to the SQL dialect documentation.
Write your first query¶
Ontology SQL queries reference object types using their resource identifiers (RIDs). The basic syntax follows this pattern:
SELECT * FROM `ri.ontology.main.object-type.<object-type-rid>`;
For example, a query for all properties from an Employee object type would be written as follows:
SELECT * FROM `ri.ontology.main.object-type.00000000-0000-0000-0000-000000000000`;
A query for many-to-many links can be written as shown below:
SELECT * FROM `ri.ontology.main.relation.<relation-rid>`;
Understanding column names¶
All object properties are referenced in queries using their API names, not their display names. The columns available in your query correspond directly to the property API names defined in your ontology. You can find property API names in Ontology Manager.
For example, if an Employee object has properties with API names employeeId, firstName, and, department, you would write a query as follows:
SELECT employeeId, firstName, department
FROM ri.ontology.main.object-type.<employee-rid>;
Working with link types¶
One-to-one and one-to-many links¶
One-to-one and one-to-many link types do not need to be defined in Ontology Manager to be used in Ontology SQL. You can use SQL to join objects however you wish. Performance will be identical whether or not the join uses a link defined in Ontology Manager.
Many-to-many links¶
Many-to-many links require join tables and must be defined in Ontology Manager. To use them in your SQL joins, you must select the link type in the SQL query.
When querying many-to-many link tables, column names follow a specific convention to avoid ambiguity.
Each foreign key column is named using the pattern below:
<objectTypeApiName>_<relationApiName>
Consider a many-to-many relationship between Person and Car objects.
The Ontology setup is as follows:
- Object type:
Person(API name:person) - Primary key:
personId - Link API name on
Personside:vehicles - Object type:
Car(API name:car) - Primary key:
carId - Link API name on
Carside:drivers - Link type rid:
ri.ontology.main.relation.0
Below is an example query to find all cars driven by a specific person:
SELECT c.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN `car` AS c
ON c.`carId` = linkTable.`person_vehicles`
WHERE linkTable.`car_drivers` = 'person-123';
And an example query to find all people who drive a specific car:
SELECT p.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN `person` AS p
ON p.`personId` = linkTable.`car_drivers`
WHERE linkTable.`person_vehicles` = 'car-456';
:::callout{theme="neutral" title="Best practice"}
Always use table aliases when working with many-to-many links. Without aliases, column names can become difficult to read and maintain. Compare the following examples:
With aliases (recommended):
SELECT c.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN car AS c ON c.carId = linkTable.person_vehicles
WHERE linkTable.car_drivers = 'person-123';
Without aliases (harder to read):
SELECT car.*
FROM `ri.ontology.main.relation.0`
INNER JOIN car ON car.carId = `ri.ontology.main.relation.0`.person_vehicles
WHERE `ri.ontology.main.relation.0`.car_drivers = 'person-123';
Using variables¶
Ontology SQL supports variables through the DECLARE statement:
DECLARE @minSalary DOUBLE = 75000.0, @department STRING = 'Engineering';
SELECT employeeId, salary
FROM ri.ontology.main.object-type.<employee-rid>
WHERE salary > @minSalary AND department = @department;
Variable requirements¶
- The
DECLAREstatement must be the first statement in your query, and can only be used once. Reference the example above for how to declare multiple variables. - For supported variable types, refer to the SQL dialect documentation.
- Variable names can contain letters, digits, and underscores.
SETstatements are not allowed.
Reserved keywords¶
Some keywords, such as user and result, are reserved. It is best practice to wrap column names in backticks to avoid conflicts as shown below:
SELECT `user`, `result` FROM ri.ontology.main.object-type.<object-rid>;
Best practices for performance¶
Avoid functions in filter conditions¶
When applying functions or transformations to columns in WHERE clauses, or when using computed columns for filtering, the query engine cannot leverage indices. This severely limits query performance and the scale of computation.
Avoid the following patterns that prevent index usage:
-- Filtering on a computed column
SELECT employeeId, salary, bonus
FROM employee
WHERE salary + bonus > 100000;
-- Filtering on a function result
SELECT employeeId, hireDate
FROM employee
WHERE YEAR(hireDate) = 2023;
Prefer the following patterns to allow index usage:
-- Filter on the original column directly
SELECT employeeId, salary, bonus
FROM employee
WHERE salary > 100000 - bonus;
-- Use date range comparison instead of extracting year
SELECT employeeId, hireDate
FROM employee
WHERE hireDate >= '2023-01-01' AND hireDate < '2024-01-01';
The same principle applies to aggregations. Filtering over columns that were transformed by any part of the query (for example, colA + colB AS newColumn) means the engine cannot use indices to efficiently locate matching rows, resulting in slower performance.
Avoid selecting unnecessary large columns¶
Do not select large columns like vectors (embeddings) from queries when not needed. These columns can significantly increase query execution time and memory usage without providing value if they are not required for your analysis.
Limitations and constraints¶
Query structure limitations¶
SELECTqueries only: Ontology SQL exclusively supports read operations. No data modification (INSERT,UPDATE,DELETE) or schema definition (CREATE,ALTER,DROP) operations are available.- Single
SELECTstatement: Only oneSELECTstatement is allowed per query, with the exception of theDECLAREstatement for variables. - No mixing of objects and datasets: Queries must operate entirely on ontology objects or entirely on datasets. Mixing the two in a single query is not supported.
Data type constraints¶
- Struct columns must be excluded from queries. Including a struct-type column will cause the query to fail.
- Vector columns (embeddings) are technically supported, but are often excluded by front-end applications due to their size and limited utility in SQL contexts.
- All other ontology column types are supported and automatically converted to SQL-compatible types.
Scale and performance constraints¶
- Output limit: The Ontology SQL service truncates results at 10,000 rows maximum. Even if no limit is provided in the query, the result will be capped to 10,000 rows.
OFFSETis supported as long asOFFSET+LIMITis less than or equal to 10,000. For example,OFFSET 1000 LIMIT 1000works, butOFFSET 9000 LIMIT 2000does not. If interacting with Ontology SQL using applications, those respective applications may enforce stricter limits.
Resource constraints¶
- Jobs may be queued for up to 6 seconds during OSS Spark resource saturation.
- Query execution timeout is 20 seconds.
- Exceeding these durations will result in a
NotEnoughSparkResourceserror.
Platform constraints¶
- Object storage v2 only: Ontology SQL is currently available exclusively on object storage v2.
- No branch or scenario support: Queries execute against the main branch only. Support for branches and scenarios is planned for future releases.
- Object set limitations:
- Object sets can be used as query inputs, depending on the front-end application.
- Object sets cannot be directly used as query outputs.
Supported ontology features¶
| Feature | Support status |
|---|---|
| Object types | Supported; object type must be defined in Ontology Manager |
| Many-to-many links | Supported; link type must be defined in Ontology Manager |
| Interfaces | Not supported |
| One-to-one and one-to-many links | Supported; can be used directly in Ontology SQL as defining the link type in Ontology Manager is not required |
| Branching | Not supported |
| Scenarios | Not supported |
| Write operations | Not supported |
中文翻译¶
Ontology SQL(本体SQL)¶
:::callout{theme="neutral" title="Beta"} Ontology SQL 目前处于 beta 开发阶段,部分功能可能在您的环境中不可用。在活跃开发期间,功能可能会发生变化。请联系 Palantir 支持团队以申请访问 Beta 功能。 :::
Ontology SQL 是 Foundry 针对本体对象(ontology objects)设计的 SQL 查询引擎,旨在将熟悉的 SQL 语义引入您的本体数据。Ontology SQL 允许您直接对对象类型(object types)、多对多链接(many-to-many links)和接口(interfaces)运行 SQL 查询。借助 Ontology SQL,您可以使用标准 Spark SQL 语法查询本体对象,并直接对对象存储(object storage)执行查询。Ontology SQL 构建于 Foundry 的 SQL 基础设施之上,提供单一的 SELECT 语句接口,支持变量、标准 SQL 函数,并能与您现有的本体建模无缝集成。
Ontology SQL 架构¶
Ontology SQL 为查询本体对象提供了统一的 SQL 体验,与 Foundry 的其他 SQL 产品共享相同的 SQL 方言和前端体验,同时运行在自身专用的后端之上。
- 统一的 SQL 方言: Ontology SQL 与 Furnace 共享相同的 SQL 解析层和符合 ANSI 标准的 Spark SQL 方言,确保了 Foundry SQL 生态系统中语法和行为的一致性。这提供了熟悉的 SQL 体验,而底层的查询规划和优化则专门针对本体数据进行了定制。
- 本体数据提供程序: Ontology SQL 引入了针对对象类型和多对多链接表的专用提供程序。这些提供程序将本体概念(如对象属性、主键和链接关系)转换为兼容 SQL 的表模式,从而能够使用标准 SQL 语法无缝查询本体数据。
- 直接对象存储访问: 查询直接针对 Foundry 的对象存储层执行,该存储层与对象集(object sets)使用的底层存储相同。这种直接访问确保了性能和成本特性与其他本体操作保持一致,无需中间数据移动。
- 计算引擎抽象: 在规划之后,对象存储服务(OSS)会决定由哪个计算引擎执行查询。计算引擎负责运行查询、读取数据并返回结果,具有自动资源管理和内置的规模约束,以确保可靠的性能。
- 一致的 API 体验: Ontology SQL 提供了与 Furnace 类似的 API 接口,提供了一致的开发者体验,并为平台未来的可扩展性奠定了基础。
- 支持的计算引擎: Ontology SQL 构建于 Foundry 的计算引擎抽象之上,允许随着平台的发展进行未来的优化和更新。此外,为了利用更大、更复杂的查询,Ontology SQL 中的某些查询可以在 Spark ↗ 中执行。
Ontology SQL 入门¶
访问 Ontology SQL¶
您可以在 SQL Studio 或 SQL Console 的对象模式(object mode)中与 Ontology SQL 进行交互。进入对象模式后,您将能够使用标准 SQL 语法查询您的对象类型。有关可用操作的完整参考,请参阅 SQL 方言文档。
编写您的第一个查询¶
Ontology SQL 查询使用资源标识符(RID)来引用对象类型。基本语法遵循以下模式:
SELECT * FROM `ri.ontology.main.object-type.<object-type-rid>`;
例如,从 Employee 对象类型查询所有属性的查询可以编写如下:
SELECT * FROM `ri.ontology.main.object-type.00000000-0000-0000-0000-000000000000`;
查询多对多链接可以如下所示:
SELECT * FROM `ri.ontology.main.relation.<relation-rid>`;
理解列名¶
查询中引用的所有对象属性都使用其 API 名称,而不是显示名称。查询中可用的列直接对应于本体中定义的属性 API 名称。您可以在 Ontology Manager 中找到属性 API 名称。
例如,如果一个 Employee 对象具有 API 名称为 employeeId、firstName 和 department 的属性,您可以编写如下查询:
SELECT employeeId, firstName, department
FROM ri.ontology.main.object-type.<employee-rid>;
使用链接类型¶
一对一和一对多链接¶
一对一和一对多链接类型无需在 Ontology Manager 中定义即可在 Ontology SQL 中使用。您可以根据需要使用 SQL 连接对象。无论连接是否使用 Ontology Manager 中定义的链接,性能都是相同的。
多对多链接¶
多对多链接需要连接表,并且必须在 Ontology Manager 中定义。要在 SQL 连接中使用它们,您必须在 SQL 查询中选择该链接类型。
查询多对多链接表时,列名遵循特定约定以避免歧义。
每个外键列使用以下模式命名:
<objectTypeApiName>_<relationApiName>
考虑 Person 和 Car 对象之间的多对多关系。
本体设置如下:
- 对象类型:
Person(API 名称:person) - 主键:
personId Person侧的链接 API 名称:vehicles- 对象类型:
Car(API 名称:car) - 主键:
carId Car侧的链接 API 名称:drivers- 链接类型 rid:
ri.ontology.main.relation.0
以下是查找特定人员驾驶的所有汽车的示例查询:
SELECT c.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN `car` AS c
ON c.`carId` = linkTable.`person_vehicles`
WHERE linkTable.`car_drivers` = 'person-123';
以及查找驾驶特定汽车的所有人员的示例查询:
SELECT p.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN `person` AS p
ON p.`personId` = linkTable.`car_drivers`
WHERE linkTable.`person_vehicles` = 'car-456';
:::callout{theme="neutral" title="最佳实践"}
在处理多对多链接时,始终使用表别名。不使用别名会使列名难以阅读和维护。比较以下示例:
使用别名(推荐):
SELECT c.*
FROM `ri.ontology.main.relation.0` AS linkTable
INNER JOIN car AS c ON c.carId = linkTable.person_vehicles
WHERE linkTable.car_drivers = 'person-123';
不使用别名(较难阅读):
SELECT car.*
FROM `ri.ontology.main.relation.0`
INNER JOIN car ON car.carId = `ri.ontology.main.relation.0`.person_vehicles
WHERE `ri.ontology.main.relation.0`.car_drivers = 'person-123';
使用变量¶
Ontology SQL 通过 DECLARE 语句支持变量:
DECLARE @minSalary DOUBLE = 75000.0, @department STRING = 'Engineering';
SELECT employeeId, salary
FROM ri.ontology.main.object-type.<employee-rid>
WHERE salary > @minSalary AND department = @department;
变量要求¶
DECLARE语句必须是查询中的第一条语句,并且只能使用一次。请参考上面的示例了解如何声明多个变量。- 有关支持的变量类型,请参阅 SQL 方言文档。
- 变量名可以包含字母、数字和下划线。
- 不允许使用
SET语句。
保留关键字¶
某些关键字(例如 user 和 result)是保留的。最佳实践是将列名用反引号括起来以避免冲突,如下所示:
SELECT `user`, `result` FROM ri.ontology.main.object-type.<object-rid>;
性能最佳实践¶
避免在过滤条件中使用函数¶
在 WHERE 子句中对列应用函数或转换,或者使用计算列进行过滤时,查询引擎无法利用索引。这会严重限制查询性能和计算规模。
避免以下阻止索引使用的模式:
-- 对计算列进行过滤
SELECT employeeId, salary, bonus
FROM employee
WHERE salary + bonus > 100000;
-- 对函数结果进行过滤
SELECT employeeId, hireDate
FROM employee
WHERE YEAR(hireDate) = 2023;
优先使用以下允许索引使用的模式:
-- 直接对原始列进行过滤
SELECT employeeId, salary, bonus
FROM employee
WHERE salary > 100000 - bonus;
-- 使用日期范围比较代替提取年份
SELECT employeeId, hireDate
FROM employee
WHERE hireDate >= '2023-01-01' AND hireDate < '2024-01-01';
同样的原则也适用于聚合。对查询中任何部分转换过的列(例如 colA + colB AS newColumn)进行过滤,意味着引擎无法使用索引来高效定位匹配行,从而导致性能下降。
避免选择不必要的大列¶
如果不需要,请不要从查询中选择大列,例如向量(嵌入)。这些列会显著增加查询执行时间和内存使用量,如果您的分析不需要它们,则不会提供任何价值。
限制与约束¶
查询结构限制¶
- 仅支持
SELECT查询: Ontology SQL 仅支持读取操作。不支持数据修改(INSERT、UPDATE、DELETE)或模式定义(CREATE、ALTER、DROP)操作。 - 单条
SELECT语句: 每个查询只允许一条SELECT语句,用于变量的DECLARE语句除外。 - 不允许混合对象和数据集: 查询必须完全针对本体对象或完全针对数据集。不支持在单个查询中混合两者。
数据类型约束¶
- Struct 列 必须从查询中排除。包含 struct 类型的列会导致查询失败。
- Vector 列(嵌入)在技术上受支持,但由于其大小和在 SQL 上下文中的实用性有限,通常会被前端应用程序排除。
- 所有其他本体列类型 均受支持,并会自动转换为兼容 SQL 的类型。
规模与性能约束¶
- 输出限制: Ontology SQL 服务会将结果截断至最多 10,000 行。即使查询中未提供限制,结果也会被限制为 10,000 行。只要
OFFSET+LIMIT小于或等于 10,000,就支持OFFSET。例如,OFFSET 1000 LIMIT 1000有效,但OFFSET 9000 LIMIT 2000无效。如果通过应用程序与 Ontology SQL 交互,这些应用程序可能会强制执行更严格的限制。
资源约束¶
- 在 OSS Spark 资源饱和期间,作业可能会排队最多 6 秒。
- 查询执行超时时间为 20 秒。
- 超过这些持续时间将导致
NotEnoughSparkResources错误。
平台约束¶
- 仅限对象存储 v2: Ontology SQL 当前仅在对象存储 v2 上可用。
- 不支持分支或场景: 查询仅针对主分支执行。计划在未来版本中支持分支和场景。
- 对象集限制:
- 根据前端应用程序的不同,对象集可用作查询输入。
- 对象集不能直接用作查询输出。
支持的本体功能¶
| 功能 | 支持状态 |
|---|---|
| 对象类型 | 支持;对象类型必须在 Ontology Manager 中定义 |
| 多对多链接 | 支持;链接类型必须在 Ontology Manager 中定义 |
| 接口 | 不支持 |
| 一对一和一对多链接 | 支持;可以直接在 Ontology SQL 中使用,无需在 Ontology Manager 中定义链接类型 |
| 分支 | 不支持 |
| 场景 | 不支持 |
| 写入操作 | 不支持 |