
Architecture

:::callout{theme="warning"} This page documents the architecture of the original Spark and Direct Read SQL engines. Foundry is transitioning to Furnace, a next-generation SQL engine with enhanced performance, flexibility, and compatibility. This transition is automatic and requires no changes to existing workflows. :::

External SQL connectivity and the BI Tool integrations in this section are powered by a service called Foundry SQL Server. This service provides lightweight SQL session and statement management for read-only queries against Foundry Datasets. Palantir provides JDBC and ODBC drivers to facilitate client interactions with this service using open standards, as well as plugin implementations for certain third-party platforms which leverage these drivers.

Supported SQL Dialects

Supported SQL dialects are ANSI, ODBC, and SparkSQL.

Note that support for these dialects is limited to read-only functionality.

Execution Engines

Foundry SQL Server will automatically select an execution engine based on the complexity of the query. Each execution engine comes with a set of tradeoffs for overall performance, result size limitations, and supported query complexity.

Spark Engine

The default execution engine for queries leverages Spark SQL functionality. This engine supports full SQL compute functionality, including aggregates, joins, ordering, and filters. Queries that require this execution engine are subject to limits on data scale, because results must be collected in memory on the Spark driver before they are delivered to client applications. These limits are a function of the number of rows and the number of bytes in the result of the computation.
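As a rough illustration of why collecting results in memory imposes limits on both row count and byte size, the sketch below gathers rows into a list and fails as soon as either limit is exceeded. It is not Foundry SQL Server's implementation: `collect_with_limits`, `ResultTooLargeError`, the limit parameters, and the way row size is estimated are all hypothetical.

```python
class ResultTooLargeError(Exception):
    """Raised when a collected result exceeds a configured limit (hypothetical)."""


def collect_with_limits(rows, max_rows, max_bytes):
    """Collect rows into memory, failing once either limit is exceeded.

    `rows` is any iterable of tuples. The size of a row is approximated by the
    UTF-8 length of its values, which is an assumption of this sketch.
    """
    collected = []
    total_bytes = 0
    for row in rows:
        total_bytes += sum(len(str(value).encode("utf-8")) for value in row)
        if len(collected) + 1 > max_rows:
            raise ResultTooLargeError(f"result exceeds {max_rows} rows")
        if total_bytes > max_bytes:
            raise ResultTooLargeError(f"result exceeds {max_bytes} bytes")
        collected.append(row)
    return collected
```

A wide result can hit the byte limit well before the row limit, and a narrow one the reverse, which is why both dimensions matter.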

Direct Read Engine

When possible, Foundry SQL Server will use the direct read engine to execute queries. When a query does not require SQL compute, Foundry SQL Server bypasses Spark SQL and streams records directly from the backing files of a Dataset. Direct read queries are not subject to the same scale limitations as queries that require full SQL compute.

Queries are direct read eligible when:

  1. Executed on a dataset. Views are not currently supported.
  2. The dataset files are in a supported format. The formats currently supported by direct read are Parquet, CSV, Avro, and Soho.
  3. The query does not require SQL compute. Queries that contain aggregate, filter, join, or order by operations are not direct read eligible.
  4. The query does not select from a column with a type that is ineligible for direct read. Columns of type array, map, or struct are not direct read eligible.

Direct read queries are case-sensitive.
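The eligibility rules above can be sketched as a small decision function. This is an illustration only: `QueryPlan` is a hypothetical, simplified description of a query (not a Foundry type), and the actual engine selection logic inside Foundry SQL Server is not documented here. The supported formats and ineligible column types are taken from the list above.

```python
from dataclasses import dataclass, field

# Values taken from the eligibility list above.
DIRECT_READ_FORMATS = {"parquet", "csv", "avro", "soho"}
INELIGIBLE_COLUMN_TYPES = {"array", "map", "struct"}
COMPUTE_OPERATIONS = {"aggregate", "filter", "join", "order_by"}


@dataclass
class QueryPlan:
    """Hypothetical, simplified description of a query."""
    source_kind: str                                   # "dataset" or "view"
    file_format: str                                   # e.g. "parquet"
    operations: set = field(default_factory=set)       # e.g. {"filter"}
    selected_column_types: list = field(default_factory=list)


def direct_read_blockers(plan):
    """Return the reasons a query is not direct read eligible (empty if eligible)."""
    blockers = []
    if plan.source_kind != "dataset":
        blockers.append("source is not a dataset")
    if plan.file_format.lower() not in DIRECT_READ_FORMATS:
        blockers.append(f"unsupported file format: {plan.file_format}")
    compute = sorted(plan.operations & COMPUTE_OPERATIONS)
    if compute:
        blockers.append("requires SQL compute: " + ", ".join(compute))
    column_types = {t.lower() for t in plan.selected_column_types}
    bad_types = sorted(column_types & INELIGIBLE_COLUMN_TYPES)
    if bad_types:
        blockers.append("ineligible column types: " + ", ".join(bad_types))
    return blockers


def choose_engine(plan):
    """Pick "direct_read" when no rule blocks it, otherwise fall back to "spark"."""
    return "direct_read" if not direct_read_blockers(plan) else "spark"
```

For example, a plain `SELECT` of string columns from a Parquet-backed dataset would map to `"direct_read"` in this sketch, while adding a filter or selecting a struct column would fall back to `"spark"`.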

Caveats

  • This functionality is intended to support clients external to the Foundry Platform, such as Power BI, Tableau, or other downstream applications. For SQL-powered transformations within the Foundry Platform, see SQL Transforms.
  • The architecture of Foundry SQL Server has been optimized for ad-hoc interactive queries against moderate data scale.
