Spark SQL

The Spark SQL connector is a Palantir-provided driver for Spark SQL.

To create a new Spark SQL source, follow the standard setup flow for Palantir-provided drivers, then use the sections below for Spark SQL-specific configuration and networking. For the complete property reference, see the official Spark SQL driver documentation.

Configuration

The properties below are mandatory or recommended.

| Property | Required? | Description | Default |
| --- | --- | --- | --- |
| AuthScheme | Mandatory | The authentication scheme used. Accepted entries are Plain, LDAP, NOSASL, and Kerberos. | Plain |
| Server | Mandatory | The host name or IP address of the server hosting the SparkSQL database. | {serverAddress} |
| UseSSL | Mandatory | Specifies whether to use SSL encryption when connecting to Hive. | TRUE |
| Database | Recommended | The name of the SparkSQL database. | |
| Password | Recommended | The password used to authenticate with SparkSQL. | |
| Port | Recommended | The port for the SparkSQL database. | 10000 |
| TransportMode | Recommended | The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP. | BINARY |
| User | Recommended | The username used to authenticate with SparkSQL. | |
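As a rough illustration of how the properties above fit together, the following Python sketch merges user-supplied values over the listed defaults and checks the mandatory properties and accepted values. The helper `build_properties`, the dictionary layout, and the host, database, and user values are all hypothetical; they are not part of the connector or of any Foundry API, and the sketch only mirrors the table.

```python
# Illustrative only: this helper and its data layout are assumptions,
# not part of the Spark SQL connector or any Foundry API.

ACCEPTED_AUTH_SCHEMES = {"Plain", "LDAP", "NOSASL", "Kerberos"}
ACCEPTED_TRANSPORT_MODES = {"BINARY", "HTTP"}

# Defaults as listed in the property table. Server has no usable default.
DEFAULTS = {
    "AuthScheme": "Plain",
    "UseSSL": "TRUE",
    "Port": "10000",
    "TransportMode": "BINARY",
}

MANDATORY = ("AuthScheme", "Server", "UseSSL")


def build_properties(overrides):
    """Merge user-supplied values over the documented defaults and validate them."""
    props = {**DEFAULTS, **overrides}

    # Every mandatory property must be present and non-empty.
    missing = [name for name in MANDATORY if not props.get(name)]
    if missing:
        raise ValueError("missing mandatory properties: " + ", ".join(missing))

    # Enumerated properties must use one of the accepted entries.
    if props["AuthScheme"] not in ACCEPTED_AUTH_SCHEMES:
        raise ValueError("unsupported AuthScheme: " + props["AuthScheme"])
    if props["TransportMode"] not in ACCEPTED_TRANSPORT_MODES:
        raise ValueError("unsupported TransportMode: " + props["TransportMode"])

    return props


example = build_properties({
    "Server": "spark.example.internal",  # hypothetical host name
    "Database": "default",               # hypothetical database name
    "User": "analyst",                   # hypothetical user name
})
```

Here only `Server`, `Database`, and `User` are supplied, so `AuthScheme`, `UseSSL`, `Port`, and `TransportMode` fall back to the defaults from the table. The password is left out on purpose; in practice it would be supplied as a secret rather than written inline.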

Networking

The table below lists the domains that the source must be able to reach in order to run successfully.

For each domain, add a corresponding egress policy. If the source is hosted on-premises and not directly reachable from Foundry, use an agent proxy egress policy instead; the agent host itself must also be able to reach the listed domains. For details, see the documentation on using an agent as a proxy.

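| Domain | Required |
| --- | --- |
| `<Server>:<Port>` | Always. Server and Port connection properties; default Port=10000 |
| `<Kerberos KDC host>:88` | If AuthScheme=Kerberos |
| `<KDC host of each additional realm>:88` | If AuthScheme=Kerberos and Kerberos topology uses multiple realms |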
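The networking rules above can be sketched as a small Python function that lists the host and port pairs needing an egress policy for a given configuration. This is an illustration, not a Foundry API: the function `required_endpoints`, the `kdc_hosts` parameter, and all host names are hypothetical. It also assumes the port 88 entries refer to Kerberos key distribution center (KDC) hosts, since 88 is the standard Kerberos port.

```python
# Illustrative only: function name, parameters, and hosts are assumptions.

def required_endpoints(props, kdc_hosts=()):
    """Return the host:port pairs the source needs to reach.

    props     -- connection properties (Server, Port, AuthScheme)
    kdc_hosts -- Kerberos hosts, one per realm (hypothetical parameter)
    """
    # The Spark SQL server itself is always required; Port defaults to 10000.
    endpoints = [f"{props['Server']}:{props.get('Port', '10000')}"]

    # Kerberos hosts on port 88 are needed only when AuthScheme=Kerberos.
    if props.get("AuthScheme") == "Kerberos":
        endpoints.extend(f"{host}:88" for host in kdc_hosts)

    return endpoints


plain = required_endpoints({"Server": "spark.example.internal"})

kerberos = required_endpoints(
    {"Server": "spark.example.internal", "Port": "10001", "AuthScheme": "Kerberos"},
    kdc_hosts=["kdc-a.example.internal", "kdc-b.example.internal"],  # hypothetical
)
```

With the default authentication scheme, only the server endpoint is returned. With Kerberos and a multi-realm topology, each realm's host on port 88 is added, and every entry in the result would need its own egress policy.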
