Spark SQL
The Spark SQL connector is a Palantir-provided driver for Spark SQL.
To create a new Spark SQL source, follow the standard setup flow for Palantir-provided drivers, then use the sections below for Spark SQL-specific configuration and networking. For the complete property reference, see the official Spark SQL driver documentation.
Configuration
The properties below are mandatory or recommended.
| Property | Required? | Description | Default |
|---|---|---|---|
| AuthScheme | Mandatory | The authentication scheme used. Accepted entries are Plain, LDAP, NOSASL, and Kerberos. | Plain |
| Server | Mandatory | The host name or IP address of the server hosting the SparkSQL database. | {serverAddress} |
| UseSSL | Mandatory | Specifies whether to use SSL Encryption when connecting to Hive. | TRUE |
| Database | Recommended | The name of the SparkSQL database. | (none) |
| Password | Recommended | The password used to authenticate with SparkSQL. | (none) |
| Port | Recommended | The port for the SparkSQL database. | 10000 |
| TransportMode | Recommended | The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP. | BINARY |
| User | Recommended | The username used to authenticate with SparkSQL. | (none) |
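As a rough illustration of how these properties fit together, the sketch below applies the defaults from the table, checks that the mandatory properties are present, and rejects values outside the accepted entries. The names `build_properties` and `render`, the host `spark.example.com`, and the semicolon-separated `key=value` rendering are all hypothetical and chosen for illustration; this is not the driver's API, and exact-case matching of accepted values is an assumption.

```python
# Illustrative only: helper names and the rendered format are assumptions,
# not part of the Spark SQL driver or of Foundry.

# Defaults taken from the property table ({serverAddress} has no usable default).
DEFAULTS = {
    "AuthScheme": "Plain",
    "UseSSL": "TRUE",
    "Port": "10000",
    "TransportMode": "BINARY",
}

# Properties the table marks as Mandatory.
MANDATORY = ("AuthScheme", "Server", "UseSSL")

# Accepted entries listed in the table (exact-case matching is an assumption).
ACCEPTED = {
    "AuthScheme": {"Plain", "LDAP", "NOSASL", "Kerberos"},
    "TransportMode": {"BINARY", "HTTP"},
}


def build_properties(overrides):
    """Merge user-supplied values over the defaults and validate the result."""
    props = dict(DEFAULTS)
    props.update({key: str(value) for key, value in overrides.items()})

    missing = [name for name in MANDATORY if not props.get(name)]
    if missing:
        raise ValueError(f"missing mandatory properties: {', '.join(missing)}")

    for name, allowed in ACCEPTED.items():
        if props[name] not in allowed:
            raise ValueError(f"{name}={props[name]!r} is not one of {sorted(allowed)}")
    return props


def render(props):
    """Render properties as key=value pairs for display, masking the password."""
    shown = {k: ("****" if k == "Password" else v) for k, v in props.items()}
    return ";".join(f"{key}={value}" for key, value in shown.items()) + ";"


if __name__ == "__main__":
    example = build_properties(
        {"Server": "spark.example.com", "User": "svc_user", "Password": "secret"}
    )
    print(render(example))
```

Validating before saving the source catches a misspelled AuthScheme or a missing Server early, rather than at the first connection attempt.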
Networking
The table below lists the domains that the source must be able to reach in order to run.
For each domain, add a corresponding egress policy. If the source is hosted on-premises and is not directly reachable from Foundry, use an agent proxy egress policy instead; the agent host itself must also be able to reach the listed domains. See the documentation on using an agent as a proxy for details.
| Domain | Required |
|---|---|
| \ | Always. Server and Port connection properties; default Port=10000 |
| \ | If AuthScheme=Kerberos |
| \ | If AuthScheme=Kerberos and Kerberos topology uses multiple realms |
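When an egress policy or agent proxy is in place, a quick TCP check from the machine that will open the connection (for example, the agent host) can help confirm that the Server and Port are reachable. The following is a minimal sketch using Python's standard `socket` module; the function name `can_reach` and the host `spark.example.com` are hypothetical, and a successful TCP connect only shows that the port is open, not that authentication or SSL will succeed.

```python
import socket


def can_reach(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    The default port mirrors the Port default from the configuration table.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host; replace with the value of the Server property.
    host = "spark.example.com"
    status = "reachable" if can_reach(host) else "not reachable"
    print(f"{host}:10000 is {status}")
```

The same check can be repeated for any Kerberos-related hosts that apply to your deployment when AuthScheme=Kerberos.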