FTP/FTPS¶
Connect Foundry to FTP and FTPS servers to sync data between folders and Foundry datasets.
Supported capabilities¶
| Capability | Status |
|---|---|
| Exploration | 🟢 Generally available |
| Bulk import | 🟢 Generally available |
| Incremental | 🟢 Generally available |
Data model¶
The connector can transfer files of any type into Foundry datasets. File formats are preserved and no schemas are applied during or after the transfer. Apply any necessary schema to the output dataset, or write a downstream transformation to access the data.
Performance and limitations¶
There is no limit to the size of transferable files. However, network issues can result in failures of large-scale transfers. In particular, syncs running on a Foundry worker that take more than two days to run will be interrupted. To avoid network issues, we recommend using smaller file sizes and limiting the number of files that are ingested in every execution of the sync. Syncs can be scheduled to run frequently.
Setup¶
- Open the Data Connection application and select + New Source in the upper right corner of the screen.
- Select FTP/FTPS from the available connector types.
- Follow the additional configuration prompts to continue the setup of your connector using the information in the sections below.
:::callout{theme="neutral"} To access on-premise FTP servers, we recommend you connect with an agent proxy policy. :::
Learn more about setting up a connector in Foundry.
Authentication¶
FTP/FTPS authentication is completed using username and password. The FTP/FTPS connection may fail if the FTP user does not have permission for the root directory configured for the connection. Refer to Configuration options for more details about the root directory.
| Option | Required? | Description |
|---|---|---|
| Username | Yes | The FTP login username. |
| Password | Yes | The FTP login password. This field can be left empty for anonymous logins. Contact your server administrator for more information. |
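The root directory permission check described above can be reproduced outside Foundry before configuring the source. The following is a minimal sketch using Python's standard `ftplib` module, not the connector's own implementation; the function name `can_access_root` and the server, credentials, and path in the usage comment are hypothetical.

```python
from ftplib import FTP, error_perm


def can_access_root(ftp, username, password, root_dir="/"):
    """Return True if the user can log in and enter root_dir.

    `ftp` is an already-connected ftplib.FTP (or compatible) object.
    """
    try:
        ftp.login(user=username, passwd=password)
        ftp.cwd(root_dir)  # the server answers with a 5xx reply if access is denied
        return True
    except error_perm:
        # Permanent (5xx) errors cover both rejected logins and denied directories.
        return False


# Example usage (hypothetical server and credentials):
#   with FTP() as ftp:
#       ftp.connect("server.name", 21, timeout=30)
#       print(can_access_root(ftp, "user", "secret", "/folder/name"))
```

If the function returns False for the directory you plan to use as the root of the connection, the same failure is likely to appear when the source is configured in Foundry.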
Networking¶
If your FTP/FTPS connection runs in Foundry, you must add a network egress policy to allowlist the connection.
If connecting through a domain name, egress policies should be created for the FTP server domain on both the control port (usually port 21) and the data ports. We recommend creating two network policies: a single port policy for the control port, and a port range policy for the data ports. Data ports are determined by the administrators of the FTP server. If errors continue to occur despite proper egress policy configuration, file an issue quoting the list of policies applied.
:::callout{theme="warning"}
If the domain for the server resolves to multiple domains and/or servers, all of the associated domains and their related IPs need to be whitelisted. To verify whether a server resolves to multiple domains and/or servers, run the command dig <domain> from your terminal for the server you are trying to connect to and review the answer section.
:::
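Where `dig` is not available, a similar check can be approximated with Python's standard `socket` module. This is a sketch only: it uses the local system resolver, so it lists the IP addresses a name resolves to but does not show intermediate domain names the way the answer section of `dig` does. The function name `resolved_addresses` is an assumption for illustration.

```python
import socket


def resolved_addresses(hostname, port=21):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example usage (hypothetical domain):
#   for address in resolved_addresses("server.name"):
#       print(address)
```

More than one address in the result suggests that each of them needs a corresponding egress policy.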
Certificates and private keys¶
Configure additional client or server certificates and private keys to properly set up your connector, using the guidance below.
SSL and hostname validation¶
SSL connections validate server certificates. Normally, SSL validation happens through a certificate chain; by default, both agent and Foundry workers trust most industry-standard certificate chains. If the server to which you are connecting has a self-signed certificate, or if a firewall performs TLS interception on the connection, the connector must trust the certificate. Learn more about using certificates in Data Connection.
:::callout{theme="warning"}
The server must provide the full certificate chain in order for SSL verification to work. The certificate chain for the FTP server can be obtained by running the command openssl s_client -connect {hostname}:{port} -showcerts -starttls ftp. To verify the certificate chain, use the OpenSSL command line utility or any other available tool.
:::
:::callout{theme="neutral"} If using FTPS, ensure that the certificate for the FTPS server has been added to the agent's truststore. :::
:::callout{theme="neutral"} Foundry attempts a validation for all egress routes. However, FTP traffic cannot be inspected, which can result in hanging connections and/or timeout errors. If errors continue to occur despite proper egress policy configuration, report an issue with a list of policies for which you want to disable hostname validation. :::
Implicit/explicit SSL¶
FTP servers can be configured to support either explicit or implicit SSL. Servers running on port 990 will generally be using implicit SSL.
Confirm the settings of your server with your server administrator. By default, the connector assumes explicit SSL; you may need to change this setting for your environment.
Connection requirements¶
FTP requires CONTROL and DATA connection types. The DATA connection must be configured to be in ACTIVE or PASSIVE mode.
- CONTROL: Client to server
- DATA: Port selected from a range (for example, 1024–1123)
  - PASSIVE: Client to server
  - ACTIVE: Server to client
    - Only works with agent connections
Default FTP/FTPS connector ports:
- FTP: 21
- FTPS Explicit: 21
- FTPS Implicit: 990
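The port convention above can be turned into a quick heuristic when you only have a server URL to go on. The sketch below is an illustration, not connector behavior: the function name `likely_ssl_method` is hypothetical, and because port 990 only generally indicates implicit SSL, the result should still be confirmed with the server administrator.

```python
from urllib.parse import urlsplit

IMPLICIT_FTPS_PORT = 990  # conventional port for implicit SSL


def likely_ssl_method(url):
    """Guess the SSL method from the port in an ftp:// or ftps:// URL."""
    port = urlsplit(url).port  # None when the URL carries no explicit port
    if port == IMPLICIT_FTPS_PORT:
        return "IMPLICIT"
    # Mirrors the connector default described above: assume explicit SSL otherwise.
    return "EXPLICIT"


# Example usage (hypothetical URL):
#   likely_ssl_method("ftps://server.name:990/folder/name")
```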
Active and passive mode¶
We recommend using a passive mode networking connection. In passive mode, all connections are initiated by the client. When using passive mode, ensure the control port (typically 21) and the port range for data transfer (for example, 1024–1123) are allowlisted. Contact your FTP/FTPS server administrator to obtain the connection details.
Active mode is an older method of establishing a file transfer. In active mode, the client opens the control connection to the server, and the server then opens the data connection back to the client. Both sides depend on each other and require bidirectional network connectivity. This is difficult to achieve in most secure environments and is only possible when using agent connections.
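In passive mode, the server announces the data port in its reply to the `PASV` command, in the form `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)` defined by RFC 959, where the port is `p1 * 256 + p2`. When debugging egress policies, it can help to check whether the announced port falls inside the allowlisted range. The sketch below assumes that reply format (servers using the extended `EPSV` command reply differently); the function names and the 1024–1123 range are illustrative.

```python
import re

# Six comma-separated numbers: four address bytes followed by two port bytes.
_PASV_RE = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def pasv_data_port(reply):
    """Extract the data port from a '227 Entering Passive Mode (...)' reply."""
    match = _PASV_RE.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    p1, p2 = int(match.group(5)), int(match.group(6))
    return p1 * 256 + p2


def is_allowlisted(port, first=1024, last=1123):
    """Return True if port lies inside the inclusive allowlisted range."""
    return first <= port <= last


# Example usage with a sample reply (documentation address 192.0.2.10):
#   port = pasv_data_port("227 Entering Passive Mode (192,0,2,10,4,20)")
#   print(port, is_allowlisted(port))
```

A port outside the range usually means the egress port range policy and the server's passive port configuration do not match.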
Configuration options¶
| Option | Required? | Default | Description |
|---|---|---|---|
| URL | Yes | | The URL of the FTP/FTPS server. The URL can optionally contain the path to a directory on the server, which will be used as the root directory for the connection (for example, ftp://server.name/folder/name). |
| Configure client certificates and private key | No | | See Certificates and private keys for more information. |
| Configure server certificates | No | | See Certificates and private keys for more information. |
| Connection timeout | No | 30 seconds | The connection timeout, specified in milliseconds. |
| Re-login time | No | 15 minutes | The re-login interval, specified in minutes. |
| File change timeout | No | 2 seconds | The amount of time a file must remain unchanged before it is considered for upload, specified in milliseconds. |
| HTTP proxy URL | No | | URL of the proxy server, beginning with http:// or https://. Support for HTTP proxies is highly dependent on the FTP server in use, and a proxy cannot be used in ACTIVE mode. ACTIVE mode transfers involve the FTP server connecting back to the client, and HTTP proxies do not support client requests to listen on an externally accessible port. |
| SSL method | No | EXPLICIT | Whether to use explicit or implicit SSL for the FTPS connection. |
| Mode | No | PASSIVE | PASSIVE or ACTIVE |
| Time zone | No | Timezone of the connector | Timezone of the FTP server. FTP records timestamps without a timezone. To view accurate modification timestamps, specify the FTP server timezone if it differs from the default. |
| Timestamp format string | No | MM-dd-yy hh:mma | A format string used to parse timestamps from the FTP server. Timestamps are used to determine which files were modified since the last sync. See the Java documentation on supported formats. |
| Control encoding | No | US-ASCII | The encoding for the FTP control messages. Setting the control encoding can be necessary if filenames use a different encoding than the data connection server's default filesystem encoding. Example: On a Windows FTP server, windows-31j is often used for Japanese, and x-windows-949 is often used for Korean. See the Java documentation for more information. |
| Keep alive | No | false | Choose whether to send FTP NOOP commands to keep the control connection alive while downloading a large file. Not supported by all FTP servers. |
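To see why the time zone and timestamp format options matter together, consider how a timestamp from a server listing is interpreted. The sketch below is written in Python rather than Java, so the pattern `%m-%d-%y %I:%M%p` is an assumed equivalent of the default `MM-dd-yy hh:mma`; the function name `to_utc` and the fixed UTC offset are illustrative and do not reflect how the connector is implemented.

```python
from datetime import datetime, timedelta, timezone

# Assumed Python equivalent of the Java pattern "MM-dd-yy hh:mma".
PY_FORMAT = "%m-%d-%y %I:%M%p"


def to_utc(listing_timestamp, server_utc_offset_hours):
    """Parse a zone-less listing timestamp and convert it to UTC."""
    naive = datetime.strptime(listing_timestamp, PY_FORMAT)
    # The listing carries no zone, so the server's offset must be supplied.
    server_tz = timezone(timedelta(hours=server_utc_offset_hours))
    return naive.replace(tzinfo=server_tz).astimezone(timezone.utc)


# Example usage: a server at UTC+9 lists a file as modified at 02:30PM.
#   print(to_utc("03-15-24 02:30PM", 9))
```

The same listing string maps to different instants depending on the offset supplied, which is why a wrong time zone setting can cause modified files to be missed or picked up again by incremental syncs.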
Sync data from FTP¶
The FTP connector uses the file-based sync interface.
Troubleshooting¶
Agent connections¶
- Are you having issues setting up an agent connection? Install an FTP/S client and attempt to connect to the server using the same configuration as that of the source. If this connection fails, the issue is not a connector bug. Investigate network connectivity, authentication, and FTP server configurations before filing an issue.
- Are you using an egress proxy load balancer? FTP is a stateful protocol, so using a load balancer can cause the sync to fail (non-deterministically) if sequential requests don't originate from the same IP.
SSL and FTPS¶
- Does your server use a self-signed certificate? Have you added it to the source truststore? See the SSL and hostname validation section above.
- Does your FTP server only support legacy TLS versions (for example, TLS 1.1)? If so, the connector runtime might not accept any of the cipher suites offered by the server. File an issue to explore alternatives with a Palantir representative.