Security considerations for local development
The best way to ensure that no data leaves the platform during transforms preview is to run the preview in VS Code workspaces. With this approach, no data is sent directly to your machine.
Local development may be preferable to VS Code workspaces when you need certain third-party tools, libraries, or extensions. Transforms preview is also available in local development through the Palantir extension for VS Code. However, local preview may be enabled for only a subset of users. To execute the previewed pipeline's logic locally, it streams parts of datasets into the system memory of the user's machine, and that has data security implications.
Requirements for executing local preview
Users with the Download operation on a dataset can export that dataset through the Foundry API. As a best practice, grant the Download operation with caution; review the relevant section of download controls for details. Running a transforms preview locally requires the Download operation on all input datasets and, for incremental transforms, on all output datasets as well.
Additionally, a platform administrator must enable local preview on the Code Repositories settings page in Control Panel. Local preview is possible only when both conditions are met. This ensures that local preview cannot be used to elevate any user's permissions on datasets.
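The two conditions above can be modeled as a simple check. The following Python sketch is illustrative only: `PreviewRequest`, `can_run_local_preview`, and the dataset paths are hypothetical names, not part of any Palantir API, and the real enforcement happens on the platform. It shows that the check relies only on Download grants the user already holds, so enabling local preview adds no new access.

```python
from dataclasses import dataclass


@dataclass
class PreviewRequest:
    """Hypothetical description of one local preview run (illustration only)."""
    inputs: list[str]
    outputs: list[str]
    incremental: bool = False


def can_run_local_preview(
    request: PreviewRequest,
    download_grants: set[str],
    local_preview_enabled: bool,
) -> bool:
    """Model of the two documented conditions for local preview.

    download_grants: datasets on which the user already holds the Download operation.
    local_preview_enabled: whether an administrator enabled local preview in
    Control Panel (Code Repositories settings).
    """
    if not local_preview_enabled:
        return False
    # Download is required on every input dataset...
    required = set(request.inputs)
    # ...and, for incremental transforms, on every output dataset too.
    if request.incremental:
        required |= set(request.outputs)
    # Subset check: nothing beyond existing grants is ever needed or granted.
    return required <= download_grants


# Usage: an incremental transform without Download on its output is rejected.
req = PreviewRequest(inputs=["/raw/events"], outputs=["/clean/events"], incremental=True)
allowed = can_run_local_preview(req, {"/raw/events"}, local_preview_enabled=True)
```

Modeling the requirement as a subset test makes the "no elevation" property easy to see: the result can only be `True` when every required dataset is already in the user's grants.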
Data lifecycle during local preview
During preview, no input datasets are stored on disk. Preview Engine connects to Foundry through Foundry's S3-compatible API. As a result, every local preview produces events that are tracked in audit logs, providing full visibility into data access.
PySpark (for legacy transforms) or Polars (for lightweight transforms) then streams the required parts of the datasets into memory. Which subset of the data is streamed depends on the code-defined input filters you choose and on the previewed pipeline's logic. All caching during preview execution resides in system memory.
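Conceptually, streaming means rows are pulled on demand and held only in memory, instead of a full copy being written to disk first. The stdlib-only Python sketch below models that idea with generators. `CountingSource` is a hypothetical stand-in for a remote dataset and the lambda stands in for a code-defined input filter; this is not how PySpark, Polars, or Preview Engine are actually implemented.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Row = dict


def stream_rows(source: Iterable[Row], input_filter: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield rows lazily, dropping those rejected by the input filter."""
    for row in source:
        if input_filter(row):
            yield row


class CountingSource:
    """Hypothetical stand-in for a remote dataset; counts rows actually pulled."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.pulled = 0

    def __iter__(self) -> Iterator[Row]:
        for i in range(self.n):
            self.pulled += 1
            yield {"id": i, "region": "EU" if i % 2 == 0 else "US"}


source = CountingSource(1_000_000)
eu_rows = stream_rows(source, lambda r: r["region"] == "EU")
# The "pipeline logic" here needs only five matching rows, so only the rows
# required to find them are pulled, and the result lives in memory.
preview = list(islice(eu_rows, 5))
```

Although the source describes a million rows, only nine are pulled (five matches plus the four rejected rows between them), which mirrors how the filters and the pipeline's logic bound what is streamed.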
At the end of a preview invocation, at most 10,000 rows of the resulting output datasets written by the transform are stored in a cache folder so that they can be displayed in VS Code's Preview table. You can set the location of this cache folder in the Palantir extension's settings under the Palantir > Cache: Path key.
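A minimal sketch of the row cap, assuming the limit applies to each output dataset and that results are written as CSV. Both the file format and the `cache_preview_output` helper are hypothetical, and `cache_dir` stands in for the folder configured under Palantir > Cache: Path.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable

# Documented upper bound on cached preview rows (assumed here to apply per output).
MAX_CACHED_ROWS = 10_000


def cache_preview_output(rows: Iterable[dict], cache_dir: Path, name: str) -> Path:
    """Hypothetical helper: persist at most MAX_CACHED_ROWS rows of one output."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{name}.csv"
    # islice stops pulling from `rows` once the cap is reached.
    limited = islice(rows, MAX_CACHED_ROWS)
    first = next(limited, None)
    with target.open("w", newline="") as fh:
        if first is not None:
            writer = csv.DictWriter(fh, fieldnames=list(first))
            writer.writeheader()
            writer.writerow(first)
            writer.writerows(limited)  # remaining rows, up to the cap
    return target


# Usage: an output of 25,000 rows is truncated to 10,000 cached rows.
with tempfile.TemporaryDirectory() as tmp:
    path = cache_preview_output(({"id": i} for i in range(25_000)), Path(tmp), "output")
    with path.open(newline="") as fh:
        cached_count = sum(1 for _ in csv.DictReader(fh))
```

Only the capped rows ever touch the disk; everything else produced by the preview stays in memory and is discarded.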

Result caches are kept only as long as the Palantir extension's features need them. They are deleted:
- At the beginning of each new preview run.
- When VS Code is exited gracefully, such as when the window's Close button is selected.
- On each startup of VS Code.
You can also delete the caches manually by running the Delete all locally stored Python transforms preview results command from the command palette. To assign the command to a keybinding, use its identifier, palantir.transforms.datasets.cleanUp.
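The deletion rules above can be summarized as an event handler. In this hypothetical Python sketch, the `Event` names and the `handle` function are illustrative assumptions, not the extension's internals. Every documented trigger wipes the cache folder, while finishing a preview only leaves results in it. Because deletion also runs on each startup, a cache left behind by a non-graceful exit would be removed the next time VS Code starts.

```python
import shutil
import tempfile
from enum import Enum, auto
from pathlib import Path


class Event(Enum):
    PREVIEW_STARTED = auto()   # beginning of each new preview run
    PREVIEW_FINISHED = auto()  # results are written to the cache, not deleted
    GRACEFUL_EXIT = auto()     # e.g. the window's Close button is selected
    STARTUP = auto()           # every VS Code startup
    MANUAL_CLEANUP = auto()    # palantir.transforms.datasets.cleanUp command


# Events after which no preview results should remain on disk.
DELETE_TRIGGERS = {
    Event.PREVIEW_STARTED,
    Event.GRACEFUL_EXIT,
    Event.STARTUP,
    Event.MANUAL_CLEANUP,
}


def handle(event: Event, cache_dir: Path) -> None:
    """Hypothetical handler: wipe the cache folder on every documented trigger."""
    if event in DELETE_TRIGGERS:
        shutil.rmtree(cache_dir, ignore_errors=True)
        cache_dir.mkdir(parents=True, exist_ok=True)  # empty folder for the next run


# Usage: results survive the end of a preview, then vanish on the next startup.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "preview-cache"
    cache.mkdir()
    (cache / "output.csv").write_text("id\n1\n")
    handle(Event.PREVIEW_FINISHED, cache)
    kept = len(list(cache.iterdir()))
    handle(Event.STARTUP, cache)
    remaining = len(list(cache.iterdir()))
```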

Risk assessment
The data residency features of secure streaming local preview described above aim to prevent accidental and unintentional data exfiltration. These features are not meant to, and cannot, prevent intentional abuse.
For example, no mechanism stops users from deliberately writing data to any folder on their machines during preview execution. Users who have the Download operation on a dataset are expected to exercise reasonable judgment when handling that dataset, both when using Foundry's public data access endpoints and when executing local preview.
For technical setup instructions for local preview, review our documentation on previewing transforms in local development.