
Comparison: Code Repositories vs. Code Workspaces vs. Code Workbook

Foundry has three products available for writing code-based data transformations: Code Workbook, Code Workspaces, and Code Repositories. While there is some feature overlap between these products, each is geared toward distinct workflows and user types. The guide below is intended to help you determine which tool is best suited to your needs.

Code Repositories is recommended for creating robust production pipelines and supporting workflows that require an additional layer of governance and scrutiny. With Code Repositories, data engineers can create efficient pipelines in bulk. Example workflows that are a good fit for Code Repositories include:

  • A daily pipeline at high data scale which requires incremental compute.
  • A high-visibility pipeline with strict governance requirements, such as the ability to revert to previous versions of the code or to gate code changes on passing unit tests.
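To make "incremental compute" concrete, the sketch below shows the general idea in plain Python: each run processes only the rows added since the previous run and carries forward a small piece of state. This is an illustration of the concept only, not the Foundry transforms API; `IncrementalState`, `run_incremental`, and the `id`/`amount` fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class IncrementalState:
    """State carried between runs (hypothetical helper, not a Foundry API)."""
    last_seen_id: int = 0
    running_total: float = 0.0


def run_incremental(rows, state):
    """Process only rows newer than the last run; return how many were processed."""
    new_rows = [row for row in rows if row["id"] > state.last_seen_id]
    for row in new_rows:
        state.running_total += row["amount"]
    if new_rows:
        state.last_seen_id = max(row["id"] for row in new_rows)
    return len(new_rows)


# Day 1: the input holds two rows, so both are processed.
state = IncrementalState()
day1 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]
processed_day1 = run_incremental(day1, state)

# Day 2: one row was appended, so only that row is processed.
day2 = day1 + [{"id": 3, "amount": 2.5}]
processed_day2 = run_incremental(day2, state)
```

At high data scale, the benefit is that the cost of each daily run tracks the size of the new data rather than the size of the full history.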

Code Workspaces is recommended for quick and efficient exploratory analyses using JupyterLab® and RStudio® Workbench to combine familiar IDEs with the benefits of the Foundry platform, such as data security, branching, build scheduling, and resource management. Example workflows that are a good fit for Code Workspaces include:

  • Running a cell-by-cell data analysis and exporting its contents to a shareable report.
  • Prototyping a data transformation pipeline or a machine learning model.

Code Workbook is recommended for performing code-based analyses on high-scale data that would not otherwise be suitable for Code Workspaces. These analyses can be for one-time use or could produce an artifact that is updated on a recurring basis. Code Workbook can also be used to prototype pipelines, which can then be promoted to repositories. Example workflows that are a good fit for Code Workbook include:

  • Investigating the results of a clinical trial by testing out different p-values.
  • Creating interactive visualizations to share with others.

In addition to the code-based products above, Pipeline Builder is the Palantir platform's primary no-code application for building and maintaining production data pipelines. Pipeline Builder uses a graph and form-based interface, enabling users to integrate data and create business logic transformations without writing code. If you are evaluating whether to use Pipeline Builder or Code Repositories for your pipeline, see Considerations: Pipeline Builder and Code Repositories.

Comparison summary

| | Code Repositories | Code Workspaces | Code Workbook |
| --- | --- | --- | --- |
| **Features** | **Advanced pipelines** | **Exploratory analysis** | **Advanced analysis** |
| | Enables complex workflows in long-lasting data pipelines with flexibility in performance optimization and code generation. | Enables interactive exploratory workflows using familiar IDEs tied with Foundry primitives. | Enables data analysis workflows with support for common analytical languages and visualization libraries. |
| Languages supported | Python, SQL, Java, Mesa | Python, R | Python, R, SQL |
| Environments supported | All environments | Kubernetes environments only | All environments |
| Batch pipeline support | Yes | Yes | Yes |
| Incremental computation | Yes | No | No |
| Transform generation | Yes | No | No |
| Multi-output transforms | Yes | Yes | No |
| Filesystem access | Yes | Yes | Yes |
| Visualization support | No | Yes | Yes |
| **Iteration cycle** | **Iterate on code logic** | **Iterate on data discovery and analysis** | **Iterate on insight generation** |
| | Designed to help iterate on code logic. Runtime debugger and previews can assist in validating transform logic. Data can be analyzed in Foundry after building. | Designed to help rapidly iterate on data discovery and analysis using widely known tools that seamlessly integrate with the rest of Foundry. | Designed to help generate insights from data; all transforms run on the full input data, interactive console enables ad-hoc queries, and Spark execution model is optimized for quick iteration. |
| Full data preview | Preview data sample, with the ability to pre-filter the input sample | Full data preview | Full data preview |
| Debugger | Yes | No | No |
| Console support | In debug mode | Yes | Yes |
| Spark module management | Spark modules initiated at the job level | Spark-less environment for fast feedback loop | Spark modules kept warm for immediate interactivity, and initiated at the workbook level |
| **Operations** | **Data pipeline management** | **Data exploration management** | **Data analysis management** |
| | Supports Foundry data management libraries and publishing custom Python libraries. | Fully adjustable environment that can consume pip, CRAN, and conda libraries, including those published from Code Repositories. | Can consume custom libraries published from Code Repositories; users can save pieces of logic as code templates, enabling point-and-click analysis by other users. |
| Data Expectations | Yes | No | No |
| Publish custom libraries | Yes | No | No |
| Consume custom libraries | Yes | Yes | Yes, for some environments |
| Point-and-click code templates | No | No | Yes |
| **Change management** | **Governance** | **Flexibility** | **Rapid changes** |
| | Prioritizes change traceability and governance to ensure that critical pipelines remain secure and robust; advanced review and approval workflows and complete changelogs. | Prioritizes rapid and flexible iteration with full branching support and automatic Git versioning. | Prioritizes rapid iteration and collaboration with a lightweight branching workflow; does not require CI checks or unit testing. |
| Full Git workflow | Yes | Yes | No |
| Copy data after merge | No | No | Yes |
| Administer and remove security markings | Yes | No | No |
| Impact analysis views | Yes | No | No |
| Advanced code review workflows | Yes | No | No |
| Unit testing | Yes | No | No |
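One way to use the comparison above is to treat it as a lookup: list the capabilities a workflow requires and see which products support all of them. The sketch below transcribes a subset of the Yes/No rows into a dictionary; the `FEATURES` keys and the `products_supporting` helper are hypothetical names for this illustration and are not part of any Foundry API.

```python
PRODUCTS = ["Code Repositories", "Code Workspaces", "Code Workbook"]

# Subset of the Yes/No rows from the comparison, in PRODUCTS order.
FEATURES = {
    "incremental_computation": (True, False, False),
    "transform_generation": (True, False, False),
    "multi_output_transforms": (True, True, False),
    "filesystem_access": (True, True, True),
    "visualization_support": (False, True, True),
    "debugger": (True, False, False),
    "data_expectations": (True, False, False),
    "publish_custom_libraries": (True, False, False),
    "point_and_click_templates": (False, False, True),
    "full_git_workflow": (True, True, False),
    "copy_data_after_merge": (False, False, True),
    "unit_testing": (True, False, False),
}


def products_supporting(required):
    """Return the products that support every feature named in `required`."""
    return [
        product
        for index, product in enumerate(PRODUCTS)
        if all(FEATURES[feature][index] for feature in required)
    ]


# Example: visualizations plus a full Git workflow.
viz_and_git = products_supporting(["visualization_support", "full_git_workflow"])

# Example: a combination that no single product covers.
viz_and_tests = products_supporting(["visualization_support", "unit_testing"])
```

An empty result, as in the second example, suggests splitting the workflow across products, for instance prototyping in one tool and moving the production pipeline to Code Repositories.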
Table summary

##### Code Repositories features

* Code Repositories features advanced pipelines and enables complex workflows in long-lasting data pipelines with flexibility in performance optimization and code generation.
* Languages supported in Code Repositories include Python, SQL, Java, and Mesa.
* Code Repositories supports [incremental computation](https://palantir.com/docs/foundry/transforms-python-spark/incremental-examples/), [transform generation](https://palantir.com/docs/foundry/transforms-python/pipelines/#automatic-registration), [multi-output transforms](https://palantir.com/docs/foundry/transforms-python/transforms/#define-transforms), and [filesystem access](https://palantir.com/docs/foundry/transforms-python/unstructured-files/).
* Code Repositories does not support visualizations.

##### Code Workspaces features

* Code Workspaces features quick and efficient exploratory workflows with embedded support for JupyterLab® and RStudio® Workbench in Foundry.
* Languages supported in Code Workspaces include Python and R.
* Code Workspaces supports [filesystem access](https://palantir.com/docs/foundry/code-workspaces/data/#non-tabular-datasets) and provides full flexibility on notebook-based analyses.
* Code Workspaces does not support distributed Spark, and is therefore better suited for data that can fit within the workspace's [compute limits](https://palantir.com/docs/foundry/code-workspaces/compute-usage/#understanding-drivers-of-foundry-compute-usage-in-code-workspaces).

##### Code Workbook features

* Code Workbook features advanced analysis workflows with support for common analytical languages and visualization libraries.
* Languages supported in Code Workbook include Python, R, and SQL.
* Code Workbook supports [filesystem access](https://palantir.com/docs/foundry/code-workbook/transforms-unstructured/) and [visualization](https://palantir.com/docs/foundry/code-workbook/transforms-visualize/).
* Code Workbook does not support incremental computation, transform generation, or multi-output transforms.

##### Code Repositories iteration cycle

* Code Repositories is designed to help iterate on code logic. Data can be analyzed in Foundry after building.
* Code Repositories supports data sample previews to validate transform logic, with the ability to pre-filter the input sample.
* Code Repositories supports [debugging at runtime](https://palantir.com/docs/foundry/code-repositories/debug-transforms/).
* In Code Repositories, Spark modules are initiated at the job level.

##### Code Workspaces iteration cycle

* Code Workspaces is designed to help explore and analyze data. Results can then be shared, published to dashboards, turned into reusable transforms, or exported to production-ready pipeline tools such as Code Repositories or Pipeline Builder.
* Code Workspaces offers the full flexibility of the JupyterLab® and RStudio® Workbench IDEs, including full code and data previews.
* Code Workspaces provides cell-by-cell iteration for instant feedback on code execution.
* In Code Workspaces, no Spark modules are required and a fully customizable kernel is available for ad-hoc adjustments of the environment.

##### Code Workbook iteration cycle

* Code Workbook is designed to help generate insights from data. All transforms run on the full input data, and Spark execution models are optimized for quick iteration.
* Code Workbook supports full data previews.
* Code Workbook provides [console support](https://palantir.com/docs/foundry/code-workbook/workbooks-console/) for ad-hoc analysis of transforms.
* In Code Workbook, Spark modules are kept warm for immediate interactivity and initiated at the workbook level.

##### Code Repositories operations

* Code Repositories supports Foundry data management libraries and custom Python libraries.
* Code Repositories supports [data expectations](https://palantir.com/docs/foundry/transforms-python/data-expectations-getting-started/), [publishing](https://palantir.com/docs/foundry/transforms-python/share-python-libraries/) custom libraries, and consuming custom libraries.
* Code Repositories does not support point-and-click code templates.

##### Code Workspaces operations

* Code Workspaces can consume pip, CRAN, and conda libraries, including those published from Code Repositories, and environments can be modified quickly.
* Code Workspaces does not support data expectations or publishing custom libraries.
* Code Workspaces does not support point-and-click code templates.

##### Code Workbook operations

* Code Workbook can consume custom libraries published from Code Repositories, and users can save pieces of logic as code templates, enabling point-and-click analysis by other users.
* Code Workbook does not support data expectations or publishing custom libraries.
* Code Workbook can [consume custom libraries](https://palantir.com/docs/foundry/code-workbook/environment-overview/) in some Spark environments.
* Code Workbook supports [point-and-click templates](https://palantir.com/docs/foundry/code-workbook/templates-overview/).

##### Code Repositories change management

* Code Repositories prioritizes change traceability and governance to ensure that critical pipelines remain secure and robust.
* Code Repositories provides complete changelogs.
* Code Repositories provides a [full Git workflow](https://palantir.com/docs/foundry/building-pipelines/branching-release-process/), security marking administration and [removal](https://palantir.com/docs/foundry/building-pipelines/remove-inherited-markings/), [impact analysis](https://palantir.com/docs/foundry/code-repositories/analyze-impact/) views, [advanced code review](https://palantir.com/docs/foundry/code-repositories/branch-settings/#protected-branches) workflows, and [unit testing](https://palantir.com/docs/foundry/code-repositories/unit-tests/).
* Code Repositories does not support copying data after merging.

##### Code Workspaces change management

* Code Workspaces prioritizes rapid and flexible iteration with full branching support and automatic Git versioning.
* Code Workspaces is fully backed by Code Repositories and benefits from its [full Git workflow](https://palantir.com/docs/foundry/building-pipelines/branching-release-process/).
* Code Workspaces does not support copying data after merging.
* Code Workspaces stores safe checkpoints of its notebook's contents for 30 days, allowing users to safely retain and retrieve any given state, while also providing the opportunity to permanently store backups of the code in the Git repository.

##### Code Workbook change management

* Code Workbook prioritizes rapid iteration and collaboration with a lightweight branching workflow. Code Workbook does not require CI checks or unit testing.
* Code Workbook supports copying data after merging.
* Code Workbook does not provide a full Git workflow, security marking administration or removal, impact analysis views, advanced code review workflows, or unit testing.

***

JupyterLab® is a registered trademark of NumFOCUS. RStudio® is a trademark of Posit™.
