
Hidden code repositories

Code Workbook offers lightweight version control through the use of branches. Additionally, every workbook is backed by a special hidden code repository. This repository serves as a secure backup of the code written in a code workbook while also exposing the history of all code changes made on the workbook.

You can access the hidden code repository of a workbook by opening the gear icon menu at the top right of the Code Workbook interface and selecting Open hidden Code Repository.

Hidden repository button

Special properties

A workbook's hidden repository will always have the following properties:

  • The repository is read-only: Repository contents can only be viewed and never updated directly in the repository. The only way to update the repository is to make code changes in the workbook.
  • The repository is hidden by default: It will only be discoverable through the Code Workbook interface.
  • The repository stores three separate files: The pipeline.R, pipeline.py, and pipeline.sql files each contain all of the code of the converted workbook for their respective language.
  • The repository saves a full change history: A full history of the workbook's code changes for each branch is available under the Branches tab of the hidden repository.
  • The repository contains a hidden workbook.yml file: The file stores basic metadata about the workbook.

Every code change made on a workbook branch automatically creates a new commit on the corresponding branch of the hidden code repository.

Code conversion

Code Workbook code, when committed to the hidden repository, is automatically converted to Code Repository syntax. For example, consider the following code cell in Code Workbook:

def rename_column(dataset):
    return dataset.withColumnRenamed("old_name", "new_name")

The code will be converted to the following in the pipeline.py file of the hidden repository:

@transform_pandas(
    Output(rid="ri.foundry.main.dataset.id-1"),
    dataset=Input(rid="ri.vector.main.dataset.id-2")
)
def rename_column(dataset):
    return dataset.withColumnRenamed("old_name", "new_name")

If a code workbook contains more than one code cell of a given language, all of those cells are appended to a single file for that language: pipeline.R for R, pipeline.py for Python, and pipeline.sql for SQL. This allows you to view all of the code of a given language in one place. Code written in the Global code section of the workbook is also stored in the appropriate file.
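To make the shape of the converted file concrete, here is a minimal sketch that mimics the wrapping shown above for two cells. It is an illustration only, not the conversion Code Workbook actually performs: the helpers `to_repository_syntax` and `build_pipeline_file`, the second cell `drop_column`, and the resource identifier ending in `id-3` are all hypothetical.

```python
from typing import List, Mapping, Tuple


def to_repository_syntax(cell_source: str, output_rid: str,
                         input_rids: Mapping[str, str]) -> str:
    """Wrap one cell's source in a decorator shaped like the example above.

    Hypothetical helper: it only builds text and does not call any Foundry API.
    """
    args = [f'Output(rid="{output_rid}")']
    args += [f'{name}=Input(rid="{rid}")' for name, rid in input_rids.items()]
    decorator = "@transform_pandas(\n" + ",\n".join("    " + arg for arg in args) + "\n)"
    return decorator + "\n" + cell_source.strip("\n") + "\n"


def build_pipeline_file(cells: List[Tuple[str, str, Mapping[str, str]]]) -> str:
    """Append every converted cell, in order, into one file body."""
    return "\n\n".join(to_repository_syntax(*cell) for cell in cells)


# The first cell comes from the example above; the second is made up.
RENAME_CELL = (
    "def rename_column(dataset):\n"
    '    return dataset.withColumnRenamed("old_name", "new_name")'
)
DROP_CELL = (
    "def drop_column(rename_column):\n"
    '    return rename_column.drop("unused")'
)

pipeline_py = build_pipeline_file([
    (RENAME_CELL, "ri.foundry.main.dataset.id-1",
     {"dataset": "ri.vector.main.dataset.id-2"}),
    (DROP_CELL, "ri.foundry.main.dataset.id-3",  # placeholder identifier
     {"rename_column": "ri.foundry.main.dataset.id-1"}),
])
print(pipeline_py)
```

Printing `pipeline_py` shows both cells in a single body, which is the layout you can expect to see when browsing pipeline.py in the hidden repository.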

Recover lost code

Because hidden code repositories regularly store backups of a workbook, they are the recommended way to restore code that was lost or accidentally removed. To review the history of a workbook branch, open the Branches tab at the top of the hidden repository, select the desired branch (such as master), and select the commit whose code changes you want to inspect. You can then copy the code and paste it back into the workbook.
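When pipeline.py holds many cells, it can help to pull out a single function before pasting it back. The following is a minimal sketch that assumes you have copied the contents of pipeline.py from a commit into a string. The helper `extract_cell_source` is hypothetical and relies only on Python's standard `ast` module (Python 3.8 or later). It returns the function without its decorator, which matches the form a workbook cell takes in the example above.

```python
import ast


def extract_cell_source(pipeline_source: str, function_name: str) -> str:
    """Return one top-level function from pipeline.py text, without decorators.

    Hypothetical helper: the text is only parsed, never executed, so names such
    as transform_pandas do not need to be importable.
    """
    tree = ast.parse(pipeline_source)
    lines = pipeline_source.splitlines()
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            # Since Python 3.8, lineno points at the "def" line, so the
            # decorator lines above it are skipped.
            return "\n".join(lines[node.lineno - 1:node.end_lineno])
    raise KeyError(f"no function named {function_name!r} in the given source")


# Text copied from a commit in the hidden repository (taken from the example above).
PIPELINE_PY = '''@transform_pandas(
    Output(rid="ri.foundry.main.dataset.id-1"),
    dataset=Input(rid="ri.vector.main.dataset.id-2")
)
def rename_column(dataset):
    return dataset.withColumnRenamed("old_name", "new_name")
'''

recovered = extract_cell_source(PIPELINE_PY, "rename_column")
print(recovered)
```

The recovered text can then be pasted into a new Python cell in the workbook.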

