Bring your own container workflows
Most computations can be easily expressed with just Python. However, some use cases call for a non-Python execution engine or script: for instance, an F# application, an ancient VBA script, or even an end-of-life (EOL) version of Python. Bring-your-own-container (BYOC) workflows cover these cases. In short, BYOC lets you build a Docker image locally with all the specific dependencies and/or binaries an application requires, and then run a lightweight transform on top of this image.
The following is an example of how to run a COBOL transform in Foundry. For simplicity, this example compiles the COBOL program inside the transform. Alternatively, you may pre-compile the program and copy the binary executable into the Docker image. First, enable containerized workflows, then build the Docker image locally following these image requirements.
Consider the following minimal Dockerfile as an example:
FROM ubuntu:latest
RUN apt update && apt install -y coreutils curl sed build-essential gnucobol
RUN useradd --uid 5001 user
USER 5001
Note: The Docker image does not need to have Python installed for lightweight transforms to work.
:::callout{theme="warning"}
Using ubuntu:latest as the base image might result in cURL command failures in IL5 or FedRAMP environments. If that occurs, use ubuntu:jammy instead.
:::
Then, build the image and upload it to Foundry. To do so, create an Artifacts repository and follow these instructions to tag and push the image. In this example, the image is tagged with the name my-image and version 0.0.1. The final step before authoring the BYOC lightweight transform is to add this Artifacts repository as a local backing repository to the Code Repository.
The following example uses a COBOL program that generates a CSV file. Its source code is located at resources/data_generator.cbl inside the Code Repository.
The final step is to write a lightweight transform that connects the data processing program to Foundry. The following snippet demonstrates how to access the dataset through the Python API while also calling arbitrary executables shipped inside the image. To invoke the COBOL executable, use functions from the Python standard library (in this case, os.system(...)).
import os

import pandas as pd
from transforms.api import Output, transform


@transform.using(
    my_output=Output('my-output')
).with_container(
    container_image="my-image",
    container_tag="0.0.1"
)
def compile_cobol_data_generator(my_output):
    """Demonstrate how we can bring dependencies that would be difficult to get through Conda."""
    # Compile the COBOL program
    # (Everything from the src folder is available in $USER_WORKING_DIR/user_code)
    os.system("cobc -x -free -o data_generator $USER_WORKING_DIR/user_code/resources/data_generator.cbl")
    # Run the program to create and populate data.csv
    os.system('$USER_WORKING_DIR/data_generator')
    # Store the results into Foundry
    my_output.write_table(pd.read_csv('data.csv'))
:::callout{theme="warning"}
Preview is not yet supported for BYOC workflows.
:::
Selecting the Build button instantiates a container from the Docker image and invokes the commands specified. Resource allocation, logging, communication with Foundry, permission enforcement, and auditability are all handled automatically.
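The transform above calls os.system(...), which returns the exit status but does not raise when a command fails, so a failed compile could go unnoticed until data.csv is read. As a minimal sketch of one alternative, the helper below uses the standard library's subprocess.run and raises on a non-zero exit code. The names run_checked and working_dir are hypothetical helpers introduced for illustration, not part of the Foundry API, and the fallback to the current directory when USER_WORKING_DIR is unset is an assumption made so the helper can be tried outside a container.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_checked(args, cwd=None):
    """Run a command without a shell and raise if it exits with a non-zero code.

    Hypothetical helper for illustration; not part of the Foundry API.
    """
    cmd = [str(a) for a in args]
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


def working_dir():
    """Resolve USER_WORKING_DIR in Python, since no shell expands it here.

    Assumption: fall back to the current directory when the variable is unset.
    """
    return Path(os.environ.get("USER_WORKING_DIR", os.getcwd()))


def compile_and_run_generator():
    """Mirror the two os.system calls from the transform, with error checking."""
    source = working_dir() / "user_code" / "resources" / "data_generator.cbl"
    run_checked(["cobc", "-x", "-free", "-o", "data_generator", source])
    run_checked([working_dir() / "data_generator"])
```

Inside the transform, compile_and_run_generator() could stand in for the two os.system(...) lines. A compiler error would then fail the build with the compiler's stderr in the exception message rather than surfacing later as a missing data.csv.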