Compute usage with Code Repositories

Running builds in Code Repositories requires the use of Foundry compute, a resource measured in compute-seconds. This documentation details how builds use compute and provides information about investigating and managing compute usage in the product.

When running a transforms build of one or more datasets, Foundry pulls the transform logic into its serverless compute cluster and executes the code. The length and size of the build depend on the complexity of the code, the size of the input and output datasets, and the Spark computation profile set on the code.

Executing code on input datasets consumes Foundry compute (measured in Foundry compute-seconds) while the parallelized computation runs, and Foundry storage when the outputs of the transformation are written to Foundry Storage. Writing code does not incur compute usage; only building datasets does.

Measuring Foundry Compute

The transforms engine powering Code Repositories uses parallel compute on the backend, most commonly in the Spark scalable computing framework. Transformations in Code Repositories are measured in the total number of Foundry compute-seconds that are used by the job during its runtime. These compute-seconds are measured during the entire duration of the job, which includes the time taken to read from the input datasets, execute the code (including operations such as I/O waits), and write the output datasets back to Foundry.

You can configure transformations to make use of parallel computation. Compute-seconds are a measure of compute runtime, not wall clock time, and therefore parallel transforms will incur multiple compute-seconds per wall clock second. For a detailed breakdown on how parallelized compute is measured for Foundry compute-seconds for jobs in Code Repositories, review the examples below.

When paying for Foundry usage, the default usage rates are the following:

vCPU / GPU	Usage Rate
vCPU	1
T4 GPU	1.2
V100 GPU	3
A100 GPU	1.3
A10G GPU	1.5
L4 GPU	2.1
H100 GPU	4.7

If you have an enterprise contract with Palantir, contact your Palantir representative before proceeding with compute usage calculations.

Investigating Foundry Compute usage from Code Repositories

Usage information can be found in the Resource Management Application, which allows drill-downs on usage metrics.

While builds are the drivers of Foundry Compute usage, that usage is recorded against the long-lived resource to which it is associated. In the case of dataset transformations, the resource is the dataset (or set of datasets) materialized by the job. You can view the timeline of usage on a dataset in the dataset details tab under Resource usage metrics.

Note that for transformations that produce multiple output datasets, the compute usage is equally distributed across all the datasets. For example, if a transformation job creates two datasets, one of which has five rows and the other one five million rows, the number of Foundry compute-seconds will be equally distributed between the two.
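The even split described above can be sketched in a few lines of Python. The function name `attribute_usage` and the dataset names are hypothetical and only illustrate the arithmetic; they are not part of any Foundry API.

```python
def attribute_usage(job_compute_seconds: float, output_datasets: list[str]) -> dict[str, float]:
    """Split a job's compute-seconds evenly across its output datasets.

    Dataset size is deliberately ignored, mirroring the equal distribution
    described in the text.
    """
    if not output_datasets:
        raise ValueError("a job must have at least one output dataset")
    share = job_compute_seconds / len(output_datasets)
    return {name: share for name in output_datasets}


# A 600 compute-second job that writes a five-row and a five-million-row dataset.
usage = attribute_usage(600, ["five_row_output", "five_million_row_output"])
print(usage)  # {'five_row_output': 300.0, 'five_million_row_output': 300.0}
```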

Understanding drivers of Foundry Compute usage

Unless canceled early, transformations in Code Repositories will run until all logic is run on all data and the outputs are written back to Foundry. The two main factors that affect this runtime are (1) the size of the input data and (2) the complexity of computational operations performed by the transformation logic.

Jobs with larger input data sizes will require more compute than jobs with smaller input data sizes if they have the same logic. For example, a job that performs column processing on 100GB of data will use more Foundry compute-seconds than a job performing the same processing on 10GB of data.
Jobs that perform more complex operations on data will require more compute than jobs that perform comparatively fewer operations. This is sometimes known as "job complexity."
As a basic example, compare the number of operations in two mathematical expressions, 5 * 5 and 5!. 5 * 5 is a single multiplication operation. 5! is equivalent to 5 * 4 * 3 * 2 * 1 (four multiplication operations), which is four times the number of operations in the 5 * 5 example. As jobs get more complex with tasks such as aggregations, joins, or machine learning algorithms, the number of operations a job must complete on the data can grow.
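The operation counts in this example can be checked with a small Python sketch. The helper `multiplications_in_product` is a hypothetical name introduced only for illustration; it counts the multiplications needed to multiply a given number of factors together.

```python
def multiplications_in_product(factor_count: int) -> int:
    """Multiplying n factors together takes n - 1 multiplication operations."""
    return max(factor_count - 1, 0)


square_ops = multiplications_in_product(2)     # 5 * 5 has two factors
factorial_ops = multiplications_in_product(5)  # 5 * 4 * 3 * 2 * 1 has five factors
print(square_ops, factorial_ops)  # 1 4
```

Real transformation logic is far less uniform than this, so the count is only a way to reason about relative complexity, not a usage estimate.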

Managing Foundry Compute usage with Code Repositories

For each job, you can review the underlying compute metrics that drove the performance and compute usage of the job. For more details, review Understanding Spark Details.

In a job, Foundry compute-seconds are driven by the size and number of parallelized executors. Both of these settings are fully configurable per job. Review the Spark computation profile documentation for details on how this is set per job. The size of executors is governed by their memory and vCPU counts. Increased vCPU and increased memory per executor will increase the compute-seconds incurred by that executor.

The number of simultaneous tasks is driven by the configured executor counts and their corresponding number of vCPUs. If no configuration overrides are specified, the transformation will use a default Spark profile. Foundry storage for resulting datasets is driven by the size of the dataset being created.
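As a rough sketch of how executor counts and vCPUs bound parallelism, the following Python function multiplies the two. It assumes each task occupies one vCPU, which matches Spark's default `spark.task.cpus` value of 1; the function name and parameters are hypothetical.

```python
def max_simultaneous_tasks(executor_count: int, vcpus_per_executor: int,
                           vcpus_per_task: int = 1) -> int:
    """Upper bound on tasks that can run at the same time.

    Assumption: every task reserves `vcpus_per_task` vCPUs on an executor.
    """
    if vcpus_per_task < 1:
        raise ValueError("vcpus_per_task must be at least 1")
    return executor_count * (vcpus_per_executor // vcpus_per_task)


print(max_simultaneous_tasks(executor_count=4, vcpus_per_executor=2))  # 8
```

Raising either number increases the potential parallelism, and with it the compute-seconds accrued per wall-clock second.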

In the end, jobs with different logic can accomplish the same outcome with very different numbers of operations.

Optimizing code to manage usage

There are a number of ways to optimize your code and manage the compute-seconds your jobs use. This section provides links to more information about commonly used optimization techniques.

Spark, the distributed cluster-computing framework adopted by Foundry for most types of code repository batch compute, allows for a variety of optimization techniques. Learn more about optimizing Spark.
In Spark, you can optimize partitioning to expedite builds. The optimal number of partitions depends on the number of rows, the number of columns, the column types, and the content. We recommend an approximate ratio of one partition for every 128 MB of dataset size.
Incremental computation is an efficient method of performing transforms to generate an output dataset. By leveraging the build history of a transform, incremental computation avoids the need to recompute the entire output dataset every time a transform is run.
Learn more about incremental transforms in Python.
Learn more about incremental transforms in Java.
For small-to-medium-sized datasets in particular, several compute engines other than Spark consistently outperform Spark in benchmarks for single-node applications. Running your pipelines on these alternatives can increase processing speed and reduce compute consumption. To take full advantage of these options, we advise becoming acquainted with Lightweight transforms.
If your builds are orchestrated using Schedules, we recommend reading our scheduling best practices to optimize costs.
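The partitioning guidance in the list above (roughly one partition per 128 MB of dataset size) can be turned into a quick estimate. This is a minimal sketch: `suggested_partitions` is a hypothetical helper, and the result is only a starting point, since the best value also depends on row counts, column types, and content.

```python
import math

TARGET_PARTITION_MB = 128  # approximate ratio recommended in the text


def suggested_partitions(dataset_size_mb: float) -> int:
    """Estimate a partition count from dataset size, with a minimum of one."""
    return max(1, math.ceil(dataset_size_mb / TARGET_PARTITION_MB))


print(suggested_partitions(10_240))  # 80
print(suggested_partitions(50))      # 1
```

In a PySpark transform, a value like this could be passed to `DataFrame.repartition` before writing the output.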

Calculating Foundry Compute usage

Usage example 1: Standard memory

This example demonstrates how Foundry Compute is measured for a statically-allocated job with a standard memory request.

Driver profile:
    vCPUs: 1
    GiB_RAM: 6
Executor profile:
    vCPUs: 1
    GiB_RAM: 6
    Count: 4
Total Job Wall-Clock Runtime: 
    120 seconds



Calculation:
driver_compute_seconds = max(num_vcpu, GiB_RAM/7.5) * num_seconds
                       = max(1vcpu, 6gib/7.5gib) * 120sec
                       = 120 compute-seconds

executor_compute_seconds = num_executors * max(num_vcpu, GiB_RAM/7.5) * num_seconds 
                         = 4 * max(1, 6/7.5) * 120sec 
                         = 480 compute-seconds

total_compute_seconds = 120 + 480 = 600 compute-seconds
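The calculation above can be reproduced with a short Python sketch. The function and constant names are hypothetical; the formula itself, `max(num_vcpu, GiB_RAM/7.5) * num_seconds` per container, is taken directly from the example.

```python
GIB_RAM_PER_VCPU = 7.5  # memory divisor used in the formula above


def container_compute_seconds(vcpus: int, gib_ram: float, seconds: float,
                              count: int = 1) -> float:
    """Compute-seconds for `count` identical containers running for `seconds`."""
    return count * max(vcpus, gib_ram / GIB_RAM_PER_VCPU) * seconds


driver = container_compute_seconds(vcpus=1, gib_ram=6, seconds=120)
executors = container_compute_seconds(vcpus=1, gib_ram=6, seconds=120, count=4)
print(driver, executors, driver + executors)  # 120 480 600
```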

Usage example 2: Large memory

This example demonstrates how Foundry Compute is measured for a statically-allocated job with a larger memory request.

Driver profile:
    vCPUs: 2
    GiB_RAM: 6
Executor profile:
    vCPUs: 1
    GiB_RAM: 15
    Count: 4
Total Job Wall-Clock Runtime: 
    120 seconds



Calculation:
driver_compute_seconds = max(num_vcpu, GiB_RAM/7.5) * num_seconds
                       = max(2vcpu, 6gib/7.5gib) * 120sec
                       = 240 compute-seconds

executor_compute_seconds = num_executors * max(num_vcpu, GiB_RAM/7.5) * num_seconds 
                         = 4 * max(1, 15/7.5) * 120sec 
                         = 960 compute-seconds

total_compute_seconds = driver_compute_seconds + executor_compute_seconds
                      = 240 + 960 = 1200 compute-seconds
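In this example the driver is billed on its vCPU count while each executor is billed on its memory, because 15 GiB / 7.5 exceeds its single vCPU. The sketch below makes that distinction explicit; `billing_factor` is a hypothetical helper that returns the per-second multiplier and which resource determined it.

```python
def billing_factor(vcpus: int, gib_ram: float) -> tuple[float, str]:
    """Return the per-second multiplier and the resource that sets it."""
    memory_equivalent = gib_ram / 7.5  # memory expressed in vCPU equivalents
    if memory_equivalent > vcpus:
        return memory_equivalent, "memory"
    return float(vcpus), "vcpu"


driver_factor, driver_bound = billing_factor(vcpus=2, gib_ram=6)
executor_factor, executor_bound = billing_factor(vcpus=1, gib_ram=15)
print(driver_factor, driver_bound)      # 2.0 vcpu
print(executor_factor, executor_bound)  # 2.0 memory

total = driver_factor * 120 + 4 * executor_factor * 120
print(total)  # 1200.0
```

Under this formula, memory above 7.5 GiB per vCPU raises the multiplier, while memory at or below that ratio does not.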

Usage example 3: Dynamic executor counts

This example demonstrates how Foundry Compute is measured for a dynamically-allocated job, where some job execution time is performed with two executors, and the rest of the job time is performed with four executors.

Driver profile:
    vCPUs: 2
    GiB_RAM: 6
Executor profile:
    vCPUs: 1
    GiB_RAM: 6
    Count: 
        min: 2
        max: 4
Total Job Wall-Clock Runtime: 
    120 seconds:
        2 executors: 60 seconds
        4 executors: 60 seconds



Calculation:
driver_compute_seconds = max(num_vcpu, GiB_RAM/7.5) * num_seconds
                       = max(2vcpu, 6gib/7.5gib) * 120sec
                       = 240 compute-seconds

# Calculate compute seconds for job time with 2 executors
2_executor_compute_seconds = num_executors * max(num_vcpu, GiB_RAM/7.5) * num_seconds 
                           = 2 * max(1, 6/7.5) * 60sec 
                           = 120 compute-seconds 

# Calculate compute seconds for job time with 4 executors
4_executor_compute_seconds = num_executors * max(num_vcpu, GiB_RAM/7.5) * num_seconds 
                           = 4 * max(1, 6/7.5) * 60sec 
                           = 240 compute-seconds


total_compute_seconds = driver_compute_seconds + 2_executor_compute_seconds + 4_executor_compute_seconds
                      = 240 + 120 + 240 = 600 compute-seconds
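For dynamically allocated jobs, the same formula is applied to each stretch of time with a constant executor count, and the results are summed. The following Python sketch assumes the job can be described as a list of `(executor_count, seconds)` segments; that representation and the function name are hypothetical.

```python
def dynamic_job_compute_seconds(driver, executor, segments):
    """Total compute-seconds for a job whose executor count changes over time.

    driver, executor: (vcpus, gib_ram) tuples describing each profile.
    segments: list of (executor_count, seconds) periods, in any order.
    """
    def factor(profile):
        vcpus, gib_ram = profile
        return max(vcpus, gib_ram / 7.5)

    # The driver runs for the whole job, so it accrues over every segment.
    wall_clock_seconds = sum(seconds for _, seconds in segments)
    driver_total = factor(driver) * wall_clock_seconds

    # Executors accrue only while they are allocated.
    executor_total = sum(count * factor(executor) * seconds
                         for count, seconds in segments)
    return driver_total + executor_total


total = dynamic_job_compute_seconds(
    driver=(2, 6),
    executor=(1, 6),
    segments=[(2, 60), (4, 60)],
)
print(total)  # 600
```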

Usage example 4: GPU compute

This example demonstrates how Foundry GPU Compute is measured for a statically-allocated job.

Driver profile:
    T4 GPU: 1
Executor profile:
    T4 GPU: 1
    Count: 4
Total Job Wall-Clock Runtime: 
    120 seconds


Calculation:
driver_compute_seconds = num_gpu * gpu_usage_rate * num_seconds
                       = 1gpu * 1.2 * 120sec
                       = 144 compute-seconds

executor_compute_seconds = num_executors * num_gpu * gpu_usage_rate * num_seconds 
                         = 4 * 1 * 1.2 * 120sec 
                         = 576 compute-seconds

total_compute_seconds = 144 + 576 = 720 compute-seconds
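The GPU calculation can be sketched the same way, using the default usage rates from the table earlier in this document. The dictionary and function names are hypothetical, and the rates apply only when the default rates are in effect.

```python
# Default usage rates copied from the usage rate table in this document.
GPU_USAGE_RATES = {
    "T4": 1.2,
    "V100": 3,
    "A100": 1.3,
    "A10G": 1.5,
    "L4": 2.1,
    "H100": 4.7,
}


def gpu_compute_seconds(gpu_type: str, gpus_per_container: int, seconds: float,
                        count: int = 1) -> float:
    """Compute-seconds for `count` identical GPU containers running for `seconds`."""
    return count * gpus_per_container * GPU_USAGE_RATES[gpu_type] * seconds


driver = gpu_compute_seconds("T4", gpus_per_container=1, seconds=120)
executors = gpu_compute_seconds("T4", gpus_per_container=1, seconds=120, count=4)
# Rounded to avoid displaying floating-point noise.
print(round(driver + executors, 2))  # 720.0
```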
