Advanced configuration
This section describes how advanced configuration options can be used in Java transforms.
Maximum build duration
You may want to cap how long a job can run, either to keep data fresh or to limit costs. For example, a job that calls an external service can hang indefinitely if that service stops responding; a duration limit ensures such a job is cancelled instead of running without ever completing.
In Code Repositories, you can limit job duration by adding the @MaxAllowedDuration annotation alongside @Compute on the compute function, as shown below:
```java
package myproject.datasets;

import com.palantir.transforms.lang.java.api.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class FilterTransform {

    // Cancel the job if it runs longer than 2 hours (ISO 8601 duration string).
    @MaxAllowedDuration(value = "PT2H")
    @Compute
    public void myComputeFunction(
            @Input("/examples/students_hair_eye_color") FoundryInput myInput,
            @Output("/examples/students_hair_eye_color_filtered") FoundryOutput myOutput) {
        // Read the input dataset as a Spark DataFrame.
        Dataset<Row> inputDf = myInput.asDataFrame().read();
        // Keep only rows with brown eyes and write them to the output dataset.
        myOutput.getDataFrameWriter(inputDf.filter("eye = 'Brown'")).write();
    }
}
```
:::callout{theme="neutral"}
MaxAllowedDuration accepts any duration in ISO 8601 format, but the job is only polled every 5 minutes, so the limit is enforced at the first poll after it has elapsed. For example, a value of PT3M cancels the job at 5 minutes, a value of PT7M cancels it at 10 minutes, and so on.
:::
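The rounding described in the note can be sketched in plain Java. This is only an illustration of the arithmetic, not part of the transforms API: the `EffectiveTimeout` class and `effectiveCancellation` method are hypothetical names, and the sketch assumes that polls happen at exact 5-minute multiples counted from the job start and that a limit landing exactly on a poll is enforced at that poll.

```java
import java.time.Duration;

// Hypothetical helper, not part of the transforms API.
final class EffectiveTimeout {

    // Polling interval taken from the note above.
    private static final Duration POLL_INTERVAL = Duration.ofMinutes(5);

    private EffectiveTimeout() {}

    // Rounds an ISO 8601 duration up to the next multiple of the polling interval.
    // Assumption: polls are aligned to the job start, and a limit that falls
    // exactly on a poll is enforced at that poll.
    static Duration effectiveCancellation(String isoDuration) {
        Duration limit = Duration.parse(isoDuration);
        if (limit.isZero() || limit.isNegative()) {
            throw new IllegalArgumentException("Duration must be positive: " + isoDuration);
        }
        long pollMillis = POLL_INTERVAL.toMillis();
        // Ceiling division: number of polls needed to reach or pass the limit.
        long polls = (limit.toMillis() + pollMillis - 1) / pollMillis;
        return POLL_INTERVAL.multipliedBy(polls);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"PT3M", "PT7M", "PT2H"}) {
            System.out.println(value + " -> " + effectiveCancellation(value));
        }
    }
}
```

Under these assumptions, PT3M maps to PT5M and PT7M maps to PT10M, matching the examples in the note. In practice, choosing a limit that is already a multiple of 5 minutes avoids any gap between the configured and the effective value.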