HyperAuto V1 FAQ
:::callout{theme="warning" title="Sunset"} HyperAuto V1 is in the sunset phase of development and will be deprecated at a future date. Full support remains available. The creation of new V1 pipelines is discouraged, and users should migrate from HyperAuto V1 to V2 as detailed in the migration documentation. :::
General usage tips & guidance
- Can I debug and preview code in an SDDI repository?
- Can I configure a schedule to which new tables will be automatically added?
- One of my tables / derived_element is failing due to MODULE_UNREACHABLE; what should I do?
- I added table <TABLE_NAME> to my pipeline, but when I try to build my pipeline it is failing with AssertionError: 0 instances of <TABLE_NAME> found in 'objects' metadata table
- Do I need to increase the semantic version if I add new tables to Bellhop config files?
- Can I disable some of the intermediate stages generated by an SDDI repository?
Can I debug and preview code in an SDDI repository?
Yes. In the SDDI repository, navigate to the file /transforms-bellhop/src/software_defined_data_integrations/transforms/pipeline_builder.py, then use the Preview button to select the transform you want to preview.
Can I configure a schedule to which new tables will be automatically added?
An SDDI repository produces a dataset called BUILD that is connected to (downstream of) all final datasets produced by the repository. To ensure that all newly ingested tables get built, create a new Full Build schedule (including upstream datasets) that targets this BUILD dataset. The smart scheduler will initiate builds only for the parts of the pipeline whose raw data has been refreshed.
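As a rough mental model of why a single schedule targeting BUILD keeps covering new tables: BUILD's upstream graph includes every final dataset, and only datasets downstream of refreshed raw inputs are stale. The following is a simplified sketch, not Foundry's actual scheduler; the dataset names, the `PARENTS` lineage map, and the helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it reads from.
# Names are illustrative; "BUILD" stands in for the SDDI BUILD dataset.
PARENTS = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "clean_customers": ["raw_customers"],
    "final_orders": ["clean_orders", "clean_customers"],
    "final_customers": ["clean_customers"],
    "BUILD": ["final_orders", "final_customers"],
}


def upstream_of(target: str) -> set[str]:
    """Return the target plus every dataset it transitively reads from."""
    seen, queue = {target}, deque([target])
    while queue:
        for parent in PARENTS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def downstream_of(sources: set[str]) -> set[str]:
    """Return every dataset that transitively reads from any source."""
    children: dict[str, list[str]] = {ds: [] for ds in PARENTS}
    for ds, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(ds)
    seen: set[str] = set()
    queue = deque(sources)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def datasets_to_build(target: str, refreshed_raw: set[str]) -> set[str]:
    """Datasets in the schedule's scope that are stale after a refresh."""
    return upstream_of(target) & downstream_of(refreshed_raw)


print(sorted(datasets_to_build("BUILD", {"raw_orders"})))
# ['BUILD', 'clean_orders', 'final_orders']
```

In this model, adding a table means adding its raw, clean, and final datasets to `PARENTS` and listing the final dataset among BUILD's parents. The schedule's target never changes, which is why one BUILD-targeted schedule picks up new tables without being edited.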
One of my tables / derived_element is failing due to MODULE_UNREACHABLE; what should I do?
MODULE_UNREACHABLE is often a sign that DRIVER_MEMORY in your Spark environment is insufficient. You can apply Spark profiles to selected tables in your SourceConfig.yaml file; see the configuration reference for details. Before doing so, make sure you have imported the assigned profile into your repository's configuration.
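Because a profile has to be imported into the repository before a table can use it, it can help to cross-check the two lists before building. This is a minimal sketch, assuming both files have already been parsed into plain Python dictionaries; the dictionary shapes, table names, and the helper `missing_profile_imports` are hypothetical and do not reflect the real SourceConfig.yaml schema. The profile names are only examples.

```python
# Hypothetical, already-parsed configuration. The real SourceConfig.yaml and
# repository settings have their own schemas; these dicts are illustrative.
IMPORTED_PROFILES = {"DRIVER_MEMORY_MEDIUM", "EXECUTOR_MEMORY_MEDIUM"}

TABLE_PROFILES = {
    "ORDERS": ["DRIVER_MEMORY_LARGE"],
    "CUSTOMERS": ["DRIVER_MEMORY_MEDIUM"],
}


def missing_profile_imports(
    table_profiles: dict[str, list[str]], imported: set[str]
) -> dict[str, list[str]]:
    """Map each table to assigned profiles the repository has not imported.

    Tables whose profiles are all imported are left out of the result.
    """
    missing: dict[str, list[str]] = {}
    for table, profiles in table_profiles.items():
        gaps = [p for p in profiles if p not in imported]
        if gaps:
            missing[table] = gaps
    return missing


print(missing_profile_imports(TABLE_PROFILES, IMPORTED_PROFILES))
# {'ORDERS': ['DRIVER_MEMORY_LARGE']}
```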
I added table <TABLE_NAME> to my pipeline, but when I try to build my pipeline it is failing with AssertionError: 0 instances of <TABLE_NAME> found in 'objects' metadata table
Make sure you have rebuilt the metadata datasets objects, links, fields, and diffs after the new tables are ingested and added to your SDDI pipeline.
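The error suggests that the objects metadata dataset does not yet list the new table, most likely because it still reflects the tables known at its last build. The sketch that follows models that check under stated assumptions: the row format, the `table_name` column, and the helper `assert_table_registered` are hypothetical; only the error message text is taken from the failure above.

```python
# Hypothetical rows from the `objects` metadata dataset as of its last build.
# The column name "table_name" is an assumption made for illustration.
objects_rows = [{"table_name": "ORDERS"}, {"table_name": "CUSTOMERS"}]


def assert_table_registered(rows: list[dict[str, str]], table_name: str) -> int:
    """Mimic the reported error when the metadata has no row for a table."""
    count = sum(1 for row in rows if row["table_name"] == table_name)
    if count == 0:
        raise AssertionError(
            f"{count} instances of {table_name} found in 'objects' metadata table"
        )
    return count


# A table added to the pipeline but missing from stale metadata fails...
try:
    assert_table_registered(objects_rows, "INVOICES")
except AssertionError as err:
    print(err)  # 0 instances of INVOICES found in 'objects' metadata table

# ...and passes once the rebuilt metadata includes a row for it.
rebuilt_rows = objects_rows + [{"table_name": "INVOICES"}]
print(assert_table_registered(rebuilt_rows, "INVOICES"))  # 1
```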
Do I need to increase the semantic version if I add new tables to Bellhop config files?
No, you do not need to increase the semantic version after adding new tables to Bellhop config files. However, you will need to rebuild the metadata datasets objects, links, fields, and diffs.
Can I disable some of the intermediate stages generated by an SDDI repository?
Yes. The foreign key generation, enrichment, and renaming stages can be disabled using parameters in the PipelineConfig file. You must increment deploymentSemanticVersion for the changes to take effect.
:::callout{theme="warning"} Disabling any or all of these steps changes the schema of the output data and may break downstream usage of the data. :::
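To make the version requirement and the schema warning concrete, here is a simplified model. It assumes deploymentSemanticVersion can be treated as an integer and uses invented Python field names for the three stage toggles; the real PipelineConfig keys may differ. The hypothetical `removed_columns` helper shows one way to list columns that downstream consumers may rely on before rolling out a change, for example if disabling the renaming stage left the original source column names in place.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineSettings:
    # Hypothetical Python names for PipelineConfig values; real keys may differ.
    deployment_semantic_version: int  # modeled as an integer for simplicity
    foreign_key_generation: bool = True
    enrichment: bool = True
    renaming: bool = True


def effective_settings(
    deployed: PipelineSettings, proposed: PipelineSettings
) -> PipelineSettings:
    """Stage toggles only apply once the deployment version is incremented."""
    if proposed.deployment_semantic_version > deployed.deployment_semantic_version:
        return proposed
    return deployed


def removed_columns(old_schema: list[str], new_schema: list[str]) -> list[str]:
    """Columns present before a change that downstream users may still expect."""
    return sorted(set(old_schema) - set(new_schema))


deployed = PipelineSettings(deployment_semantic_version=1)
no_bump = replace(deployed, renaming=False)
bumped = replace(deployed, deployment_semantic_version=2, renaming=False)

print(effective_settings(deployed, no_bump).renaming)  # True: change ignored
print(effective_settings(deployed, bumped).renaming)  # False: change applied
print(removed_columns(["customer_id", "customer_name"], ["CUST_ID", "CUST_NM"]))
# ['customer_id', 'customer_name']
```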