
Migrating from HyperAuto V1 to V2

:::callout{theme="warning" title="Sunset"} HyperAuto V1 is in the sunset phase of development and will be deprecated at a future date. Full support remains available. The creation of new V1 pipelines is discouraged, and users should migrate from HyperAuto V1 to V2 as detailed in the migration documentation. :::

HyperAuto V2 is a significant upgrade from HyperAuto V1 and offers enhanced performance and functionality, including:

  • An easier configuration process using a point-and-click wizard.
  • The ability to generate Pipeline Builder pipelines, which provide full transparency into how the data is processed, a comprehensive change management workflow, and significant performance improvements.
  • Real-time streaming of input data, enabling time-critical operational applications.

Notable feature differences between HyperAuto V1 and V2

Significant HyperAuto V2 updates and changes are described below.

Source-type support

As of 29 April 2024, HyperAuto V2 supports only SAP data. Users of V1 with SAP data are strongly encouraged to start migrating to V2 (see Getting started).

Multi-source support

In HyperAuto V1, users could connect a single pipeline to multiple sources and perform a wide union at the end. However, this approach could produce unexpected results and is now discouraged. In particular, primary and foreign keys that used the source as a prefix could break if a source name changed.

In HyperAuto V2, each pipeline can be connected to only one source. As a consequence, the source column is not produced in output datasets and is no longer used as a prefix when generating primary or foreign keys.

Users who need a multi-source union are encouraged to build a pipeline downstream of HyperAuto V2 that performs the union.
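As an illustration of such a downstream union, here is a minimal sketch in plain Python. It assumes each HyperAuto V2 output has already been read into a list of row dictionaries. The function name `union_with_source`, the source labels, and the column names are hypothetical and are not part of HyperAuto. In practice, the same logic would need to be expressed in whatever tool hosts the downstream pipeline.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def union_with_source(
    outputs_by_source: Dict[str, List[Row]], source_column: str = "source"
) -> List[Row]:
    """Stack rows from several single-source outputs, labelling each row's origin."""
    # Collect every column seen in any output, preserving first-seen order.
    columns: List[str] = []
    for rows in outputs_by_source.values():
        for row in rows:
            for column in row:
                if column != source_column and column not in columns:
                    columns.append(column)

    unioned: List[Row] = []
    for source_name, rows in outputs_by_source.items():
        for row in rows:
            labelled: Row = {source_column: source_name}
            # Columns missing from this source are filled with None (a "wide" union).
            for column in columns:
                labelled[column] = row.get(column)
            unioned.append(labelled)
    return unioned


# Hypothetical outputs of two separate HyperAuto V2 pipelines.
eu_materials = [{"material_id": "100", "plant": "DE01"}]
us_materials = [{"material_id": "100", "weight_unit": "LB"}]

combined = union_with_source({"sap_eu": eu_materials, "sap_us": us_materials})
```

Because the source label is kept in its own column rather than baked into key values, renaming a source only changes that column and leaves the keys themselves untouched.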

Foreign key generation

HyperAuto V1 took a permissive approach to foreign key generation. It often created foreign keys between tables that did not accurately reflect the underlying data relationships, which could lead to inaccurate or misleading interpretations.

The key generation logic in HyperAuto V2 takes a more conservative approach to improve accuracy, so the list of foreign key columns differs from V1. If you believe that a foreign key has been mistakenly omitted in V2, contact your Palantir representative.

Column renaming

HyperAuto V2 uses richer metadata to rename columns, which may generate different column names in output datasets compared to HyperAuto V1.
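When planning a migration, it can help to list which column names differ between a V1 output and its V2 counterpart, so that downstream consumers can be updated. The following is a minimal sketch, assuming the column names of both datasets are available as plain lists. The helper `diff_columns` and the example column names are hypothetical.

```python
from typing import Dict, Iterable, List


def diff_columns(v1_columns: Iterable[str], v2_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names by whether they appear in V1 only, V2 only, or both."""
    v1, v2 = set(v1_columns), set(v2_columns)
    return {
        "only_in_v1": sorted(v1 - v2),
        "only_in_v2": sorted(v2 - v1),
        "shared": sorted(v1 & v2),
    }


# Hypothetical column lists for the same table in a V1 and a V2 output.
report = diff_columns(
    ["primary_key", "material", "source"],
    ["primary_key", "material_number"],
)
```

Columns listed under `only_in_v1` are the ones downstream code may still reference and that need a mapping to a V2 name, or removal.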

Custom cleaning functions

HyperAuto V2 does not support custom cleaning functions as part of the pipeline. Users are advised to create a pipeline downstream of HyperAuto to implement their custom logic.
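As a rough illustration of what such downstream logic might look like, the sketch below applies a simple cleaning rule to rows represented as dictionaries. The rule itself (trim whitespace, turn empty strings into `None`) and the names `clean_value` and `clean_rows` are assumptions chosen for the example, not behavior taken from HyperAuto V1.

```python
from typing import Any, Dict, List


def clean_value(value: Any) -> Any:
    """Trim strings and map empty strings to None; leave other values unchanged."""
    if isinstance(value, str):
        stripped = value.strip()
        return stripped or None
    return value


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply clean_value to every cell, returning new rows without mutating the input."""
    return [{column: clean_value(value) for column, value in row.items()} for row in rows]


# Hypothetical rows read from a HyperAuto V2 output dataset.
raw = [{"material_id": " 100 ", "description": "   ", "quantity": 5}]
cleaned = clean_rows(raw)
```

Keeping the cleaning step in its own downstream pipeline means it can be changed and reviewed independently of the HyperAuto configuration.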

Batch unioning of inputs

HyperAuto V2 does not support configuring multiple syncs that link to the same output table (known as batch union components in V1). Users are advised to union their inputs upstream of HyperAuto V2, and then configure a folder-based pipeline to consume from HyperAuto.

Migrating existing HyperAuto V1 pipelines to HyperAuto V2

Users are encouraged to gradually migrate their pipelines from HyperAuto V1 to V2 by:

  1. Creating a new HyperAuto V2 pipeline that replicates existing V1 configurations and consumes the same inputs.
  2. Identifying downstream consumers of the pipeline (repositories, analyses, applications) and gradually pointing them to the new HyperAuto V2 outputs.
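The second step can be tracked with a simple inventory of consumers and the dataset each one currently reads. The sketch below is one possible way to do this, assuming the inventory is maintained by hand. The `Consumer` class, the `remaining_on_v1` helper, and all paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Consumer:
    name: str        # e.g. a repository, analysis, or application
    kind: str
    input_path: str  # dataset the consumer currently reads from


def remaining_on_v1(consumers: List[Consumer], v1_output_prefix: str) -> List[str]:
    """Return the names of consumers whose input still sits under the V1 output folder."""
    return [c.name for c in consumers if c.input_path.startswith(v1_output_prefix)]


# Hypothetical inventory partway through a migration.
inventory = [
    Consumer("inventory-report", "analysis", "/Project/hyperauto_v1/output/materials"),
    Consumer("planning-app", "application", "/Project/hyperauto_v2/output/materials"),
]
pending = remaining_on_v1(inventory, "/Project/hyperauto_v1/")
```

Once the returned list is empty, no tracked consumer depends on the V1 outputs any longer.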

If you decide not to migrate to HyperAuto V2, existing V1 repositories will be left intact but “severed” from the original template. This means that the repository will be converted to a regular Python Transforms repository and will be owned by users just like any other custom repository.

:::callout{theme="neutral"} After severing a HyperAuto V1 repository from the original template, the automatic pull request creation process will be discontinued, and users will have to manually create pull requests to update their V1 configurations. :::

