Create time series data with Pipeline Builder¶

The pipeline you create with this guide generates time series data that backs a time series sync, which is then used to add time series properties to the Carrier, Route, and Airport object types. This setup is more complex than a standard mapping from a time series dataset to time series properties, because we calculate the time series values from non-time series data. For general guidance on building pipelines, review our Pipeline Builder documentation.

The flight dataset we are working with includes the following columns that we will use to create time series properties:

flight_id: string | A unique string to identify the flight and every row in the dataset.
date: date | The date the flight took place.
destination_airport_id: string | The ID of the destination airport.
airline_id: string | The ID of the airline carrier.
origin_airport_id: string | The ID of the origin airport.
dep_delay: integer | The number of minutes the departure was delayed.
arr_delay: integer | The number of minutes the arrival was delayed.
route_id: string | A unique string to identify the route.

The pipeline at the end of this guide will look like this:

Overview of final time series pipeline

Part I: Generate time series data¶

Using the same flights dataset that backs the Flights object type, we can apply aggregation transforms to generate time series data from flight metrics. This step is not necessary if your time series data already comes into Foundry from a historian or edge sensor; in that case, skip to Part II: Create the time series sync.

1. Apply transforms to `Carrier` and `Route` object types¶

Starting from the flights dataset, apply the transforms below. Build this sequence twice: once for the Route object type and once for the Carrier object type.

Aggregate the data¶

Use the aggregate transform to group by date and the object's ID, and calculate the daily average arrival delay, the daily average departure delay, and the daily count of flights. For the Route object type, group by route_id; for the Carrier object type, repeat this step separately, grouping by airline_id.

The aggregate transform board in Pipeline Builder.

After aggregating, the dataset should preview with the following schema:

route_id	date	daily_avg_dep_delay	daily_avg_arr_delay	daily_count_of_flights
ATL -> SFO	2023-06-12	33.4545454545450000	40.0000000000000000	11
ATL -> FLL	2023-08-24	29.7272727272720000	19.4090909090909100	22
ATL -> TVC	2023-07-05	-8.0000000000000000	-8.0000000000000000	1
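The aggregation in this step can be sketched in plain Python to make the grouping explicit. This is a minimal illustration rather than Pipeline Builder's implementation: the helper `aggregate_daily`, the sample rows, and the airline ID `DL` are hypothetical, and the sketch assumes no delay values are null.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows using columns from the flights dataset.
flights = [
    {"route_id": "ATL -> SFO", "airline_id": "DL", "date": date(2023, 6, 12),
     "dep_delay": 30, "arr_delay": 45},
    {"route_id": "ATL -> SFO", "airline_id": "DL", "date": date(2023, 6, 12),
     "dep_delay": 10, "arr_delay": 5},
    {"route_id": "ATL -> TVC", "airline_id": "DL", "date": date(2023, 7, 5),
     "dep_delay": -8, "arr_delay": -8},
]


def aggregate_daily(rows, id_column):
    """Group rows by (id_column, date); average both delays and count the rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[id_column], row["date"])].append(row)
    result = []
    for (object_id, day), members in groups.items():
        n = len(members)  # number of flights for this object on this day
        result.append({
            id_column: object_id,
            "date": day,
            "daily_avg_dep_delay": sum(r["dep_delay"] for r in members) / n,
            "daily_avg_arr_delay": sum(r["arr_delay"] for r in members) / n,
            "daily_count_of_flights": n,
        })
    return result


route_daily = aggregate_daily(flights, "route_id")      # Route branch
carrier_daily = aggregate_daily(flights, "airline_id")  # Carrier branch
```

Calling the same helper once with `route_id` and once with `airline_id` mirrors building the aggregate board once per object type.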

Cast to a new data type¶

To use this data as a time series, we need a timestamp column, so use the cast transform to cast the date column to the timestamp type. The next step applies an unpivot transform to merge the daily_avg_dep_delay, daily_avg_arr_delay, and daily_count_of_flights values into one column. Because the unpivot transform requires all values to share a data type, also cast daily_count_of_flights to double, the same type as the average delay metrics.

The cast board in Pipeline Builder, showing a cast to timestamp and a cast to double
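Both casts can be sketched on a single aggregated row. The helper `cast_for_time_series` and the sample row are hypothetical, and treating a date as midnight UTC is an assumption chosen because it matches the `2023-06-12T00:00:00.000Z` values in the previews.

```python
from datetime import date, datetime, timezone


def cast_for_time_series(row, count_column="daily_count_of_flights"):
    """Return a copy of row with date as a UTC-midnight timestamp and the count as a float."""
    out = dict(row)
    d = row["date"]
    # A date has no time of day; midnight UTC matches the preview timestamps.
    out["date"] = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    # Python's float is a 64-bit double, matching the delay averages.
    out[count_column] = float(row[count_column])
    return out


row = {"route_id": "ATL -> SFO", "date": date(2023, 6, 12),
       "daily_avg_dep_delay": 33.4545454545, "daily_avg_arr_delay": 40.0,
       "daily_count_of_flights": 11}
cast_row = cast_for_time_series(row)
print(cast_row["date"].isoformat())  # → 2023-06-12T00:00:00+00:00
```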

Unpivot to merge time series values¶

Since this dataset contains time series data in different columns, we must use an unpivot transform to merge it into one value column so the data can match the required schema for a time series sync, as shown below:

series ID: string | The identifier for the set of timestamp and value pairs referenced by a time series property (TSP); it must match the series ID stored in the TSP.
timestamp: timestamp or long | The time at which the quantity is measured.
value: integer, float, double, string | The value of the quantity at the point that it is measured. A string type indicates a categorical time series; each categorical time series can have, at most, 10,000 unique variants.

The unpivot transform shown below places the values for daily_avg_dep_delay, daily_avg_arr_delay, and daily_count_of_flights into the same series_value column, and writes the original column names to a new series_name column that is used to build the series_id.

The unpivot transform board in Pipeline Builder

The dataset schema should now appear as follows:

series_name	series_value	route_id	date
daily_avg_dep_delay	33.4545454545450000	ATL -> SFO	2023-06-12T00:00:00.000Z
daily_avg_arr_delay	40.0000000000000000	ATL -> SFO	2023-06-12T00:00:00.000Z
daily_count_of_flights	11.0000000000000000	ATL -> SFO	2023-06-12T00:00:00.000Z
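The unpivot can be sketched as turning each metric column of a wide row into its own long row. The helper `unpivot` and the sample row are hypothetical; the output column names follow the preview above.

```python
def unpivot(rows, value_columns, keep_columns):
    """Emit one row per value column: (series_name, series_value) plus the kept columns."""
    result = []
    for row in rows:
        for column in value_columns:
            record = {"series_name": column, "series_value": row[column]}
            record.update({k: row[k] for k in keep_columns})
            result.append(record)
    return result


wide = [{"route_id": "ATL -> SFO", "date": "2023-06-12T00:00:00.000Z",
         "daily_avg_dep_delay": 33.4545454545, "daily_avg_arr_delay": 40.0,
         "daily_count_of_flights": 11.0}]
long_rows = unpivot(
    wide,
    ["daily_avg_dep_delay", "daily_avg_arr_delay", "daily_count_of_flights"],
    ["route_id", "date"],
)
```

One wide row with three metric columns becomes three rows, which is why all three metrics must share a type before this step.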

Concatenate string values to create the series ID¶

Now, use the concatenate strings transform to create the series ID, the identifier for each set of time series values. Combine the primary key of each object with the series_name (the metric each series measures), separated by an underscore, for example ATL -> SFO_daily_avg_dep_delay.

The concatenate strings transform board in Pipeline Builder.

series_id	series_name	series_value	route_id	date
ATL -> SFO_daily_avg_dep_delay	daily_avg_dep_delay	33.4545454545450000	ATL -> SFO	2023-06-12T00:00:00.000Z
ATL -> SFO_daily_avg_arr_delay	daily_avg_arr_delay	40.0000000000000000	ATL -> SFO	2023-06-12T00:00:00.000Z
ATL -> SFO_daily_count_of_flights	daily_count_of_flights	11.0000000000000000	ATL -> SFO	2023-06-12T00:00:00.000Z
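The concatenation can be sketched as a string format. The helper `add_series_id` is hypothetical, and the key-first order with an underscore separator is inferred from the preview values rather than taken from a Pipeline Builder setting.

```python
def add_series_id(row, key_column, separator="_"):
    """Prefix the metric name with the object's primary key to form the series ID."""
    out = dict(row)
    out["series_id"] = f"{row[key_column]}{separator}{row['series_name']}"
    return out


row = {"series_name": "daily_avg_dep_delay", "series_value": 33.4545454545,
       "route_id": "ATL -> SFO", "date": "2023-06-12T00:00:00.000Z"}
with_id = add_series_id(row, "route_id")
print(with_id["series_id"])  # → ATL -> SFO_daily_avg_dep_delay
```

Because the Carrier, Route, and Airport outputs are later unioned into one dataset, each series ID needs to be unique across all three object types.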

Select necessary columns¶

Use the select columns transform to keep only the columns that the time series sync requires: series_id, series_value, and date. Because each series ID identifies both the object and the metric, one backing dataset can hold the values for every series, whatever it measures. Repeat these steps for the Carrier object type, using airline_id in place of route_id.

The select columns transform board in Pipeline Builder

series_id	series_value	date
ATL -> SFO_daily_avg_dep_delay	33.4545454545450000	2023-06-12T00:00:00.000Z
ATL -> SFO_daily_avg_arr_delay	40.0000000000000000	2023-06-12T00:00:00.000Z
ATL -> SFO_daily_count_of_flights	11.0000000000000000	2023-06-12T00:00:00.000Z
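Selecting columns amounts to projecting each row onto the three sync columns. The helper `select_columns` and the sample row are hypothetical.

```python
REQUIRED_COLUMNS = ("series_id", "series_value", "date")


def select_columns(rows, columns=REQUIRED_COLUMNS):
    """Keep only the named columns, in the given order."""
    return [{c: row[c] for c in columns} for row in rows]


rows = [{"series_id": "ATL -> SFO_daily_avg_dep_delay",
         "series_name": "daily_avg_dep_delay", "series_value": 33.4545454545,
         "route_id": "ATL -> SFO", "date": "2023-06-12T00:00:00.000Z"}]
selected = select_columns(rows)
```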

2. Add a transform to aggregate and generate data for origin and destination airports¶

Next, build two branches from the flights dataset, one for origin airports and one for destination airports, and apply the aggregate and cast transforms below in each branch.

Aggregate the number of flights per day per airport¶

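In the origin airport branch, use the aggregate transform to group by date and origin_airport_id, then calculate the average departure delay (daily_avg_dep_delay) and the number of rows in each group, which equals the number of flights departing that airport per day (daily_count_of_departing_flights). In the destination airport branch, group by date and the destination airport ID instead, and calculate daily_avg_arr_delay and daily_count_of_arriving_flights.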

The aggregate transform board in Pipeline Builder, used to aggregate the number of flights per day

date	origin_airport_id	daily_avg_dep_delay	daily_count_of_departing_flights
2023-07-02	10299	9.34375000000000000	33
2023-09-06	10431	-2.3333333333333333	6
2023-01-12	10620	-7.0000000000000000	2
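Both airport branches follow the same pattern, so a sketch can share one hypothetical helper, `aggregate_airport`, parameterized by the airport column and the delay column. The sample rows are invented for illustration, and the sketch reads the destination ID from the flights dataset's `destination_airport_id` column.

```python
from collections import defaultdict


def aggregate_airport(rows, airport_column, delay_column, avg_name, count_name):
    """Group by (date, airport); average one delay column and count flights."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["date"], row[airport_column])].append(row[delay_column])
    return [
        {"date": day, airport_column: airport,
         avg_name: sum(delays) / len(delays), count_name: len(delays)}
        for (day, airport), delays in groups.items()
    ]


# Hypothetical flights: two departures from 10620 that both land at 10299.
flights = [
    {"date": "2023-01-12", "origin_airport_id": "10620",
     "destination_airport_id": "10299", "dep_delay": -10, "arr_delay": 60},
    {"date": "2023-01-12", "origin_airport_id": "10620",
     "destination_airport_id": "10299", "dep_delay": -4, "arr_delay": 53},
]

departures = aggregate_airport(flights, "origin_airport_id", "dep_delay",
                               "daily_avg_dep_delay", "daily_count_of_departing_flights")
arrivals = aggregate_airport(flights, "destination_airport_id", "arr_delay",
                             "daily_avg_arr_delay", "daily_count_of_arriving_flights")
```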

Cast to timestamp¶

To use this new data as a time series, we must create a timestamp column. To do this, use the cast transform to cast the date column to a timestamp type column.

The cast transform board in Pipeline Builder, used to cast the data to a timestamp type

3. Create a join to combine destination and origin airports¶

Using the join board, create a left join with the origin airport branch as the left input and the destination airport branch as the right input. The result holds both departure and arrival metrics for each airport and day. Configure the join as follows:

Match date to date, and origin_airport_id to dest_airport_id.
Auto-select columns from the left dataset.
From the right dataset, select daily_avg_arr_delay and daily_count_of_arriving_flights.

The join board in Pipeline Builder, configured to create a left join for airport data.

date	origin_airport_id	daily_avg_dep_delay	daily_count_of_departing_flights	daily_avg_arr_delay	daily_count_of_arriving_flights
2023-07-02	10299	9.34375000000000000	33	18.5294117647058840	34
2023-09-06	10431	-2.3333333333333333	6	-8.0000000000000000	6
2023-01-12	10620	-7.0000000000000000	2	56.5000000000000000	2
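A left join keeps every row from the left (origin) input and fills the right-side columns with nulls where no destination row matches. The sketch below uses a hypothetical `left_join` helper and invented rows; the right-side key `dest_airport_id` follows the join configuration above, and the helper assumes each (date, airport) key appears at most once on the right, which holds after aggregation.

```python
def left_join(left, right, on, right_columns):
    """Left join rows on (left_key, right_key) pairs; unmatched rows get None."""
    # Index the right side by its join key; aggregated rows are unique per key.
    index = {tuple(r[right_key] for _, right_key in on): r for r in right}
    result = []
    for left_row in left:
        match = index.get(tuple(left_row[left_key] for left_key, _ in on))
        out = dict(left_row)  # keep every left column
        for column in right_columns:
            out[column] = match[column] if match is not None else None
        result.append(out)
    return result


departures = [
    {"date": "2023-01-12", "origin_airport_id": "10620",
     "daily_avg_dep_delay": -7.0, "daily_count_of_departing_flights": 2},
    {"date": "2023-01-12", "origin_airport_id": "10431",
     "daily_avg_dep_delay": -2.5, "daily_count_of_departing_flights": 4},
]
arrivals = [
    {"date": "2023-01-12", "dest_airport_id": "10620",
     "daily_avg_arr_delay": 56.5, "daily_count_of_arriving_flights": 2},
]
joined = left_join(
    departures, arrivals,
    on=[("date", "date"), ("origin_airport_id", "dest_airport_id")],
    right_columns=["daily_avg_arr_delay", "daily_count_of_arriving_flights"],
)
```

The second sample airport has departures but no arrivals on that day, so its arrival metrics come back as None; these are the kind of nulls removed in Part II.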

4. Apply transforms to format data for a time series sync¶

Rename column¶

Now that we have joined the origin airport data with the destination airport data, each row has both arrival and departure metrics for one airport. We no longer need to distinguish origin from destination, so use the rename columns transform to rename origin_airport_id to airport_id.

The rename column transform board in Pipeline Builder

The data should preview as follows with the renamed column:

date	airport_id	daily_avg_dep_delay	daily_count_of_departing_flights	daily_avg_arr_delay	daily_count_of_arriving_flights
2023-07-02	10299	9.34375000000000000	33	18.5294117647058840	34
2023-09-06	10431	-2.3333333333333333	6	-8.0000000000000000	6
2023-01-12	10620	-7.0000000000000000	2	56.5000000000000000	2
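Renaming a column while preserving column order can be sketched as follows; the helper `rename_column` and the sample row are hypothetical.

```python
def rename_column(rows, old, new):
    """Return rows with column `old` renamed to `new`, preserving column order."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


joined = [{"date": "2023-07-02", "origin_airport_id": "10299",
           "daily_avg_dep_delay": 9.34375}]
renamed = rename_column(joined, "origin_airport_id", "airport_id")
```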

Cast to double¶

The unpivot transform applied later requires all values to share a data type, so use the cast transform again to cast daily_count_of_departing_flights and daily_count_of_arriving_flights to double, the same type as the average delay metrics.

Add flight numbers¶

To calculate the total daily flight count, use the add numbers transform to add the daily count of arriving flights to the daily count of departing flights, and write the result to a new daily_count_of_flights column, as shown below.

The cast and add numbers transform boards in Pipeline Builder

daily_count_of_flights	date	airport_id	daily_avg_dep_delay	daily_count_of_departing_flights	daily_avg_arr_delay	daily_count_of_arriving_flights
67	2023-07-02	10299	9.34375000000000000	33	18.5294117647058840	34
12	2023-09-06	10431	-2.3333333333333333	6	-8.0000000000000000	6
4	2023-01-12	10620	-7.0000000000000000	2	56.5000000000000000	2
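The cast and add steps can be sketched together. This sketch assumes null-propagating addition (as in Spark SQL, where adding a null yields null), so an airport with no matching arrivals ends up with a null total; the helper name `cast_and_add_counts` and the sample row are hypothetical.

```python
COUNT_COLUMNS = ("daily_count_of_departing_flights", "daily_count_of_arriving_flights")


def cast_and_add_counts(row):
    """Cast both flight counts to float and add them into daily_count_of_flights."""
    out = dict(row)
    for column in COUNT_COLUMNS:
        if out[column] is not None:
            out[column] = float(out[column])
    departing, arriving = (out[c] for c in COUNT_COLUMNS)
    # Assumed null semantics: a missing count makes the total missing too.
    out["daily_count_of_flights"] = (
        None if departing is None or arriving is None else departing + arriving
    )
    return out


row = {"date": "2023-07-02", "airport_id": "10299",
       "daily_avg_dep_delay": 9.34375, "daily_count_of_departing_flights": 33,
       "daily_avg_arr_delay": 18.5294117647, "daily_count_of_arriving_flights": 34}
result = cast_and_add_counts(row)
print(result["daily_count_of_flights"])  # → 67.0
```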

Unpivot to merge series values¶

As with the Route and Carrier data, use an unpivot transform to merge the metric columns into one value column so that the data matches the required time series sync schema (series ID, timestamp, and value).

The unpivot transform shown below places the values for daily_avg_dep_delay, daily_avg_arr_delay, and daily_count_of_flights into the same series_value column, and writes the original column names to a new series_name column that is used to build the series ID.

The unpivot transform board in Pipeline Builder, configured to create a series_name column output

The data should preview with the following schema:

series_name	series_value	date	airport_id
daily_count_of_flights	67	2023-07-02T00:00:00.000Z	10299
daily_avg_dep_delay	9.34375000000000000	2023-07-02T00:00:00.000Z	10299
daily_avg_arr_delay	18.5294117647058840	2023-07-02T00:00:00.000Z	10299

Concatenate string values to create the series ID¶

Now, we can use the concatenate strings transform to create the series ID (the identifier for the associated time series values). Use the transform to combine the series_name (what each sensor represents) with the primary key of the Airport object (airport_id).

The concatenate strings board in Pipeline Builder, configured to combine the series_name with the airport_id

series_id	series_name	series_value	date	airport_id
10299_daily_count_of_flights	daily_count_of_flights	67	2023-07-02T00:00:00.000Z	10299
10299_daily_avg_dep_delay	daily_avg_dep_delay	9.34375000000000000	2023-07-02T00:00:00.000Z	10299
10299_daily_avg_arr_delay	daily_avg_arr_delay	18.5294117647058840	2023-07-02T00:00:00.000Z	10299

Select necessary columns¶

Use the select columns transform to keep only the columns that the time series sync requires: series_id, series_value, and date. Because each series ID identifies both the object and the metric, one backing dataset can hold the values for every series, whatever it measures.

The select columns board in Pipeline Builder

The resulting dataset should look as follows:

series_id	series_value	date
10299_daily_count_of_flights	67	2023-07-02T00:00:00.000Z
10299_daily_avg_dep_delay	9.34375000000000000	2023-07-02T00:00:00.000Z
10299_daily_avg_arr_delay	18.5294117647058840	2023-07-02T00:00:00.000Z

5. Union the time series properties into a backing dataset¶

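Use a union transform of type Union by name to combine the outputs of the Carrier, Route, and Airport branches into a single backing dataset. All three outputs have the same columns (series_id, series_value, and date), so they can be matched by name.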

Three time series properties nodes are selected for unioning from the Pipeline Builder graph

The union board in Pipeline Builder, configured to union the three TSP property sets by name

series_id	series_value	date
10299_daily_count_of_flights	67	2023-07-02T00:00:00.000Z
10299_daily_avg_dep_delay	9.34375000000000000	2023-07-02T00:00:00.000Z
10299_daily_avg_arr_delay	18.5294117647058840	2023-07-02T00:00:00.000Z
CMH -> IAH_daily_avg_dep_delay	-8.0000000000000000	2023-03-21T00:00:00.000Z
20304_daily_avg_arr_delay	9.12500000000000000	2023-08-13T00:00:00.000Z
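Union by name matches columns by name rather than by position, so the three outputs can list their columns in different orders. A minimal sketch with a hypothetical `union_by_name` helper follows; it reorders each row to the first row's column order and, as a simplifying assumption, rejects inputs whose column sets differ. The carrier series ID `DL_daily_count_of_flights` and its value are invented.

```python
def union_by_name(*datasets):
    """Stack datasets, aligning columns by name to the first row's column order."""
    order = None
    result = []
    for rows in datasets:
        for row in rows:
            if order is None:
                order = list(row)
            if set(row) != set(order):
                raise ValueError(f"column mismatch: {sorted(set(row) ^ set(order))}")
            result.append({c: row[c] for c in order})
    return result


carrier = [{"series_id": "DL_daily_count_of_flights", "series_value": 3.0,
            "date": "2023-07-02T00:00:00.000Z"}]
route = [{"date": "2023-03-21T00:00:00.000Z",  # columns in a different order
          "series_id": "CMH -> IAH_daily_avg_dep_delay", "series_value": -8.0}]
airport = [{"series_id": "10299_daily_count_of_flights", "series_value": 67.0,
            "date": "2023-07-02T00:00:00.000Z"}]
unioned = union_by_name(carrier, route, airport)
```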

Part II: Create the time series sync¶

1. Remove null values¶

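Apply a filter transform to the unioned dataset to remove rows with null values. For example, the left join in step 3 leaves the arrival metrics null for an airport that had departures but no arrivals on a given day.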

The filter transform board in Pipeline Builder, configured to remove null values.
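Dropping rows with nulls in any sync column can be sketched as a simple predicate. The helper `drop_null_rows` and the sample rows are hypothetical, and checking only series_id, series_value, and date is an assumption that matches the columns kept by the select step.

```python
SYNC_COLUMNS = ("series_id", "series_value", "date")


def drop_null_rows(rows, columns=SYNC_COLUMNS):
    """Keep only rows where every listed column is present and non-null."""
    return [row for row in rows if all(row.get(c) is not None for c in columns)]


rows = [
    {"series_id": "10299_daily_avg_arr_delay", "series_value": 18.5294117647,
     "date": "2023-07-02T00:00:00.000Z"},
    {"series_id": "10431_daily_avg_arr_delay", "series_value": None,
     "date": "2023-07-02T00:00:00.000Z"},
]
clean = drop_null_rows(rows)
```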

2. Configure the time series sync¶

Now, create a time series sync by selecting Add from the pipeline output section to the right of the screen. Then, select Time series sync. Fill out the necessary data for the new time series sync, with the following considerations:

The title, for example “[Example] Time series sync | Events”, becomes the name of the resulting resource in your Palantir filesystem folder.
Select the series_id column for the Series ID field.
Select the date column, which you cast to a timestamp earlier, for the Time field.
Select the series_value column for the Value field.

Now, save and build the pipeline. The output will be created in the same folder as the pipeline.
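Before building, it can help to sanity-check rows against the time series sync schema described in Part I: a string series ID, a timestamp or long time, and a numeric or string value, with at most 10,000 distinct values per categorical series. The checker below is a hypothetical local sketch; Foundry applies its own validation when the sync builds, and mapping Python `int` to long and `datetime` to timestamp is an assumption of the sketch.

```python
from collections import defaultdict
from datetime import datetime, timezone

MAX_CATEGORICAL_VARIANTS = 10_000  # limit stated in the sync schema


def check_sync_rows(rows):
    """Return a list of schema problems found in rows bound for a time series sync."""
    problems = []
    variants = defaultdict(set)  # series_id -> distinct string values
    for i, row in enumerate(rows):
        if not isinstance(row.get("series_id"), str):
            problems.append(f"row {i}: series_id must be a string")
        ts = row.get("date")
        if isinstance(ts, bool) or not isinstance(ts, (datetime, int)):
            problems.append(f"row {i}: time must be a timestamp or long")
        value = row.get("series_value")
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            problems.append(f"row {i}: value must be numeric or string")
        elif isinstance(value, str):
            variants[row.get("series_id")].add(value)
    for series_id, values in variants.items():
        if len(values) > MAX_CATEGORICAL_VARIANTS:
            problems.append(f"{series_id}: too many categorical variants")
    return problems


rows = [{"series_id": "10299_daily_count_of_flights", "series_value": 67.0,
         "date": datetime(2023, 7, 2, tzinfo=timezone.utc)}]
print(check_sync_rows(rows))  # → []
```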

3. Use a time series sync to add properties to object types¶

Now that you have created a pipeline with a time series sync, you can use the sync to add time series properties to the Route, Carrier, and Airport object types. For more guidance, see our documentation on adding time series properties to object types.
