Interpolation in time series
This page explains how interpolation is used for time series in the Palantir platform and how to customize these settings for your use cases.
What is interpolation?
Interpolation is a technique commonly used in time series analysis to estimate missing values between known data points. By using the existing data points, interpolation allows for a continuous and complete representation of the time series, even in the absence of recorded measurements at certain times.
Types of interpolation
The internal interpolation options available in the Palantir platform are:
- LINEAR: Infers a value by drawing a straight line between the previous and next known data points. Only applicable to numerical time series.
- NEAREST: Takes the value of the nearest point.
- PREVIOUS: Takes the value of the previous point.
- NEXT: Takes the value of the next point.
- NONE: Does not infer any value if there is no point at that timestamp.
For external interpolation (before the first point or after the last point), the valid settings are:
- NEAREST: Takes the value of the nearest point.
- NONE: Does not infer any value if there is no point at that timestamp.
A series has a defined interpolation if its internal interpolation is something other than NONE.
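The modes listed above can be illustrated with a minimal Python sketch. This is not the platform's implementation: the function name `interpolate`, the use of plain numbers as timestamps, and the tie-breaking rule for NEAREST (ties go to the earlier point) are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate(points, t, internal="LINEAR", external="NONE"):
    """Estimate the value at time t from (timestamp, value) pairs sorted by timestamp.

    Hypothetical helper: returns None when no value can be inferred.
    """
    times = [p[0] for p in points]
    i = bisect_left(times, t)
    # A recorded point at exactly t is returned as-is.
    if i < len(points) and times[i] == t:
        return points[i][1]
    # Before the first point or after the last: only external interpolation applies.
    if i == 0 or i == len(points):
        if external == "NEAREST" and points:
            return points[0][1] if i == 0 else points[-1][1]
        return None
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    if internal == "LINEAR":
        # Straight line between the previous and next known points.
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    if internal == "PREVIOUS":
        return v0
    if internal == "NEXT":
        return v1
    if internal == "NEAREST":
        # Assumption: an exact tie resolves to the earlier point.
        return v0 if (t - t0) <= (t1 - t) else v1
    return None  # NONE: nothing is inferred


points = [(0, 0.0), (3, 3.0)]
print(interpolate(points, 1, "LINEAR"))    # 1.0
print(interpolate(points, 1, "PREVIOUS"))  # 0.0
print(interpolate(points, 5, "LINEAR", external="NEAREST"))  # 3.0
```

Internal and external interpolation are kept as separate arguments because they are separate settings: a series can interpolate between its points and still produce no value outside its recorded range.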
Specifying interpolation
For a time series property in the ontology, you can specify a desired internal interpolation for your series using time series formatting. By default, numeric time series use LINEAR interpolation and categorical series use PREVIOUS.
The role of interpolation in time series calculations
When combining multiple series, the points used to build the output series depend on the interpolation settings of the input series. If all the series have a defined interpolation, the resulting series is built from the union of the points of all series. This means that even if only one series has a point at a certain timestamp, interpolation is used to determine the value of every other series at that timestamp before the join is performed.
However, if one or more of the series do not have a defined interpolation, the operation only iterates over the timestamps shared by all the input series that lack a defined interpolation, that is, the intersection of their points.
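The timestamp-selection rule described above can be sketched as follows. The function name `output_timestamps` and the representation of each series as a `(timestamps, internal_interpolation)` pair are hypothetical, chosen only to make the union and intersection cases concrete.

```python
def output_timestamps(series):
    """Return the timestamps over which a combined series is computed.

    series: list of (timestamps, internal_interpolation) pairs (hypothetical shape).
    """
    # Series whose internal interpolation is NONE have no defined interpolation.
    undefined = [set(ts) for ts, interp in series if interp == "NONE"]
    if undefined:
        # At least one series cannot be interpolated: keep only the timestamps
        # shared by every such series.
        return sorted(set.intersection(*undefined))
    # Every series can be interpolated: use every timestamp from every series.
    return sorted(set().union(*(ts for ts, _ in series)))


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5]
c = [1, 4]

print(output_timestamps([(a, "LINEAR"), (b, "LINEAR"), (c, "LINEAR")]))  # [1, 2, 3, 4, 5]
print(output_timestamps([(a, "NONE"), (b, "NONE"), (c, "NONE")]))        # [1, 4]
print(output_timestamps([(a, "LINEAR"), (b, "NONE"), (c, "NONE")]))      # [1, 4]
```

In the mixed case, the series with a defined interpolation does not contribute timestamps of its own. It is only evaluated at the timestamps selected from the other series.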
For example, assume you have three time series A, B, and C, in which none of A, B, or C have interpolation defined.
| Timestamp | Series A | Series B | Series C | Expected A + B + C |
|---|---|---|---|---|
| 2025/02/02 00:00:00 | 1 | 2 | 0 | 3 |
| 2025/02/03 00:00:00 | 2 | 1 | | |
| 2025/02/04 00:00:00 | 3 | 2 | | |
| 2025/02/05 00:00:00 | 4 | 2 | 3 | 9 |
| 2025/02/06 00:00:00 | 5 | 2 | | |
If none of the series have interpolation defined, the join is computed only at the timestamps where every series has a recorded point. In this case, adding the series together produces values only at the first and fourth timestamps (3 and 9, as shown in the last column of the table).

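If all the series have interpolation defined, the output is built from the union of all timestamps. For example, assuming each series uses the default LINEAR interpolation, series C is inferred as 1 on 2025/02/03 and 2 on 2025/02/04, so the sums at those timestamps are 4 and 7, in addition to 3 and 9 at the first and fourth timestamps.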

However, since the last point of series C precedes the last points of series A and B, and series C has no external interpolation (before the first point or after the last point) set, no value can be computed for the last timestamp. Adding NEAREST external interpolation to series C extends its last value (3) to 2025/02/06, so the sum at that timestamp becomes 5 + 2 + 3 = 10.

Finally, assume a mixed scenario in which series A has interpolation defined, but series B and C do not. The set of timestamps in the output series is the intersection of the timestamps of the series that have no interpolation defined (B and C). Here that is the first and fourth timestamps, so the result is again 3 and 9.
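The four scenarios above can be reproduced with a small, self-contained sketch. It assumes that "interpolation defined" means LINEAR (the stated default for numeric series), represents the dates 2025/02/02 to 2025/02/06 as the day numbers 2 to 6, and reads the table as series B having a point at every timestamp. The helpers `value_at` and `add_series` are hypothetical and model only the LINEAR and NONE internal settings.

```python
# Points from the table, keyed by day of month (2 = 2025/02/02, ..., 6 = 2025/02/06).
A = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
B = {2: 2, 3: 1, 4: 2, 5: 2, 6: 2}
C = {2: 0, 5: 3}


def value_at(points, t, internal, external):
    """Value of one series at t, or None if it cannot be inferred."""
    if t in points:
        return points[t]
    times = sorted(points)
    before = [x for x in times if x < t]
    after = [x for x in times if x > t]
    if not before or not after:
        # Outside the recorded range: only external interpolation applies.
        if external == "NEAREST" and times:
            return points[times[0]] if not before else points[times[-1]]
        return None
    if internal != "LINEAR":
        return None  # this sketch models only LINEAR and NONE
    t0, t1 = before[-1], after[0]
    return points[t0] + (points[t1] - points[t0]) * (t - t0) / (t1 - t0)


def add_series(series):
    """Sum several series; series is a list of (points, internal, external) triples."""
    undefined = [set(p) for p, internal, _ in series if internal == "NONE"]
    if undefined:
        stamps = set.intersection(*undefined)  # intersection of non-interpolated series
    else:
        stamps = set().union(*(p for p, _, _ in series))  # union of all timestamps
    result = {}
    for t in sorted(stamps):
        values = [value_at(p, t, i, e) for p, i, e in series]
        if None not in values:  # skip timestamps where any series has no value
            result[t] = sum(values)
    return result


no_interp = add_series([(A, "NONE", "NONE"), (B, "NONE", "NONE"), (C, "NONE", "NONE")])
all_linear = add_series([(A, "LINEAR", "NONE"), (B, "LINEAR", "NONE"), (C, "LINEAR", "NONE")])
c_external = add_series([(A, "LINEAR", "NONE"), (B, "LINEAR", "NONE"), (C, "LINEAR", "NEAREST")])
mixed = add_series([(A, "LINEAR", "NONE"), (B, "NONE", "NONE"), (C, "NONE", "NONE")])

print(no_interp)   # {2: 3, 5: 9}
print(all_linear)  # {2: 3, 3: 4.0, 4: 7.0, 5: 9}
print(c_external)  # {2: 3, 3: 4.0, 4: 7.0, 5: 9, 6: 10}
print(mixed)       # {2: 3, 5: 9}
```

The interpolated values at days 3 and 4 depend on the LINEAR assumption. With a different internal interpolation such as PREVIOUS or NEAREST, the set of output timestamps would be the same but the sums at those two days would differ.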
