
foundryts.functions.interpolate

foundryts.functions.interpolate(before=None, internal=None, after=None, frequency=None, rename_columns_by=None, static_column_name=None)

Returns a function that joins one or more time series into a single time series with a column per input series.

Interpolation estimates the value of missing points in a series for timestamps where a point exists in one of the other series. The function uses the configured interpolation strategies to resample and align the input series wherever a series is missing points.
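To make the alignment step concrete, the following is a minimal pure-Python sketch, not the library's implementation. It joins two series on the union of their timestamps and leaves gaps as NaN, which is what the output looks like before any interpolation strategy fills them. The helper name `align_on_union` is hypothetical; the points are copied from the Examples section.

```python
import math

# Points copied from the Examples section: (timestamp in ns, value).
series_1 = [(1, 1.0), (101, 2.0), (200, 4.0), (201, 8.0)]
series_2 = [(2, 11.0), (102, 12.0), (201, 14.0), (202, 18.0)]


def align_on_union(*series):
    """Hypothetical helper: one row per timestamp seen in any series, NaN for gaps."""
    lookups = [dict(points) for points in series]
    # Union of all timestamps across the input series, in ascending order.
    timestamps = sorted(set().union(*lookups))
    return [
        (ts, *[lookup.get(ts, math.nan) for lookup in lookups])
        for ts in timestamps
    ]


rows = align_on_union(series_1, series_2)
for row in rows:
    print(row)
```

Each NaN in the printed rows marks a point that an interpolation strategy could estimate.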

Interpolation of time series can be divided into three distinct time-ranges: before, internal, and after.

Each time-range handles interpolating missing values in different parts of the time series:

  • before: Interpolates all points before the first point of the series being interpolated. For example, in the external_interpolated_series example below, the interpolation creates a new point for series 2 before its first point, filling it with the nearest value.
  • internal: Interpolates values between existing data points within the time series being interpolated. For example, in the linear_interpolated_series example below, the interpolation estimates the value for points in the time_extent() of both series.
  • after: Interpolates all points after the last data point of the series being interpolated. For example, in the external_interpolated_series example below, the interpolation creates a new point for series 1 after its last point, filling it with the nearest value.
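The three time-ranges can be illustrated with a small pure-Python sketch. It is an illustration of the description above rather than library code, and the function name `time_range` is hypothetical. Timestamps that coincide with an existing point fall in the internal range and need no interpolation.

```python
def time_range(timestamp, series_timestamps):
    """Hypothetical helper: classify a timestamp relative to one series' extent."""
    first, last = min(series_timestamps), max(series_timestamps)
    if timestamp < first:
        return "before"
    if timestamp > last:
        return "after"
    return "internal"


# Timestamps (in ns) of the two series used in the Examples section.
series_1_ts = [1, 101, 200, 201]
series_2_ts = [2, 102, 201, 202]

print(time_range(1, series_2_ts))    # timestamp 1 precedes the first point of series 2
print(time_range(102, series_1_ts))  # timestamp 102 lies between points of series 1
print(time_range(202, series_1_ts))  # timestamp 202 follows the last point of series 1
```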

To apply a different strategy to each series, pass the strategy for any of the above time-ranges as a list. Each list element is the strategy used for the input series at the same position. If a single strategy is passed, it is applied to all input series.
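The single-value-or-list convention can be sketched as below. This is an assumption-laden illustration: the helper `per_series_strategies` is hypothetical, and raising an error on a length mismatch is a choice made for the sketch, not documented behavior of the library.

```python
def per_series_strategies(strategy, series_count):
    """Hypothetical helper: expand a strategy argument to one entry per series."""
    if strategy is None:
        strategy = "NONE"  # the documented default
    if isinstance(strategy, str):
        # A single strategy applies to every input series.
        return [strategy] * series_count
    strategies = list(strategy)
    if len(strategies) != series_count:
        # Assumption for this sketch only; the library may behave differently.
        raise ValueError("expected one strategy per input series")
    return strategies


print(per_series_strategies("LINEAR", 2))
print(per_series_strategies(["LINEAR", "NONE"], 2))
print(per_series_strategies(None, 2))
```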

Interpolation strategies supported for internal interpolation:

| Strategy | Description |
| --- | --- |
| LINEAR | Linearly interpolates points using the best-fit line through the two points immediately before and after the timestamp being interpolated. |
| NEAREST | Use the value of the nearest point by timestamp in one of the input series. |
| PREVIOUS | Use the value of the previous defined point in the input time series. |
| NEXT | Use the value of the next occurring point in the input time series. |
| NONE | Skip interpolation. For timestamps where points don't exist for any of the input series, null values will be used in the output dataframe. |
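The internal strategies can be sketched in pure Python as follows. This is an illustration of the table above, not the library's code: the function `interpolate_internal` is hypothetical, and breaking NEAREST ties toward the earlier point is an assumption. The inputs are series 1 from the Examples section, and the LINEAR results agree with the linear_interpolated_series output shown there.

```python
from bisect import bisect_left

# series-1 from the Examples section: (timestamp in ns, value), sorted by timestamp.
series_1 = [(1, 1.0), (101, 2.0), (200, 4.0), (201, 8.0)]


def interpolate_internal(points, timestamp, strategy):
    """Hypothetical helper: estimate a value inside the extent of one series."""
    timestamps = [ts for ts, _ in points]
    i = bisect_left(timestamps, timestamp)
    if i < len(points) and timestamps[i] == timestamp:
        return points[i][1]  # the point already exists
    if i == 0 or i == len(points):
        raise ValueError("timestamp is outside the series extent")
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    if strategy == "LINEAR":
        # Straight line through the neighbouring points.
        return v0 + (v1 - v0) * (timestamp - t0) / (t1 - t0)
    if strategy == "PREVIOUS":
        return v0
    if strategy == "NEXT":
        return v1
    if strategy == "NEAREST":
        # Assumption: ties go to the earlier point.
        return v0 if timestamp - t0 <= t1 - timestamp else v1
    if strategy == "NONE":
        return None  # left as a null value
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("LINEAR", "NEAREST", "PREVIOUS", "NEXT", "NONE"):
    print(name, interpolate_internal(series_1, 102, name))
```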

Interpolation strategies supported for external interpolation (before, after):

| Strategy | Description |
| --- | --- |
| NEAREST | Take the value of the nearest defined point (this will be either the first or last point). |
| NONE (default) | Never interpolate before the first point and beyond the last point. |
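External interpolation can be sketched the same way. The helper `interpolate_external` below is hypothetical and only mirrors the table above: NEAREST repeats the first or last value, and NONE leaves the point empty. The expected values match the external_interpolated_series example.

```python
# Both series from the Examples section: (timestamp in ns, value), sorted by timestamp.
series_1 = [(1, 1.0), (101, 2.0), (200, 4.0), (201, 8.0)]
series_2 = [(2, 11.0), (102, 12.0), (201, 14.0), (202, 18.0)]


def interpolate_external(points, timestamp, strategy="NONE"):
    """Hypothetical helper: estimate a value outside the extent of one series."""
    first_ts, first_value = points[0]
    last_ts, last_value = points[-1]
    if first_ts <= timestamp <= last_ts:
        raise ValueError("timestamp lies inside the series extent")
    if strategy == "NONE":
        return None  # never extend the series
    if strategy == "NEAREST":
        # Outside the extent, the nearest point is always the first or the last one.
        return first_value if timestamp < first_ts else last_value
    raise ValueError(f"unknown strategy: {strategy}")


print(interpolate_external(series_2, 1, "NEAREST"))    # before the first point of series 2
print(interpolate_external(series_1, 202, "NEAREST"))  # after the last point of series 1
print(interpolate_external(series_1, 202))             # default NONE
```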

An optional frequency can be configured for only interpolating timestamps at the specified frequency. Providing a frequency completely resamples the input series, only creating and interpolating points at the specified frequency. See the interpolated_every_10ns_series and multiple_interpolated_every_10ns_series examples below for an idea of the resampled output.
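As a rough sketch of resampling, the output timestamps in the 10ns examples are the multiples of the frequency that fall within the extent of the input. The helper `resample_grid` below is hypothetical, and aligning the grid to multiples of the frequency counted from the epoch is an assumption inferred only from those example outputs.

```python
def resample_grid(first_ts, last_ts, step):
    """Hypothetical helper: multiples of `step` (in ns) within [first_ts, last_ts]."""
    # Round the first timestamp up to the next multiple of the step.
    start = -(-first_ts // step) * step
    return list(range(start, last_ts + 1, step))


# series-1 in the Examples section spans timestamps 1 to 201 ns.
grid = resample_grid(1, 201, 10)
print(len(grid), grid[0], grid[-1])
```

Each timestamp on the grid would then be filled using the configured strategies, and original points that are off the grid do not appear in the output.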

  • Parameters:
  • before (Union[str, List[str]], optional) – Strategy for interpolating points before the first point in the series. Pass a single strategy or a list with one strategy per series, using a valid strategy from the external interpolation table above (default is NONE).
  • internal (Union[str, List[str]], optional) – Strategy for interpolating points between existing points. Pass a single strategy or a list with one strategy per series, using a valid strategy from the internal interpolation table above (default is NONE).
  • after (Union[str, List[str]], optional) – Strategy for interpolating points after the last point in the series. Pass a single strategy or a list with one strategy per series, using a valid strategy from the external interpolation table above (default is NONE).
  • frequency (Union[str, pandas.Timedelta], optional) – Output frequency for interpolated points, given as the interval between points (for example '5ms' or '1s') and processed as the corresponding frequency in hertz (default is no fixed frequency, in which case only timestamps present in other series are interpolated).
  • rename_columns_by (str | Callable[[N.FunctionNode], str], optional) – Either a metadata key to identify series columns in the result or a callable that returns the name for each series (default is series identifiers).
  • static_column_name (str, optional) – Static name for the value column, which overrides rename_columns_by.
  • Returns: A function that returns the interpolated series using the configured strategies.
  • Return type: (Union[FunctionNode, NodeCollection]) -> Union[FunctionNode, NodeCollection]

Dataframe schema

| Column name | Type | Description |
| --- | --- | --- |
| timestamp | pandas.Timestamp | Timestamp of the point |
| value | Union[float, str] | Value of the point |

:::callout{theme="warning" title="Note"}
Do not use LINEAR interpolation for enum series, or this operation will fail.

The output of this function can be a single-series or a multi-series dataframe, which will only work with other functions expecting the respective input.
:::

:::callout{theme="success" title="See Also"}
scatter()
:::

Examples

>>> import foundryts.functions as F
>>> from foundryts import NodeCollection

>>> series_1 = F.points((1, 1.0), (101, 2.0), (200, 4.0), (201, 8.0), name="series-1")
>>> series_2 = F.points((2, 11.0), (102, 12.0), (201, 14.0), (202, 18.0), name="series-2")
>>> series_1.to_pandas()
                    timestamp  value
0 1970-01-01 00:00:00.000000001    1.0
1 1970-01-01 00:00:00.000000101    2.0
2 1970-01-01 00:00:00.000000200    4.0
3 1970-01-01 00:00:00.000000201    8.0
>>> series_2.to_pandas()
                    timestamp  value
0 1970-01-01 00:00:00.000000002   11.0
1 1970-01-01 00:00:00.000000102   12.0
2 1970-01-01 00:00:00.000000201   14.0
3 1970-01-01 00:00:00.000000202   18.0
>>> nc = NodeCollection([series_1, series_2])
>>> linear_interpolated_series = F.interpolate(internal="LINEAR")(nc)
>>> linear_interpolated_series.to_pandas()
                    timestamp  series-1   series-2
0 1970-01-01 00:00:00.000000001  1.000000        NaN
1 1970-01-01 00:00:00.000000002  1.010000  11.000000
2 1970-01-01 00:00:00.000000101  2.000000  11.990000
3 1970-01-01 00:00:00.000000102  2.020202  12.000000
4 1970-01-01 00:00:00.000000200  4.000000  13.979798
5 1970-01-01 00:00:00.000000201  8.000000  14.000000
6 1970-01-01 00:00:00.000000202       NaN  18.000000
>>> nearest_interpolated_series = F.interpolate(internal="NEAREST")(nc)
>>> nearest_interpolated_series.to_pandas()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001       1.0       NaN
1 1970-01-01 00:00:00.000000002       1.0      11.0
2 1970-01-01 00:00:00.000000101       2.0      12.0
3 1970-01-01 00:00:00.000000102       2.0      12.0
4 1970-01-01 00:00:00.000000200       4.0      14.0
5 1970-01-01 00:00:00.000000201       8.0      14.0
6 1970-01-01 00:00:00.000000202       NaN      18.0
>>> previous_interpolated_series = F.interpolate(internal="PREVIOUS")(nc)
>>> previous_interpolated_series.to_pandas()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001       1.0       NaN
1 1970-01-01 00:00:00.000000002       1.0      11.0
2 1970-01-01 00:00:00.000000101       2.0      11.0
3 1970-01-01 00:00:00.000000102       2.0      12.0
4 1970-01-01 00:00:00.000000200       4.0      12.0
5 1970-01-01 00:00:00.000000201       8.0      14.0
6 1970-01-01 00:00:00.000000202       NaN      18.0
>>> next_interpolated_series = F.interpolate(internal="NEXT")(nc)
>>> next_interpolated_series.to_pandas()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001       1.0       NaN
1 1970-01-01 00:00:00.000000002       2.0      11.0
2 1970-01-01 00:00:00.000000101       2.0      12.0
3 1970-01-01 00:00:00.000000102       4.0      12.0
4 1970-01-01 00:00:00.000000200       4.0      14.0
5 1970-01-01 00:00:00.000000201       8.0      14.0
6 1970-01-01 00:00:00.000000202       NaN      18.0
>>> none_interpolated_series = F.interpolate(internal="NONE")(nc) # skip any missing points
>>> none_interpolated_series.to_pandas()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001       1.0       NaN
1 1970-01-01 00:00:00.000000002       NaN      11.0
2 1970-01-01 00:00:00.000000101       2.0       NaN
3 1970-01-01 00:00:00.000000102       NaN      12.0
4 1970-01-01 00:00:00.000000200       4.0       NaN
5 1970-01-01 00:00:00.000000201       8.0      14.0
6 1970-01-01 00:00:00.000000202       NaN      18.0
>>> external_interpolated_series = F.interpolate(before="NEAREST", after="NEAREST")(nc)
>>> external_interpolated_series.to_dataframe()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001       1.0      11.0
1 1970-01-01 00:00:00.000000002       NaN      11.0
2 1970-01-01 00:00:00.000000101       2.0       NaN
3 1970-01-01 00:00:00.000000102       NaN      12.0
4 1970-01-01 00:00:00.000000200       4.0       NaN
5 1970-01-01 00:00:00.000000201       8.0      14.0
6 1970-01-01 00:00:00.000000202       8.0      18.0
>>> interpolated_series = F.interpolate(internal=["LINEAR", "NONE"])(nc) # different strategies for each series
>>> interpolated_series.to_pandas()
                    timestamp  series-1  series-2
0 1970-01-01 00:00:00.000000001  1.000000       NaN
1 1970-01-01 00:00:00.000000002  1.010000      11.0
2 1970-01-01 00:00:00.000000101  2.000000       NaN
3 1970-01-01 00:00:00.000000102  2.020202      12.0
4 1970-01-01 00:00:00.000000200  4.000000       NaN
5 1970-01-01 00:00:00.000000201  8.000000      14.0
6 1970-01-01 00:00:00.000000202       NaN      18.0
>>> interpolated_every_10ns_series = F.interpolate(internal="NEAREST", frequency="10ns")(series_1)
>>> interpolated_every_10ns_series.to_pandas()
                        timestamp  series-1
0  1970-01-01 00:00:00.000000010       1.0
1  1970-01-01 00:00:00.000000020       1.0
2  1970-01-01 00:00:00.000000030       1.0
3  1970-01-01 00:00:00.000000040       1.0
4  1970-01-01 00:00:00.000000050       1.0
5  1970-01-01 00:00:00.000000060       2.0
6  1970-01-01 00:00:00.000000070       2.0
7  1970-01-01 00:00:00.000000080       2.0
8  1970-01-01 00:00:00.000000090       2.0
9  1970-01-01 00:00:00.000000100       2.0
10 1970-01-01 00:00:00.000000110       2.0
11 1970-01-01 00:00:00.000000120       2.0
12 1970-01-01 00:00:00.000000130       2.0
13 1970-01-01 00:00:00.000000140       2.0
14 1970-01-01 00:00:00.000000150       2.0
15 1970-01-01 00:00:00.000000160       4.0
16 1970-01-01 00:00:00.000000170       4.0
17 1970-01-01 00:00:00.000000180       4.0
18 1970-01-01 00:00:00.000000190       4.0
19 1970-01-01 00:00:00.000000200       4.0
>>> multiple_interpolated_every_10ns_series = F.interpolate(
...     internal="NEAREST", frequency="10ns"
... )(nc)
>>> multiple_interpolated_every_10ns_series.to_pandas()
                       timestamp  series-1  series-2
0  1970-01-01 00:00:00.000000010       1.0      11.0
1  1970-01-01 00:00:00.000000020       1.0      11.0
2  1970-01-01 00:00:00.000000030       1.0      11.0
3  1970-01-01 00:00:00.000000040       1.0      11.0
4  1970-01-01 00:00:00.000000050       1.0      11.0
5  1970-01-01 00:00:00.000000060       2.0      12.0
6  1970-01-01 00:00:00.000000070       2.0      12.0
7  1970-01-01 00:00:00.000000080       2.0      12.0
8  1970-01-01 00:00:00.000000090       2.0      12.0
9  1970-01-01 00:00:00.000000100       2.0      12.0
10 1970-01-01 00:00:00.000000110       2.0      12.0
11 1970-01-01 00:00:00.000000120       2.0      12.0
12 1970-01-01 00:00:00.000000130       2.0      12.0
13 1970-01-01 00:00:00.000000140       2.0      12.0
14 1970-01-01 00:00:00.000000150       2.0      12.0
15 1970-01-01 00:00:00.000000160       4.0      14.0
16 1970-01-01 00:00:00.000000170       4.0      14.0
17 1970-01-01 00:00:00.000000180       4.0      14.0
18 1970-01-01 00:00:00.000000190       4.0      14.0
19 1970-01-01 00:00:00.000000200       4.0      14.0
