foundryts.functions.time_series_search
foundryts.functions.time_series_search(predicate, labels=None, start=None, end=None, interval_values=None, before='nearest', internal='default', after='nearest', min_duration=None, max_duration=None)
Returns a function that searches a time series for intervals in which the provided predicate is true.
For each returned interval, the function computes the associated statistics(). When interval_values
is given, its dsl() formula produces the values over which the final statistics() are computed.
The specified interpolation strategies are used for filling in missing timestamps. See interpolate()
for more details on interpolation and strategies.
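To make the role of the three strategy parameters concrete, here is a minimal pure-Python sketch of how a value could be estimated at a timestamp that has no point. It does not call FoundryTS. The helper name `interpolate_at` and the exact semantics are assumptions for illustration; only the strategy names NEAREST, LINEAR, and PREVIOUS come from this page.

```python
from bisect import bisect_right


def interpolate_at(points, t, internal="linear"):
    """Estimate the value at timestamp t from sorted (timestamp, value) points.

    Hypothetical helper, not part of FoundryTS.
    """
    timestamps = [ts for ts, _ in points]
    if t <= timestamps[0]:
        # "before" handled as nearest: the first point is the closest one.
        return points[0][1]
    if t >= timestamps[-1]:
        # "after" handled as nearest: the last point is the closest one.
        return points[-1][1]
    i = bisect_right(timestamps, t)  # index of the first point strictly after t
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    if t == t0 or internal == "previous":
        return v0  # exact hit, or carry the previous value forward
    if internal == "linear":
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError(f"unsupported strategy: {internal}")


sparse = [(0, 1.0), (4, 5.0), (8, 6.0)]
linear_value = interpolate_at(sparse, 2, internal="linear")
previous_value = interpolate_at(sparse, 2, internal="previous")
```

With these inputs, the linear strategy gives 3.0 at timestamp 2 and the previous strategy gives 1.0. The previous strategy also works for non-numeric values, which is consistent with it being the documented default for enum time series.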
The intervals produced by this function are equivalent to events in Quiver. This is particularly
useful when a time series exhibits interval-based behavior and the analysis requires access to
those intervals. Each time series can then be split into ranges using time_range(), so that each
interval becomes its own time_range() and operations can be applied to each range independently.
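The search itself can be pictured with a small pure-Python sketch that groups consecutive points satisfying a predicate and summarizes each group. This only illustrates the idea and is not how FoundryTS is implemented: `find_intervals` and `summarize` are hypothetical helpers, interpolation is ignored, and the population standard deviation is used because it matches the sample output in the examples on this page.

```python
from statistics import mean, pstdev


def find_intervals(points, predicate):
    """Group consecutive (timestamp, value) points for which predicate(value) holds."""
    intervals, current = [], []
    for ts, value in points:
        if predicate(value):
            current.append((ts, value))
        elif current:
            # The predicate just turned false: close the open interval.
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)
    return intervals


def summarize(interval):
    """Compute a few of the per-interval statistics listed in the schema."""
    values = [v for _, v in interval]
    return {
        "count": len(values),
        "mean": mean(values),
        "standard_deviation": pstdev(values),
        "earliest_point": interval[0],
        "latest_point": interval[-1],
    }


points = [
    (0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0), (6, 4.0),
    (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0), (12, 11.0),
]
even_intervals = find_intervals(points, lambda v: v % 2 == 0)
summaries = [summarize(interval) for interval in even_intervals]
```

On this data the sketch finds three intervals, and the second one has a mean of 4.5 over four points.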
- Parameters:
  - predicate (str) – The predicate to search for intervals using a dsl() conditional program.
  - labels (Union[str, List[str]], optional) – Aliases for each input time series to refer to them in predicate and interval_values (default is ['a', 'b', …, 'aa', 'ab', …]).
  - start (int | datetime | str, optional) – Timestamp (inclusive) to start evaluating intervals in the time series. For an interval overlapping with the start timestamp, the full interval will be included in the output (default is pandas.Timestamp.min).
  - end (int | datetime | str, optional) – Timestamp (exclusive) to end evaluating intervals in the time series. For an interval overlapping with the end timestamp, the full interval will be included in the output (default is pandas.Timestamp.max).
  - interval_values (str, optional) – dsl() program to transform the values that the interval statistics are computed over. This is required for a non-numeric input time series since statistics cannot be computed over non-numeric data (default is the first input time series).
  - before (Union[str, List[str]], optional) – Strategy for interpolating points before the first point in the series, which can be a list per series; use a valid strategy from interpolate() (default is NEAREST).
  - internal (Union[str, List[str]], optional) – Strategy for interpolating points between existing points, which can be a list per series; use a valid value from interpolate() (default is LINEAR for numeric and PREVIOUS for enum time series).
  - after (Union[str, List[str]], optional) – Strategy for interpolating points after the last point in the series, which can be a list per series; use a valid strategy from interpolate() (default is NEAREST).
  - min_duration (int | str | datetime.timedelta, optional) – Minimum duration for which predicate must be true for the time-range to qualify as an interval.
  - max_duration (int | str | datetime.timedelta, optional) – Maximum duration for which predicate must be true for the time-range to qualify as an interval.
- Returns: A function that returns the statistics over intervals satisfying the predicate for input time series.
- Return type: (Union[FunctionNode, NodeCollections]) -> SummaryNode
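The effect of min_duration and max_duration can be sketched in plain Python as a filter over (start, end) pairs with an exclusive end. The helper `filter_by_duration` is hypothetical. Treating both bounds as inclusive is an assumption, since this page does not say how an interval whose duration exactly equals a bound is handled.

```python
def filter_by_duration(intervals, min_duration=None, max_duration=None):
    """Keep (start, end) intervals whose length lies within the given bounds.

    Hypothetical helper; bounds are treated as inclusive (an assumption).
    """
    kept = []
    for start, end in intervals:
        duration = end - start  # end is exclusive, so this is the full length
        if min_duration is not None and duration < min_duration:
            continue
        if max_duration is not None and duration > max_duration:
            continue
        kept.append((start, end))
    return kept


# (start, end) pairs in nanoseconds, taken from the sample output on this page.
intervals = [(1, 3), (5, 9), (10, 12)]
long_only = filter_by_duration(intervals, min_duration=3)
short_only = filter_by_duration(intervals, max_duration=3)
```

Here `long_only` keeps only the 4 ns interval and `short_only` keeps the two 2 ns intervals, which mirrors the min_duration and max_duration examples on this page.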
Dataframe schema
| Column name | Type | Description |
|---|---|---|
| count | int | Number of data points in the interval. |
| earliest_point.timestamp | datetime | Timestamp of the first data point in the interval. |
| earliest_point.value | float | Value of the first data point in the interval. |
| end_timestamp | datetime | Timestamp (exclusive) of the end of the interval. |
| largest_point.timestamp | datetime | Timestamp of the data point with the largest value in the interval. |
| largest_point.value | float | Largest value in the interval. |
| latest_point.timestamp | datetime | Timestamp of the most recent data point in the interval. |
| latest_point.value | float | Value of the most recent data point in the interval. |
| mean | float | Average value of all data points in the interval. |
| smallest_point.timestamp | datetime | Timestamp of the data point with the smallest value in the interval. |
| smallest_point.value | float | Smallest value in the interval. |
| start_timestamp | datetime | Timestamp (inclusive) of the start of the interval. |
| standard_deviation | float | Standard deviation of the data points in the interval. |
| duration.seconds | int | Whole-second part of the interval duration. |
| duration.subsecond_nanos | int | Remaining sub-second part of the interval duration, in nanoseconds. |
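The two duration columns appear to split one duration into whole seconds plus a sub-second remainder. The sketch below shows that split and how to recombine it. It assumes this reading of the column names, and both helpers are hypothetical rather than part of FoundryTS.

```python
NANOS_PER_SECOND = 1_000_000_000


def split_duration(total_nanos):
    """Split a nanosecond duration into whole seconds and leftover nanoseconds."""
    seconds, subsecond_nanos = divmod(total_nanos, NANOS_PER_SECOND)
    return {"duration.seconds": seconds, "duration.subsecond_nanos": subsecond_nanos}


def recombine(row):
    """Recover the full duration in nanoseconds from the two columns."""
    return row["duration.seconds"] * NANOS_PER_SECOND + row["duration.subsecond_nanos"]


short = split_duration(4)              # a 4 ns interval, as in the sample output
long = split_duration(2_500_000_000)   # a 2.5 s interval
```

Under this reading, a 4 ns interval has 0 seconds and 4 sub-second nanoseconds, and a 2.5 s interval has 2 seconds and 500,000,000 sub-second nanoseconds.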
:::callout{theme="success" title="See Also"}
interpolate(), statistics()
:::
Examples
>>> import foundryts.functions as F
>>> discrete_series = F.points(
... (0, 1.0),
... (1, 2.0),
... (2, 2.0),
... (3, 3.0),
... (4, 5.0),
... (5, 6.0),
... (6, 4.0),
... (7, 2.0),
... (8, 6.0),
... (9, 7.0),
... (10, 8.0),
... (11, 10.0),
... (12, 11.0),
... name="discrete",
... )
>>> discrete_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000000 1.0
1 1970-01-01 00:00:00.000000001 2.0
2 1970-01-01 00:00:00.000000002 2.0
3 1970-01-01 00:00:00.000000003 3.0
4 1970-01-01 00:00:00.000000004 5.0
5 1970-01-01 00:00:00.000000005 6.0
6 1970-01-01 00:00:00.000000006 4.0
7 1970-01-01 00:00:00.000000007 2.0
8 1970-01-01 00:00:00.000000008 6.0
9 1970-01-01 00:00:00.000000009 7.0
10 1970-01-01 00:00:00.000000010 8.0
11 1970-01-01 00:00:00.000000011 10.0
12 1970-01-01 00:00:00.000000012 11.0
>>> even_search = F.time_series_search(
... predicate="discrete % 2 == 0",
... interval_values="discrete",
... labels="discrete",
... )(discrete_series)
# 3 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
# Interval 3: [(10, 8.0), (11, 10.0)]
>>> even_search.to_pandas()
count duration.seconds duration.subsecond_nanos earliest_point.timestamp earliest_point.value end_time largest_point.timestamp largest_point.value latest_point.timestamp latest_point.value mean smallest_point.timestamp smallest_point.value standard_deviation start_time
0 2 0 2 1970-01-01 00:00:00.000000001 2.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002 2.0 1970-01-01 00:00:00.000000002 2.0 2.0 1970-01-01 00:00:00.000000002 2.0 0.000000 1970-01-01 00:00:00.000000001
1 4 0 4 1970-01-01 00:00:00.000000005 6.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008 6.0 1970-01-01 00:00:00.000000008 6.0 4.5 1970-01-01 00:00:00.000000007 2.0 1.658312 1970-01-01 00:00:00.000000005
2 2 0 2 1970-01-01 00:00:00.000000010 8.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011 10.0 1970-01-01 00:00:00.000000011 10.0 9.0 1970-01-01 00:00:00.000000010 8.0 1.000000 1970-01-01 00:00:00.000000010
>>> search_formula = F.time_series_search(
... predicate="discrete % 2 == 0",
... interval_values="discrete * 2",
... labels="discrete",
... )(discrete_series)
# 3 Intervals with points (with doubled values):
# Interval 1: [(1, 4.0), (2, 4.0)]
# Interval 2: [(5, 12.0), (6, 8.0), (7, 4.0), (8, 12.0)]
# Interval 3: [(10, 16.0), (11, 20.0)]
>>> search_formula.to_pandas()
count duration.seconds duration.subsecond_nanos earliest_point.timestamp earliest_point.value end_time largest_point.timestamp largest_point.value latest_point.timestamp latest_point.value mean smallest_point.timestamp smallest_point.value standard_deviation start_time
0 2 0 2 1970-01-01 00:00:00.000000001 4.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002 4.0 1970-01-01 00:00:00.000000002 4.0 4.0 1970-01-01 00:00:00.000000002 4.0 0.000000 1970-01-01 00:00:00.000000001
1 4 0 4 1970-01-01 00:00:00.000000005 12.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008 12.0 1970-01-01 00:00:00.000000008 12.0 9.0 1970-01-01 00:00:00.000000007 4.0 3.316625 1970-01-01 00:00:00.000000005
2 2 0 2 1970-01-01 00:00:00.000000010 16.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011 20.0 1970-01-01 00:00:00.000000011 20.0 18.0 1970-01-01 00:00:00.000000010 16.0 2.000000 1970-01-01 00:00:00.000000010
>>> min_duration_search = F.time_series_search(
... predicate="discrete % 2 == 0",
... interval_values="discrete",
... labels="discrete",
... min_duration="3ns",
... )(discrete_series)
# The first and last intervals are filtered out because their duration is < 3 ns
# 1 Interval with points:
# [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
>>> min_duration_search.to_pandas()
count duration.seconds duration.subsecond_nanos earliest_point.timestamp earliest_point.value end_time largest_point.timestamp largest_point.value latest_point.timestamp latest_point.value mean smallest_point.timestamp smallest_point.value standard_deviation start_time
0 4 0 4 1970-01-01 00:00:00.000000005 6.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008 6.0 1970-01-01 00:00:00.000000008 6.0 4.5 1970-01-01 00:00:00.000000007 2.0 1.658312 1970-01-01 00:00:00.000000005
>>> max_duration_search = F.time_series_search(
... predicate="discrete % 2 == 0",
... interval_values="discrete",
... labels="discrete",
... max_duration="3ns",
... )(discrete_series)
# The second interval is filtered out because its duration is > 3 ns
# 2 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(10, 8.0), (11, 10.0)]
>>> max_duration_search.to_pandas()
count duration.seconds duration.subsecond_nanos earliest_point.timestamp earliest_point.value end_time largest_point.timestamp largest_point.value latest_point.timestamp latest_point.value mean smallest_point.timestamp smallest_point.value standard_deviation start_time
0 2 0 2 1970-01-01 00:00:00.000000001 2.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002 2.0 1970-01-01 00:00:00.000000002 2.0 2.0 1970-01-01 00:00:00.000000002 2.0 0.0 1970-01-01 00:00:00.000000001
1 2 0 2 1970-01-01 00:00:00.000000010 8.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011 10.0 1970-01-01 00:00:00.000000011 10.0 9.0 1970-01-01 00:00:00.000000010 8.0 1.0 1970-01-01 00:00:00.000000010
>>> toggle_series = F.points(
... (0, "OFF"),
... (1, "ON"),
... (2, "OFF"),
... (3, "OFF"),
... (4, "ON"),
... (5, "ON"),
... (6, "ON"),
... (7, "OFF"),
... (8, "ON"),
... (9, "ON"),
... (10, "OFF"),
... (11, "OFF"),
... (12, "ON"),
... name="toggle",
... )
>>> toggle_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000000 OFF
1 1970-01-01 00:00:00.000000001 ON
2 1970-01-01 00:00:00.000000002 OFF
3 1970-01-01 00:00:00.000000003 OFF
4 1970-01-01 00:00:00.000000004 ON
5 1970-01-01 00:00:00.000000005 ON
6 1970-01-01 00:00:00.000000006 ON
7 1970-01-01 00:00:00.000000007 OFF
8 1970-01-01 00:00:00.000000008 ON
9 1970-01-01 00:00:00.000000009 ON
10 1970-01-01 00:00:00.000000010 OFF
11 1970-01-01 00:00:00.000000011 OFF
12 1970-01-01 00:00:00.000000012 ON
>>> cross_series_search = F.time_series_search(
... predicate='toggle == "ON"',
... interval_values="discrete",
... labels=["toggle", "discrete"],
... )([toggle_series, discrete_series])
# 4 Intervals in discrete_series created from intervals in toggle_series where predicate is true:
# Interval 1: [(1, 2.0)]
# Interval 2: [(4, 5.0), (5, 6.0), (6, 4.0)]
# Interval 3: [(8, 6.0), (9, 7.0)]
# Interval 4: [(12, 11.0)]
>>> cross_series_search.to_pandas()
count duration.seconds duration.subsecond_nanos earliest_point.timestamp earliest_point.value end_time largest_point.timestamp largest_point.value latest_point.timestamp latest_point.value mean smallest_point.timestamp smallest_point.value standard_deviation start_time
0 1 0 1 1970-01-01 00:00:00.000000001 2.0 1970-01-01 00:00:00.000000002 1970-01-01 00:00:00.000000001 2.0 1970-01-01 00:00:00.000000001 2.0 2.0 1970-01-01 00:00:00.000000001 2.0 0.000000 1970-01-01 00:00:00.000000001
1 3 0 3 1970-01-01 00:00:00.000000004 5.0 1970-01-01 00:00:00.000000007 1970-01-01 00:00:00.000000005 6.0 1970-01-01 00:00:00.000000006 4.0 5.0 1970-01-01 00:00:00.000000006 4.0 0.816497 1970-01-01 00:00:00.000000004
2 2 0 2 1970-01-01 00:00:00.000000008 6.0 1970-01-01 00:00:00.000000010 1970-01-01 00:00:00.000000009 7.0 1970-01-01 00:00:00.000000009 7.0 6.5 1970-01-01 00:00:00.000000008 6.0 0.500000 1970-01-01 00:00:00.000000008
3 1 0 1 1970-01-01 00:00:00.000000012 11.0 1970-01-01 00:00:00.000000013 1970-01-01 00:00:00.000000012 11.0 1970-01-01 00:00:00.000000012 11.0 11.0 1970-01-01 00:00:00.000000012 11.0 0.000000 1970-01-01 00:00:00.000000012
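The cross-series case can also be sketched without FoundryTS: the predicate is evaluated on one series, and the values are collected from another. The helper `cross_series_intervals` is hypothetical and assumes both series share exactly the same timestamps, so no interpolation is needed.

```python
from statistics import mean


def cross_series_intervals(condition_points, value_points, predicate):
    """Collect value_points over runs where predicate holds on condition_points.

    Hypothetical helper; assumes both series have identical timestamps.
    """
    values_by_ts = dict(value_points)
    intervals, current = [], []
    for ts, state in condition_points:
        if predicate(state):
            current.append((ts, values_by_ts[ts]))
        elif current:
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)
    return intervals


toggle = [
    (0, "OFF"), (1, "ON"), (2, "OFF"), (3, "OFF"), (4, "ON"), (5, "ON"), (6, "ON"),
    (7, "OFF"), (8, "ON"), (9, "ON"), (10, "OFF"), (11, "OFF"), (12, "ON"),
]
discrete = [
    (0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0), (6, 4.0),
    (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0), (12, 11.0),
]
on_intervals = cross_series_intervals(toggle, discrete, lambda s: s == "ON")
on_means = [mean(v for _, v in interval) for interval in on_intervals]
```

The sketch yields four intervals with means 2.0, 5.0, 6.5, and 11.0, in line with the mean column of the cross_series_search output.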