foundryts.functions.dsl
foundryts.functions.dsl(program, return_type=None, labels=None, before='nearest', internal='default', after='nearest')
Returns a function that applies a DSL formula to one or more input timeseries to yield a new timeseries.
The formula is evaluated at each timestamp of the input timeseries. Where one of the input timeseries has no value at a given timestamp, the specified interpolation strategy is used to fill it in. See interpolate() for a list of available strategies.
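The alignment rule can be made concrete with a minimal pure-Python model. This is not the foundryts implementation. It assumes the default strategies (NEAREST before the first point and after the last, LINEAR in between) and assumes the output contains every timestamp that appears in any input; `value_at` and `apply_formula` are hypothetical helpers.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, float]  # (timestamp, value)


def value_at(points: Sequence[Point], t: int) -> float:
    """Value of a time-sorted series at t (hypothetical model of the defaults)."""
    times = [p[0] for p in points]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return points[i][1]  # the series has a point at t
    if i == 0:
        return points[0][1]  # before the first point: NEAREST
    if i == len(times):
        return points[-1][1]  # after the last point: NEAREST
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)  # between points: LINEAR


def apply_formula(formula: Callable[..., float], *series: Sequence[Point]) -> List[Point]:
    """Evaluate formula at every timestamp that appears in any input series."""
    all_times = sorted({t for s in series for t, _ in s})
    return [(t, formula(*(value_at(s, t) for s in series))) for t in all_times]


a = [(10, 1.0), (30, 3.0)]
b = [(20, 10.0), (40, 20.0)]
summed = apply_formula(lambda x, y: x + y, a, b)
# [(10, 11.0), (20, 12.0), (30, 18.0), (40, 23.0)]
```

Here `a` is linearly interpolated at 20 and `b` at 30, while `b` at 10 and `a` at 40 take the nearest existing value.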
A DSL formula can operate on unary, binary, or n-ary operands. All operations in the DSL are composable, so you can build complex expressions by combining simpler ones.
For the full syntax, see the Timeseries DSL reference.
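One way to picture operand arity and composability is as ordinary functions nested inside one another. The sketch below is a plain-Python model of a formula in the spirit of `isnan(c) ? -1 : argmin(a, b + c, pow(c, 2))`. That formula string is illustrative and has not been checked against the DSL grammar, and treating `argmin` as accepting more than two operands is an assumption.

```python
import math


def argmin(*xs: float) -> int:
    """n-ary: index of the smallest operand; min() keeps the first index on ties."""
    return min(range(len(xs)), key=lambda i: xs[i])


def formula(a: float, b: float, c: float) -> float:
    """Model of "isnan(c) ? -1 : argmin(a, b + c, pow(c, 2))" at one timestamp."""
    if math.isnan(c):  # unary test guarding the rest of the expression
        return -1.0
    # binary operators (+ and pow) nested inside an n-ary call
    return float(argmin(a, b + c, math.pow(c, 2)))


picked = formula(5.0, 1.0, 2.0)  # operands are 5.0, 3.0, 4.0, so index 1
```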
- Parameters:
  - program (str) – Formula to apply to one or more input timeseries, written in the syntax described in the Timeseries DSL reference.
  - return_type (Type, optional) – (DEPRECATED) Type to use for values of the transformed series.
  - labels (List[str], optional) – Ordered list of labels used to refer to each input timeseries in the formula (default is ['a', 'b', …, 'aa', 'ab', …]).
  - before (Union[str, List[str]], optional) – Strategy for interpolating points before the first point in the series. Pass a list to set one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
  - internal (Union[str, List[str]], optional) – Strategy for interpolating points between existing points. Pass a list to set one strategy per series. Use a valid strategy from interpolate() (default is LINEAR for numeric and PREVIOUS for enum timeseries).
  - after (Union[str, List[str]], optional) – Strategy for interpolating points after the last point in the series. Pass a list to set one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
- Returns: A function that takes one or more timeseries, applies the formula to them, and returns a single new timeseries.
- Return type: (FunctionNode) -> FunctionNode
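The default `labels` value is written above as ['a', 'b', …, 'aa', 'ab', …]. Assuming the elided part continues like spreadsheet column names (so 'z' is followed by 'aa'), the default labels can be generated as follows; `default_labels` is a hypothetical helper, not part of foundryts.

```python
from string import ascii_lowercase
from typing import List


def default_labels(n: int) -> List[str]:
    """First n labels of a, b, ..., z, aa, ab, ... (assumed spreadsheet-style)."""
    labels = []
    for i in range(1, n + 1):
        k, name = i, ""
        while k > 0:
            # bijective base 26: digits run 1..26 instead of 0..25
            k, rem = divmod(k - 1, 26)
            name = ascii_lowercase[rem] + name
        labels.append(name)
    return labels


labels = default_labels(28)  # ends with 'z', 'aa', 'ab'
```

Passing your own labels, for example `F.dsl("temp - pressure", labels=["temp", "pressure"])` applied to two series, should let the formula use descriptive names instead; the label names here are illustrative.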
Dataframe schema
| Column name | Type | Description |
|---|---|---|
| timestamp | pandas.Timestamp | Timestamp of the point |
| value | Union[float, str] | Value of the point |
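The examples below pass plain integers such as 10 and 20 as timestamps, and `to_pandas()` shows them as 1970-01-01 00:00:00.000000010 and so on, which is consistent with nanoseconds since the Unix epoch. This pandas snippet builds a frame with the same two-column schema to illustrate that reading; it does not call foundryts.

```python
import pandas as pd

# Integer timestamps read as nanoseconds since the Unix epoch.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime([10, 20], unit="ns"),
        "value": [6.0, 11.0],
    }
)
```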
:::callout{theme="success" title="See Also"}
interpolate(), udf()
:::
Examples
>>> import foundryts.functions as F
>>> series_1 = F.points(
... (10, 6.0),
... (20, 11.0),
... (30, 24.0),
... (40, float("inf")),
... (50, 45.0),
... (60, 96.0),
... name="series-1",
... )
>>> series_2 = F.points(
... (10, 8.0),
... (20, 12.0),
... (30, 24.0),
... (40, 48.0),
... (50, float("NaN")),
... (60, 196.0),
... name="series-2",
... )
>>> series_1.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 6.0
1 1970-01-01 00:00:00.000000020 11.0
2 1970-01-01 00:00:00.000000030 24.0
3 1970-01-01 00:00:00.000000040 inf
4 1970-01-01 00:00:00.000000050 45.0
5 1970-01-01 00:00:00.000000060 96.0
>>> series_2.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 8.0
1 1970-01-01 00:00:00.000000020 12.0
2 1970-01-01 00:00:00.000000030 24.0
3 1970-01-01 00:00:00.000000040 48.0
4 1970-01-01 00:00:00.000000050 NaN
5 1970-01-01 00:00:00.000000060 196.0
>>> sum_formula = "a+b"
>>> sum_series = F.dsl(sum_formula)([series_1, series_2])
>>> sum_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 14.0
1 1970-01-01 00:00:00.000000020 23.0
2 1970-01-01 00:00:00.000000030 48.0
3 1970-01-01 00:00:00.000000040 inf
4 1970-01-01 00:00:00.000000050 NaN
5 1970-01-01 00:00:00.000000060 292.0
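Because both example series share the same six timestamps, no interpolation is needed for `a+b`, and the result follows ordinary IEEE 754 float arithmetic: `inf + 48.0` is `inf` and `45.0 + NaN` is `NaN`. The output also shows that a NaN value is treated as a present point rather than a missing one, so it is not interpolated away. A plain-Python check of the same arithmetic:

```python
import math

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]
series_2 = [8.0, 12.0, 24.0, 48.0, math.nan, 196.0]

# "a+b" evaluated pointwise on values that share timestamps.
sums = [a + b for a, b in zip(series_1, series_2)]
```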
>>> even_only_formula = "a % 2 == 0 ? a : skip"
>>> even_series_1 = F.dsl(even_only_formula)(series_1)
>>> even_series_1.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 6.0
1 1970-01-01 00:00:00.000000030 24.0
2 1970-01-01 00:00:00.000000060 96.0
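`skip` drops the point instead of emitting a value, which is why the even-only output has three rows. The list comprehension below models that filtering in plain Python; the `math.isfinite` guard is an assumption added so the model matches the documented output, which omits the `inf` point.

```python
import math

series_1 = [(10, 6.0), (20, 11.0), (30, 24.0), (40, math.inf), (50, 45.0), (60, 96.0)]

# Model of "a % 2 == 0 ? a : skip": points failing the condition are dropped.
even_points = [(t, a) for t, a in series_1 if math.isfinite(a) and a % 2 == 0]
```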
>>> argmin_formula = "argmin(a,b)"
>>> argmin_series = F.dsl(argmin_formula)([series_1, series_2])
>>> argmin_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 0.0
1 1970-01-01 00:00:00.000000020 0.0
2 1970-01-01 00:00:00.000000030 0.0
3 1970-01-01 00:00:00.000000040 1.0 # inf is greater than 48
4 1970-01-01 00:00:00.000000050 0.0
5 1970-01-01 00:00:00.000000060 0.0
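The argmin output is consistent with two rules that this plain-Python model assumes: ties go to the lower index (at timestamp 30 both series are 24.0 and the result is 0), and a NaN operand is ignored (at timestamp 50 the result is 0). The `argmin` helper is hypothetical, and returning NaN when every operand is NaN is an untested guess.

```python
import math
from typing import Sequence

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]
series_2 = [8.0, 12.0, 24.0, 48.0, math.nan, 196.0]


def argmin(values: Sequence[float]) -> float:
    """Index of the smallest non-NaN value, lowest index on ties (assumed rules)."""
    candidates = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not candidates:
        return math.nan  # guess: every operand is NaN
    return float(min(candidates, key=lambda i: values[i]))


indices = [argmin([a, b]) for a, b in zip(series_1, series_2)]
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```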
>>> pow_formula = "pow(a, 2)"
>>> pow2_series = F.dsl(pow_formula)([series_1])
>>> pow2_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 36.0
1 1970-01-01 00:00:00.000000020 121.0
2 1970-01-01 00:00:00.000000030 576.0
3 1970-01-01 00:00:00.000000040 inf
4 1970-01-01 00:00:00.000000050 2025.0
5 1970-01-01 00:00:00.000000060 9216.0
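Squaring each value of series-1 in plain Python reproduces the `pow(a, 2)` output, including the infinite point, since `inf` squared is still `inf`:

```python
import math

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]

# pow(a, 2) applied pointwise; math.pow(inf, 2) returns inf rather than raising.
squared = [math.pow(a, 2) for a in series_1]
# [36.0, 121.0, 576.0, inf, 2025.0, 9216.0]
```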
>>> non_nan_formula = "isnan(b) ? a : b"
>>> non_nan_series = F.dsl(non_nan_formula)([series_1, series_2])
>>> non_nan_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 8.0
1 1970-01-01 00:00:00.000000020 12.0
2 1970-01-01 00:00:00.000000030 24.0
3 1970-01-01 00:00:00.000000040 48.0
4 1970-01-01 00:00:00.000000050 45.0 # b is NaN, so the value comes from series-1
5 1970-01-01 00:00:00.000000060 196.0
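The `isnan(b) ? a : b` pattern fills NaN values in one series with values from another. Modeled pointwise in plain Python:

```python
import math

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]
series_2 = [8.0, 12.0, 24.0, 48.0, math.nan, 196.0]

# Model of "isnan(b) ? a : b": use b unless it is NaN, then fall back to a.
filled = [a if math.isnan(b) else b for a, b in zip(series_1, series_2)]
```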
>>> non_inf_formula = "isfinite(a) ? a : b"
>>> non_inf_series = F.dsl(non_inf_formula)([series_1, series_2])
>>> non_inf_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 6.0
1 1970-01-01 00:00:00.000000020 11.0
2 1970-01-01 00:00:00.000000030 24.0
3 1970-01-01 00:00:00.000000040 48.0 # a is inf, so the value comes from series-2
4 1970-01-01 00:00:00.000000050 45.0
5 1970-01-01 00:00:00.000000060 96.0
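`isfinite(a) ? a : b` is a broader guard than `isnan`: assuming the DSL's `isfinite` follows the usual definition, it is false for `inf`, `-inf`, and `NaN` alike. Modeled pointwise in plain Python:

```python
import math

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]
series_2 = [8.0, 12.0, 24.0, 48.0, math.nan, 196.0]

# Model of "isfinite(a) ? a : b": keep a unless it is inf, -inf or NaN.
finite_first = [a if math.isfinite(a) else b for a, b in zip(series_1, series_2)]
```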
>>> assignment_formula = '''
... var doubled_shifted = a * 2; // all values are doubled
... doubled_shifted += 10; // add 10 to all doubled values
... doubled_shifted - b // difference between doubled_shifted and b
... '''
>>> var_assigned_series = F.dsl(assignment_formula)([series_1, series_2])
>>> var_assigned_series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000010 14.0
1 1970-01-01 00:00:00.000000020 20.0
2 1970-01-01 00:00:00.000000030 34.0
3 1970-01-01 00:00:00.000000040 inf
4 1970-01-01 00:00:00.000000050 NaN
5 1970-01-01 00:00:00.000000060 6.0
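The multi-statement program reads like a small script: `var` declares a variable, statements end with `;`, `//` starts a comment, and the final expression (without a trailing semicolon) appears to be the program's result. The function below mirrors it line by line in plain Python; `assignment_formula` is a hypothetical name.

```python
import math

series_1 = [6.0, 11.0, 24.0, math.inf, 45.0, 96.0]
series_2 = [8.0, 12.0, 24.0, 48.0, math.nan, 196.0]


def assignment_formula(a: float, b: float) -> float:
    doubled_shifted = a * 2  # var doubled_shifted = a * 2;
    doubled_shifted += 10  # doubled_shifted += 10;
    return doubled_shifted - b  # final expression is the result


result = [assignment_formula(a, b) for a, b in zip(series_1, series_2)]
```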