
foundryts.functions.scatter

foundryts.functions.scatter(start=None, end=None, before='NONE', internal='LINEAR', after='NONE', regression=None, regression_fit=None)

Returns a function that will generate the list of aligned (x,y) points for exactly two time series.

A scatter plot consists of (x,y) coordinates. For two given time series, an (x,y) coordinate will consist of a point from each series where the timestamps match. For points where the underlying series timestamps do not match, the configured interpolation strategy will be used for the series missing a point at that timestamp.

Read about the interpolation strategies supported for internal, before, and after in interpolate().
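The alignment described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helpers `value_at` and `align` are hypothetical, only the NEAREST, LINEAR, and NONE strategies are modeled, and dropping a timestamp when either series has no value there is an assumption about how NONE behaves.

```python
from bisect import bisect_left


def value_at(points, ts, before="NONE", internal="LINEAR", after="NONE"):
    """Value of a sorted [(timestamp, value), ...] series at ts, or None."""
    times = [t for t, _ in points]
    i = bisect_left(times, ts)
    if i < len(times) and times[i] == ts:
        return points[i][1]  # exact timestamp match, no interpolation needed
    if i == 0:  # ts precedes the first point of the series
        return points[0][1] if before == "NEAREST" else None
    if i == len(times):  # ts follows the last point of the series
        return points[-1][1] if after == "NEAREST" else None
    if internal != "LINEAR":
        return None
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)  # linear interpolation


def align(first, second, **strategies):
    """Pair the two series on the union of their timestamps."""
    timestamps = sorted({t for t, _ in first} | {t for t, _ in second})
    rows = []
    for ts in timestamps:
        x = value_at(first, ts, **strategies)
        y = value_at(second, ts, **strategies)
        if x is not None and y is not None:  # assumption: unmatched points are dropped
            rows.append((ts, x, y))
    return rows


series_1 = [(11, 21.0), (13, 23.0), (15, 25.0), (17, 27.0)]
series_2 = [(11, 21.0), (13, 23.0), (17, 37.0), (37, 47.0)]
aligned = align(series_1, series_2, before="NEAREST", internal="LINEAR", after="NEAREST")
for row in aligned:
    print(row)
```

With the same points as the Examples section, the second series has no point at timestamp 15, so LINEAR fills in 30.0 (halfway between 23.0 and 37.0), and the first series ends at timestamp 17, so NEAREST carries 27.0 forward to timestamp 37.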

Additionally, you can pass a regression function to find the best fit line across the points in the graph.

  • Parameters:
  • start (int | datetime | str, optional) – Timestamp (inclusive) at which to start aligning points (default is pandas.Timestamp.min).
  • end (int | datetime | str, optional) – Timestamp (exclusive) at which to stop aligning points (default is pandas.Timestamp.max).
  • before (str | List[str], optional) – Name of the interpolation strategy used to align the first point; use one of the strategies available in interpolate() (default is NONE).
  • internal (str | List[str], optional) – Name of the interpolation strategy used to align all points within the series; use one of the strategies available in interpolate() (default is LINEAR).
  • after (str | List[str], optional) – Name of the interpolation strategy used to align the last point; use one of the strategies available in interpolate() (default is NONE).
  • regression (linear_regression() | polynomial_regression() | exponential_regression(), optional) – Output of one of the regression functions; it provides the points for the best-fit line between the two input series, along with related metrics (defaults to no regression).
  • Returns: A function that accepts exactly two series as input and returns the aligned points for the scatter plot. Each row in the resulting dataframe represents an aligned point.
  • Return type: (NodeCollection) -> SummarizerNode
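The start and end parameters describe a half-open window: start is inclusive and end is exclusive. A minimal sketch of that convention follows; the helper `in_window` is hypothetical, and treating `None` as unbounded stands in for the pandas.Timestamp.min and pandas.Timestamp.max defaults.

```python
def in_window(ts, start=None, end=None):
    """True when start <= ts < end; a bound of None is treated as unbounded."""
    return (start is None or ts >= start) and (end is None or ts < end)


timestamps = [11, 13, 15, 17, 37]
kept = [ts for ts in timestamps if in_window(ts, start=13, end=17)]
print(kept)  # [13, 15]
```

Timestamp 13 is kept because start is inclusive, and timestamp 17 is left out because end is exclusive.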

Dataframe schema

| Column name | Type | Description |
| --- | --- | --- |
| is_truncated | bool | This field is deprecated and should be ignored. Whether the output was truncated for a large series. |
| points.first_value | float | Value of the point in the first series. |
| points.second_value | float | Value of the point in the second series. |
| points.timestamp | datetime | Timestamp of the points. |
| regression.* | float | Columns from the regression function (if regression is used). |

:::callout{theme="success" title="See Also"} interpolate(), linear_regression(), polynomial_regression(), exponential_regression() :::

:::callout{theme="warning" title="Note"} This function is only applicable to numeric series. :::

Examples

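>>> import foundryts.functions as F  # assumed import alias for F
>>> from foundryts import NodeCollection  # assumed import path
>>> series_1 = F.points((11, 21.0), (13, 23.0), (15, 25.0), (17, 27.0), name="series-1")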
>>> series_2 = F.points((11, 21.0), (13, 23.0), (17, 37.0), (37, 47.0), name="series-2")
>>> series_1.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000011   21.0
1 1970-01-01 00:00:00.000000013   23.0
2 1970-01-01 00:00:00.000000015   25.0
3 1970-01-01 00:00:00.000000017   27.0
>>> series_2.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000011   21.0
1 1970-01-01 00:00:00.000000013   23.0
2 1970-01-01 00:00:00.000000017   37.0
3 1970-01-01 00:00:00.000000037   47.0
>>> nc = NodeCollection([series_1, series_2])
>>> scatter_plot = F.scatter( # scatter plot with interpolation
...     before="NEAREST",
...     internal="LINEAR",
...     after="NEAREST",
... )(nc)
>>> scatter_plot.to_pandas()
   is_truncated  points.first_value  points.second_value              points.timestamp
0         False                21.0                 21.0 1970-01-01 00:00:00.000000011
1         False                23.0                 23.0 1970-01-01 00:00:00.000000013
2         False                25.0                 30.0 1970-01-01 00:00:00.000000015
3         False                27.0                 37.0 1970-01-01 00:00:00.000000017
4         False                27.0                 47.0 1970-01-01 00:00:00.000000037
>>> lin_regression_scatter_plot = F.scatter(
...     before="NEAREST",
...     internal="LINEAR",
...     after="NEAREST",
...     regression=F.linear_regression(),
... )(nc)
>>> lin_regression_scatter_plot.to_pandas()
   is_truncated  points.first_value  points.second_value              points.timestamp  regression.max_bounds.first_value  regression.max_bounds.second_value  regression.min_bounds.first_value  regression.min_bounds.second_value  regression.regression_fit_function.linear_regression_fit.intercept  regression.regression_fit_function.linear_regression_fit.slope  regression.regression_fit_function.linear_regression_fit.statistics.rsquared
0         False                21.0                 21.0 1970-01-01 00:00:00.000000011                               27.0                                47.0                               21.0                                21.0                                         -59.926471                                                            3.720588                                                        0.827161
1         False                23.0                 23.0 1970-01-01 00:00:00.000000013                               27.0                                47.0                               21.0                                21.0                                         -59.926471                                                            3.720588                                                        0.827161
2         False                25.0                 30.0 1970-01-01 00:00:00.000000015                               27.0                                47.0                               21.0                                21.0                                         -59.926471                                                            3.720588                                                        0.827161
3         False                27.0                 37.0 1970-01-01 00:00:00.000000017                               27.0                                47.0                               21.0                                21.0                                         -59.926471                                                            3.720588                                                        0.827161
4         False                27.0                 47.0 1970-01-01 00:00:00.000000037                               27.0                                47.0                               21.0                                21.0                                         -59.926471                                                            3.720588                                                        0.827161
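
The regression columns in the output above are consistent with an ordinary least-squares line fitted to the aligned (first_value, second_value) pairs. The sketch below recomputes them in plain Python as a sanity check. The helper `linear_fit` is hypothetical, and whether the library uses exactly this method internally is an assumption.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rsquared = sxy ** 2 / (sxx * syy)
    return slope, intercept, rsquared


# Aligned values copied from the scatter output above.
first_values = [21.0, 23.0, 25.0, 27.0, 27.0]
second_values = [21.0, 23.0, 30.0, 37.0, 47.0]

slope, intercept, rsquared = linear_fit(first_values, second_values)
min_bounds = (min(first_values), min(second_values))
max_bounds = (max(first_values), max(second_values))

print(round(slope, 6), round(intercept, 6), round(rsquared, 6))
# 3.720588 -59.926471 0.827161
print(min_bounds, max_bounds)
# (21.0, 21.0) (27.0, 47.0)
```

The fitted values are the same on every row of the dataframe because they describe the whole set of aligned points rather than any single point.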
