foundryts.functions.scatter
foundryts.functions.scatter(start=None, end=None, before='NONE', internal='LINEAR', after='NONE', regression=None, regression_fit=None)
Returns a function that will generate the list of aligned (x,y) points for exactly two time series.
A scatter plot consists of (x,y) coordinates. For two given time series, an (x,y) coordinate will consist of a point from each series where the timestamps match. For points where the underlying series timestamps do not match, the configured interpolation strategy will be used for the series missing a point at that timestamp.
See interpolate() for the interpolation strategies supported by internal, before, and after.
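The alignment described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helpers `interpolate_at` and `align` are hypothetical, only the `NONE`, `NEAREST`, and `LINEAR` strategy names are handled, and treating `NONE` as "drop the point" is an assumption of this sketch. The sample points are the ones used in the Examples section.

```python
from bisect import bisect_left


def interpolate_at(series, t, before="NONE", internal="LINEAR", after="NONE"):
    """Value of `series` (a sorted list of (timestamp, value)) at `t`, or None.

    Hypothetical helper: only NEAREST (before/after) and LINEAR (internal)
    are modelled; any other strategy yields None in this sketch.
    """
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]  # timestamps match, no interpolation needed
    if i == 0:  # t lies before the first point of the series
        return series[0][1] if before == "NEAREST" else None
    if i == len(times):  # t lies after the last point of the series
        return series[-1][1] if after == "NEAREST" else None
    if internal != "LINEAR":
        return None
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(first, second, **strategies):
    """Build (timestamp, x, y) rows over the union of both series' timestamps."""
    timestamps = sorted({ts for ts, _ in first} | {ts for ts, _ in second})
    rows = []
    for t in timestamps:
        x = interpolate_at(first, t, **strategies)
        y = interpolate_at(second, t, **strategies)
        if x is not None and y is not None:  # assumption: unresolved points are dropped
            rows.append((t, x, y))
    return rows


series_1 = [(11, 21.0), (13, 23.0), (15, 25.0), (17, 27.0)]
series_2 = [(11, 21.0), (13, 23.0), (17, 37.0), (37, 47.0)]
aligned = align(series_1, series_2, before="NEAREST", internal="LINEAR", after="NEAREST")
for row in aligned:
    print(row)
```

With these inputs the sketch yields the same five aligned points as the documented example output: the second series is linearly interpolated to 30.0 at timestamp 15, and the first series is extended with its nearest value 27.0 at timestamp 37.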
Additionally, you can pass a regression function to find the best fit line across the points in the graph.
- Parameters:
  - start (int | datetime | str, optional) – Timestamp (inclusive) at which to start aligning points (default is pandas.Timestamp.min).
  - end (int | datetime | str, optional) – Timestamp (exclusive) at which to stop aligning points (default is pandas.Timestamp.max).
  - before (str | List[str], optional) – Name of the interpolation strategy to use for aligning the first point; use one of the strategies available from interpolate() (default is NONE).
  - internal (str | List[str], optional) – Name of the interpolation strategy to use for aligning all points within the series; use one of the strategies available from interpolate() (default is LINEAR).
  - after (str | List[str], optional) – Name of the interpolation strategy to use for aligning the last point; use one of the strategies available from interpolate() (default is NONE).
  - regression (linear_regression() | polynomial_regression() | exponential_regression(), optional) – Output of one of the regression functions; this provides the points of the best fit line between the two input series, along with related metrics (defaults to no regression).
- Returns: A function that accepts exactly two series as input and returns the aligned points for the scatter plot. Each row in the resulting dataframe represents an aligned point.
- Return type: (NodeCollection) -> SummarizerNode
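The inclusive `start` and exclusive `end` bounds can be illustrated with a small sketch. The helper `in_window` is hypothetical, and applying the window to rows that are already aligned is a simplification made for this example; the library may apply the bounds at a different stage.

```python
def in_window(timestamp, start=None, end=None):
    """True when start <= timestamp < end; None leaves that side unbounded."""
    if start is not None and timestamp < start:
        return False
    if end is not None and timestamp >= end:
        return False
    return True


# (timestamp, first_value, second_value) rows, as in the example output
aligned = [
    (11, 21.0, 21.0),
    (13, 23.0, 23.0),
    (15, 25.0, 30.0),
    (17, 27.0, 37.0),
    (37, 27.0, 47.0),
]

windowed = [row for row in aligned if in_window(row[0], start=13, end=17)]
print(windowed)
```

Here the row at timestamp 13 is kept because `start` is inclusive, while the row at timestamp 17 is excluded because `end` is exclusive.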
Dataframe schema
| Column name | Type | Description |
|---|---|---|
| is_truncated | bool | This field is deprecated and should be ignored. It indicates whether the output was truncated for a large series. |
| points.first_value | float | Value of the point in the first series. |
| points.second_value | float | Value of the point in the second series. |
| points.timestamp | datetime | Timestamp of the points. |
| regression.* | float | Columns from the regression function (if regression is used). |
:::callout{theme="success" title="See Also"}
interpolate(), linear_regression(), polynomial_regression(), exponential_regression()
:::
:::callout{theme="warning" title="Note"}
This function is only applicable to numeric series.
:::
Examples
>>> import foundryts.functions as F
>>> from foundryts import NodeCollection
>>> series_1 = F.points((11, 21.0), (13, 23.0), (15, 25.0), (17, 27.0), name="series-1")
>>> series_2 = F.points((11, 21.0), (13, 23.0), (17, 37.0), (37, 47.0), name="series-2")
>>> series_1.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000011 21.0
1 1970-01-01 00:00:00.000000013 23.0
2 1970-01-01 00:00:00.000000015 25.0
3 1970-01-01 00:00:00.000000017 27.0
>>> series_2.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000011 21.0
1 1970-01-01 00:00:00.000000013 23.0
2 1970-01-01 00:00:00.000000017 37.0
3 1970-01-01 00:00:00.000000037 47.0
>>> nc = NodeCollection([series_1, series_2])
>>> scatter_plot = F.scatter( # scatter plot with interpolation
... before="NEAREST",
... internal="LINEAR",
... after="NEAREST",
... )(nc)
>>> scatter_plot.to_pandas()
is_truncated points.first_value points.second_value points.timestamp
0 False 21.0 21.0 1970-01-01 00:00:00.000000011
1 False 23.0 23.0 1970-01-01 00:00:00.000000013
2 False 25.0 30.0 1970-01-01 00:00:00.000000015
3 False 27.0 37.0 1970-01-01 00:00:00.000000017
4 False 27.0 47.0 1970-01-01 00:00:00.000000037
>>> lin_regression_scatter_plot = F.scatter(
... before="NEAREST",
... internal="LINEAR",
... after="NEAREST",
... regression=F.linear_regression(),
... )(nc)
>>> lin_regression_scatter_plot.to_pandas()
is_truncated points.first_value points.second_value points.timestamp regression.max_bounds.first_value regression.max_bounds.second_value regression.min_bounds.first_value regression.min_bounds.second_value regression.regression_fit_function.linear_regression_fit.intercept regression.regression_fit_function.linear_regression_fit.slope regression.regression_fit_function.linear_regression_fit.statistics.rsquared
0 False 21.0 21.0 1970-01-01 00:00:00.000000011 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161
1 False 23.0 23.0 1970-01-01 00:00:00.000000013 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161
2 False 25.0 30.0 1970-01-01 00:00:00.000000015 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161
3 False 27.0 37.0 1970-01-01 00:00:00.000000017 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161
4 False 27.0 47.0 1970-01-01 00:00:00.000000037 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161
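The regression columns in this output can be checked by hand. The sketch below assumes an ordinary least-squares fit of `points.second_value` on `points.first_value`; that assumption is not stated by the documentation, but it is consistent with the slope, intercept, and rsquared values shown. Reading the `min_bounds` and `max_bounds` columns as the per-column minimum and maximum is likewise an assumption.

```python
# Aligned (first_value, second_value) pairs copied from the example output.
points = [(21.0, 21.0), (23.0, 23.0), (25.0, 30.0), (27.0, 37.0), (27.0, 47.0)]

xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and cross-deviations.
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)

# Ordinary least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Assumed meaning of the bounds columns: extremes of each value column.
min_bounds = (min(xs), min(ys))
max_bounds = (max(xs), max(ys))

print(round(slope, 6), round(intercept, 6), round(r_squared, 6))
print(min_bounds, max_bounds)
```

Rounded to six decimals, the computed slope, intercept, and R² agree with the `linear_regression_fit` columns above (3.720588, -59.926471, 0.827161), and the bounds agree with (21.0, 21.0) and (27.0, 47.0).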