foundryts.functions.distribution
foundryts.functions.distribution(start=None, end=None, start_value=None, end_value=None, bins=None)
Returns a function that will evaluate the distribution of one or more time-series.
A distribution is a breakdown of points into bins of values that partition the requested range of values. Evaluating the distribution returns a list of bins, each describing the number of points that fall in its range along with the start and end of that range.
The distribution can be applied to a single series or to multiple series. With multiple series, each bin in the final dataframe counts the points from all series combined (the union of their values).
The width (delta) of the value range is the same for every bin and is calculated as (max value - min value) / (number of bins).
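The bin arithmetic described above can be sketched in plain Python. This is a minimal illustration, not the foundryts implementation: `equal_width_bins` is a hypothetical helper, and placing the maximum value in the last bin is an assumption inferred from the example output on this page, where the point with value 16.0 is counted in the top bin.

```python
def equal_width_bins(values, bins=10):
    """Count values into equal-width bins spanning min(values)..max(values).

    Hypothetical helper for illustration; not part of foundryts.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values to form bins")
    delta = (hi - lo) / bins  # constant bin width
    counts = [0] * bins
    for v in values:
        # Assumption: the maximum value lands in the last bin.
        idx = min(int((v - lo) / delta), bins - 1)
        counts[idx] += 1
    return [
        {"start": lo + i * delta, "end": lo + (i + 1) * delta, "count": counts[i]}
        for i in range(bins)
    ]


# Values of the two example series from this page.
series_1_values = [0.0, 10.2, 11.3, 11.1, 11.2, 12.0, 11.7, 16.0, 11.8]
series_2_values = [0.5, 0.2, 1.3, 0.1, 1.2, 1.4, 1.0, 2.0, 1.0]

single = equal_width_bins(series_1_values, bins=3)
# Multiple series: bin the union of all values.
combined = equal_width_bins(series_1_values + series_2_values, bins=3)

print([b["count"] for b in single])    # [1, 1, 7]
print([b["count"] for b in combined])  # [10, 1, 7]
```

The counts agree with the `distribution_values.count` column in the single-series and multiple-series examples on this page.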
- Parameters:
- start (Union[int, datetime, str], optional) – Timestamp (inclusive) at which to start evaluating the distribution over the provided series (default is the earliest timestamp in any of the input time series).
- end (Union[int, datetime, str], optional) – Timestamp (exclusive) at which to stop evaluating the distribution over the provided series (default is the latest timestamp in any of the input time series).
- start_value (float, optional) – Lower bound (inclusive) of the value range to evaluate the distribution over (default is the minimum value of any of the input time series).
- end_value (float, optional) – Upper bound (exclusive) of the value range to evaluate the distribution over (default is the maximum value of any of the input time series).
- bins (int, optional) – Number of value bins to distribute points over (default is 10).
- Returns: A function that accepts one or more series as inputs and generates the distribution over all points in the specified or default number of bins.
- Return type: (Union[FunctionNode, NodeCollection]) -> SummarizerNode
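To make the inclusive and exclusive bounds concrete, here is a small plain-Python sketch of how the time and value windows would select points before binning. `select_points` is a hypothetical helper, timestamps are plain integers, and the half-open behavior simply follows the parameter descriptions above; how the library treats the default (unset) bounds may differ.

```python
def select_points(points, start=None, end=None, start_value=None, end_value=None):
    """Keep (timestamp, value) pairs inside the half-open windows
    [start, end) and [start_value, end_value). A bound of None is not applied.

    Hypothetical helper for illustration; not part of foundryts.
    """
    selected = []
    for ts, value in points:
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        if start_value is not None and value < start_value:
            continue
        if end_value is not None and value >= end_value:
            continue
        selected.append((ts, value))
    return selected


points = [(1, 0.0), (101, 10.2), (200, 11.3), (201, 11.1), (299, 11.2),
          (300, 12.0), (400, 11.7), (500, 16.0), (123450, 11.8)]

in_time = select_points(points, start=101, end=500)
in_value = select_points(points, start_value=10.0, end_value=12.0)

print([ts for ts, _ in in_time])  # [101, 200, 201, 299, 300, 400]
print([v for _, v in in_value])   # [10.2, 11.3, 11.1, 11.2, 11.7, 11.8]
```

The point at timestamp 500 and the point with value 12.0 are dropped because the upper bounds are exclusive, while the point at timestamp 101 is kept because the lower bound is inclusive.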
Dataframe schema
| Column name | Type | Description |
|---|---|---|
| start_timestamp | datetime | Start time of the distribution (inclusive) |
| end_timestamp | datetime | End time of the distribution (exclusive) |
| start | float | Lower bound of values (inclusive) |
| end | float | Upper bound of values (exclusive) |
| delta | float | Width of each bin (the bin's end value minus its start value). Because bins are equal-width, delta is the same for all bins. |
| distribution_values.start | float | Start value of a distribution bin |
| distribution_values.end | float | End value of a distribution bin |
| distribution_values.count | int | Number of points in a distribution bin |
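Each row of the dataframe describes one bin and repeats the range-level columns, so per-bin statistics can be derived from the `distribution_values.*` columns alone. The sketch below uses plain dictionaries in place of dataframe rows (values copied from the single-series example on this page) to compute the share of points per bin; `bin_shares` is a hypothetical helper, not part of foundryts.

```python
# Rows mirror the bin-level columns of the schema above.
rows = [
    {"distribution_values.start": 0.0,
     "distribution_values.end": 5.333333,
     "distribution_values.count": 1},
    {"distribution_values.start": 5.333333,
     "distribution_values.end": 10.666667,
     "distribution_values.count": 1},
    {"distribution_values.start": 10.666667,
     "distribution_values.end": 16.0,
     "distribution_values.count": 7},
]


def bin_shares(rows):
    """Return each bin's fraction of the total point count."""
    total = sum(r["distribution_values.count"] for r in rows)
    if total == 0:
        return [0.0 for _ in rows]
    return [r["distribution_values.count"] / total for r in rows]


shares = bin_shares(rows)
for row, share in zip(rows, shares):
    lo = row["distribution_values.start"]
    hi = row["distribution_values.end"]
    print(f"[{lo:.2f}, {hi:.2f}): {share:.1%}")
```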
:::callout{theme="success" title="See Also"}
statistics(), scatter()
:::
:::callout{theme="warning" title="Note"}
This function is only applicable to numeric series.
:::
Examples
>>> import foundryts.functions as F
>>> from foundryts import NodeCollection  # import path assumed
>>> series_1 = F.points(
...     (1, 0.0),
...     (101, 10.2),
...     (200, 11.3),
...     (201, 11.1),
...     (299, 11.2),
...     (300, 12.0),
...     (400, 11.7),
...     (500, 16.0),
...     (123450, 11.8),
...     name="series-1",
... )
>>> series_2 = F.points(
...     (1, 0.5),
...     (101, 0.2),
...     (200, 1.3),
...     (201, 0.1),
...     (299, 1.2),
...     (300, 1.4),
...     (400, 1.0),
...     (500, 2.0),
...     (123450, 1.0),
...     name="series-2",
... )
>>> series_1.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    0.0
1 1970-01-01 00:00:00.000000101   10.2
2 1970-01-01 00:00:00.000000200   11.3
3 1970-01-01 00:00:00.000000201   11.1
4 1970-01-01 00:00:00.000000299   11.2
5 1970-01-01 00:00:00.000000300   12.0
6 1970-01-01 00:00:00.000000400   11.7
7 1970-01-01 00:00:00.000000500   16.0
8 1970-01-01 00:00:00.000123450   11.8
>>> series_2.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    0.5
1 1970-01-01 00:00:00.000000101    0.2
2 1970-01-01 00:00:00.000000200    1.3
3 1970-01-01 00:00:00.000000201    0.1
4 1970-01-01 00:00:00.000000299    1.2
5 1970-01-01 00:00:00.000000300    1.4
6 1970-01-01 00:00:00.000000400    1.0
7 1970-01-01 00:00:00.000000500    2.0
8 1970-01-01 00:00:00.000123450    1.0
>>> nc = NodeCollection(series_1, series_2)
>>> single_dist = F.distribution(bins=3)(series_1)  # single series distribution
>>> single_dist.to_pandas()
      delta  distribution_values.count  distribution_values.end  distribution_values.start   end  end_timestamp  start                start_timestamp
0  5.333333                          1                 5.333333                   0.000000  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
1  5.333333                          1                10.666667                   5.333333  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
2  5.333333                          7                16.000000                  10.666667  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
>>> multiple_dist = F.distribution(bins=3)(nc)  # multiple series distribution
>>> multiple_dist.to_pandas()
      delta  distribution_values.count  distribution_values.end  distribution_values.start   end  end_timestamp  start                start_timestamp
0  5.333333                         10                 5.333333                   0.000000  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
1  5.333333                          1                10.666667                   5.333333  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
2  5.333333                          7                16.000000                  10.666667  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216