foundryts.nodes.SummarizerNode
class foundryts.nodes.SummarizerNode(children)
Lazy query container for summarizing one or more FunctionNode objects into a final result.
A SummarizerNode is the final evaluated form of a raw or transformed time series and cannot be
transformed any further by FoundryTS. Typically, a SummarizerNode is evaluated to a dataframe
using either SummarizerNode.to_pandas() or SummarizerNode.to_dataframe(), and further
transformations and analysis are performed with the respective dataframe library.
Examples
>>> series = F.points(
... (100, 0.0), (200, float("inf")), (300, 3.14159), (2147483647, 1.0), name="series"
... ) # FunctionNode
>>> series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000100 0.00000
1 1970-01-01 00:00:00.000000200 inf
2 1970-01-01 00:00:00.000000300 3.14159
3 1970-01-01 00:00:02.147483647 1.00000
>>> dist_node = series.distribution() # SummarizerNode
>>> dist_node.to_pandas()
delta distribution_values.count distribution_values.end distribution_values.start end end_timestamp start start_timestamp
0 0.314159 1 0.314159 0.000000 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
1 0.314159 1 1.256636 0.942477 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
2 0.314159 1 3.141590 2.827431 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
columns()
Returns a tuple of strings representing the column names of the pandas.DataFrame
that would be produced by evaluating this node to a pandas dataframe.
:::callout{theme="warning" title="Note"}
Keys of nested objects are flattened into the tuple, with nested keys joined by a dot (.).
:::
- Returns: Tuple containing the column names of the dataframe that the current node evaluates to.
- Return type: Tuple[str]
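The flattening rule in the note above can be illustrated with a few lines of plain Python. This is a minimal sketch of the naming convention only, not the FoundryTS implementation; the helper flattened_column_names and the sample stats dictionary are hypothetical.

```python
from typing import Any, Dict, Tuple


def flattened_column_names(result: Dict[str, Any], prefix: str = "") -> Tuple[str, ...]:
    """Return dot-joined names for every leaf value of a nested dictionary."""
    names = []
    for key, value in result.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into the nested object and join its keys with a dot.
            names.extend(flattened_column_names(value, prefix=f"{name}."))
        else:
            names.append(name)
    return tuple(names)


# Hypothetical nested result, loosely shaped like a statistics summary.
stats = {"count": 4, "mean": 1.5, "smallest_point": {"timestamp": 100, "value": 0.0}}
print(flattened_column_names(stats))
```

The nested smallest_point object contributes one name per leaf, such as smallest_point.timestamp, which matches the style of names in the examples below.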
Examples
>>> node = foundryts.functions.series("series1")
>>> node.columns()
("timestamp", "value")
>>> stats_node = node.statistics(start=0, end=100, window_size=None)
>>> stats_node.columns()
("count", "smallest_point.timestamp", "start_timestamp", "latest_point.timestamp", "mean",
"earliest_point.timestamp", "largest_point.timestamp", "end_timestamp")
property series_ids
All series identifiers used by this node and its child nodes.
to_dataframe(fts=None)
Evaluates this node to a pyspark.sql.DataFrame.
PySpark DataFrames enable distributed data processing and parallelized transformations. They are useful when
the result has a large number of rows, for example when loading all the points in a raw series or the
result of a FunctionNode, or when evaluating the results of multiple SummarizerNode or
FunctionNode objects together.
- Parameters: fts (foundryts.FoundryTS, optional) – FoundryTS session used to execute the query (a new session will be created if not provided).
- Returns: Output of the node evaluated to a PySpark dataframe.
- Return type: pyspark.sql.DataFrame
Examples
>>> series_node = F.points(
... (100, 0.0), (200, float("inf")), (300, 3.14159), (2147483647, 1.0), name="series"
... )
>>> series_node.to_dataframe().show()
+-------------------------------+---------+
| timestamp | value |
+-------------------------------+---------+
| 1970-01-01 00:00:00.000000100 | 0.0 |
| 1970-01-01 00:00:00.000000200 | Infinity|
| 1970-01-01 00:00:00.000000300 | 3.14159 |
| 1970-01-01 00:00:02.147483647 | 1.0 |
+-------------------------------+---------+
to_dict(fts=None)
Evaluates this node to a nested dictionary containing its results.
If the SummarizerNode returns multiple results, this method returns a list of dictionaries, with each list item corresponding to a result.
- Parameters: fts (foundryts.FoundryTS, optional) – FoundryTS session used to execute the query (a new session will be created if not provided).
- Returns: Output of the node evaluated to a dictionary or a list of dictionaries.
- Return type: Union[Dict[str, Any], List[Dict[str, Any]]]
Examples
>>> series = F.points(
... (100, 0.0), (200, float("inf")), (300, 3.14159), (2147483647, 1.0), name="series"
... ) # FunctionNode
>>> series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000100 0.00000
1 1970-01-01 00:00:00.000000200 inf
2 1970-01-01 00:00:00.000000300 3.14159
3 1970-01-01 00:00:02.147483647 1.00000
>>> dist_node = series.distribution() # SummarizerNode
>>> dist_node.to_dict()
{
'start_timestamp': Timestamp('1677-09-21 00:12:43.145225216'),
'end_timestamp': Timestamp('2262-01-01 00:00:00'),
'start': 0.0,
'end': 3.1415900000000003,
'delta': 0.314159,
'distribution_values': [
{
'start': 0.0,
'end': 0.314159,
'count': 1
},
{
'start': 0.942477,
'end': 1.256636,
'count': 1
},
{
'start': 2.8274310000000002,
'end': 3.1415900000000003,
'count': 1
}
]
}
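Because the return type is either a single dictionary or a list of dictionaries, calling code may want to normalize the result before iterating over it. The sketch below assumes a hypothetical helper, as_result_list, and uses a hand-written dictionary shaped like the example output above (timestamps omitted) in place of a real query result.

```python
from typing import Any, Dict, List, Union

SummaryDict = Dict[str, Any]


def as_result_list(result: Union[SummaryDict, List[SummaryDict]]) -> List[SummaryDict]:
    """Wrap a single result in a list so callers can always iterate."""
    if isinstance(result, list):
        return result
    return [result]


# Hand-written stand-in for the dictionary returned in the example above.
distribution = {
    "start": 0.0,
    "end": 3.14159,
    "delta": 0.314159,
    "distribution_values": [
        {"start": 0.0, "end": 0.314159, "count": 1},
        {"start": 0.942477, "end": 1.256636, "count": 1},
        {"start": 2.827431, "end": 3.14159, "count": 1},
    ],
}

for summary in as_result_list(distribution):
    # Sum the per-bucket counts of one distribution result.
    total = sum(bucket["count"] for bucket in summary["distribution_values"])
    print(total)
```

The same loop works unchanged when the summarizer returns several results, since a list is passed through as-is.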
to_object(fts=None)
Evaluates this node to a Python object containing its results.
If the SummarizerNode returns multiple results, this method returns a list of objects, with each list item corresponding to a result.
- Parameters: fts (foundryts.FoundryTS, optional) – FoundryTS session used to execute the query (a new session will be created if not provided).
- Returns: Output of the node evaluated to a Python object or list of objects.
- Return type: Union[Any, List[Any]]
Examples
>>> series = F.points(
... (100, 0.0), (200, float("inf")), (300, 3.14159), (2147483647, 1.0), name="series"
... ) # FunctionNode
>>> series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000100 0.00000
1 1970-01-01 00:00:00.000000200 inf
2 1970-01-01 00:00:00.000000300 3.14159
3 1970-01-01 00:00:02.147483647 1.00000
>>> dist_node = series.distribution() # SummarizerNode
>>> dist_node.to_object()
_CuratedDistribution(start_timestamp=Timestamp(epoch_seconds=-9223372037, offset_nanos=145224193),
end_timestamp=Timestamp(epoch_seconds=9214646400, offset_nanos=0), start=0.0, end=3.1415900000000003,
delta=0.314159, distribution_values=[
DistributionDataPoint(start=0.0, end=0.314159, count=1),
DistributionDataPoint(start=0.942477, end=1.256636, count=1),
DistributionDataPoint(start=2.8274310000000002, end=3.1415900000000003, count=1)
]
)
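The object form is read through attributes rather than dictionary keys. The following sketch uses two hypothetical dataclasses, BucketStandIn and DistributionStandIn, whose field names are copied from the repr shown above; they are stand-ins for illustration and are not the classes that FoundryTS returns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BucketStandIn:
    # Field names mirror DistributionDataPoint in the repr above.
    start: float
    end: float
    count: int


@dataclass(frozen=True)
class DistributionStandIn:
    # Timestamp fields are omitted to keep the sketch short.
    start: float
    end: float
    delta: float
    distribution_values: List[BucketStandIn]


def total_count(distribution: DistributionStandIn) -> int:
    """Sum the point counts over all buckets."""
    return sum(bucket.count for bucket in distribution.distribution_values)


def bucket_ranges(distribution: DistributionStandIn) -> List[Tuple[float, float]]:
    """List the (start, end) range of every non-empty bucket."""
    return [(b.start, b.end) for b in distribution.distribution_values if b.count > 0]


result = DistributionStandIn(
    start=0.0,
    end=3.14159,
    delta=0.314159,
    distribution_values=[
        BucketStandIn(0.0, 0.314159, 1),
        BucketStandIn(0.942477, 1.256636, 1),
        BucketStandIn(2.827431, 3.14159, 1),
    ],
)
print(total_count(result), bucket_ranges(result))
```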
to_pandas(fts=None)
Evaluates this node to a pandas.DataFrame.
The result of a SummarizerNode is flattened into a
pandas.DataFrame with a single row (or one row per element if the summarizer returns multiple
results) and one column per leaf-level value in the resulting object. Nested keys are dot-separated.
- Parameters: fts (foundryts.FoundryTS, optional) – FoundryTS session used to execute the query (a new session will be created if not provided).
- Returns: Output of the node evaluated to a pandas dataframe.
- Return type: pd.DataFrame
Examples
>>> series = F.points(
... (100, 0.0), (200, float("inf")), (300, 3.14159), (2147483647, 1.0), name="series"
... ) # FunctionNode
>>> series.to_pandas()
timestamp value
0 1970-01-01 00:00:00.000000100 0.00000
1 1970-01-01 00:00:00.000000200 inf
2 1970-01-01 00:00:00.000000300 3.14159
3 1970-01-01 00:00:02.147483647 1.00000
>>> dist_node = series.distribution() # SummarizerNode
>>> dist_node.to_pandas()
delta distribution_values.count distribution_values.end distribution_values.start end end_timestamp start start_timestamp
0 0.314159 1 0.314159 0.000000 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
1 0.314159 1 1.256636 0.942477 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
2 0.314159 1 3.141590 2.827431 3.14159 2262-01-01 0.0 1677-09-21 00:12:43.145225216
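The shape of the output above, with one row per distribution bucket and the scalar fields repeated on each row, can be reproduced in plain Python. The sketch below is only an illustration that is consistent with that example; flatten_result and to_rows are hypothetical helpers, the timestamps are omitted, and this is not a description of how FoundryTS or pandas builds the frame internally.

```python
from typing import Any, Dict, List, Union

Nested = Dict[str, Any]
FlatRow = Dict[str, Any]


def flatten_result(result: Nested, prefix: str = "") -> List[FlatRow]:
    """Expand one nested result into flat rows with dot-separated keys."""
    rows: List[FlatRow] = [{}]
    for key, value in result.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            parts = flatten_result(value, prefix=f"{name}.")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # A list of nested objects yields one row per element.
            parts = [row for item in value for row in flatten_result(item, prefix=f"{name}.")]
        else:
            parts = [{name: value}]
        # Combine the rows built so far with the rows for this key.
        rows = [{**row, **part} for row in rows for part in parts]
    return rows


def to_rows(result: Union[Nested, List[Nested]]) -> List[FlatRow]:
    """Accept a single result or a list of results and return all flat rows."""
    results = result if isinstance(result, list) else [result]
    return [row for item in results for row in flatten_result(item)]


distribution = {
    "start": 0.0,
    "end": 3.14159,
    "delta": 0.314159,
    "distribution_values": [
        {"start": 0.0, "end": 0.314159, "count": 1},
        {"start": 0.942477, "end": 1.256636, "count": 1},
        {"start": 2.827431, "end": 3.14159, "count": 1},
    ],
}

rows = to_rows(distribution)
for row in rows:
    print(row)
```

Each of the three rows carries the shared start, end, and delta values alongside one bucket's distribution_values.start, distribution_values.end, and distribution_values.count.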
types()
Returns a tuple of types for the columns of the pandas.DataFrame
that would be produced by evaluating this node to a pandas dataframe.
- Returns: Tuple containing the column types of the dataframe that the current node evaluates to.
- Return type: Tuple[Type]
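Since columns() and types() both describe the same dataframe, the two tuples can be paired into a simple schema mapping. The sketch below assumes the tuples are positionally aligned, which this page implies but does not state; describe_schema is a hypothetical helper, and the input tuples are copied from the points example rather than fetched from a live node.

```python
from typing import Dict, Tuple


def describe_schema(columns: Tuple[str, ...], types: Tuple[type, ...]) -> Dict[str, type]:
    """Pair column names with their types, refusing mismatched lengths."""
    if len(columns) != len(types):
        # Guard against silently dropping entries, which zip() would do.
        raise ValueError(
            f"expected one type per column, got {len(columns)} columns and {len(types)} types"
        )
    return dict(zip(columns, types))


# Values as shown for a points node in the examples on this page.
schema = describe_schema(("timestamp", "value"), (int, float))
print(schema)
```

The length check matters because zip() stops at the shorter input, so a mismatch between the two tuples would otherwise go unnoticed.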
Examples
>>> node = foundryts.functions.points()
>>> node.types()
(<class 'int'>, <class 'float'>)
>>> stats_node = foundryts.functions.series("series1").statistics(start=0, end=100, window_size=None)
>>> stats_node.types()
(<class 'int'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'float'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'float'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'float'>, <class 'float'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'float'>, <class 'pandas._libs.tslibs.timestamps.Timestamp'>)