FAQ
How do I set up a unit and interpolation for a time series property (TSP)?
You can configure units and interpolations for each TSP through a formatter. Find the formatter by navigating to a time series object type in Ontology Manager and locating the Time Series Properties section of the Capabilities tab. Alternatively, you can edit units and interpolations in the Properties tab, similar to any other value formatter.
What are Measures? When should they be used?
Before sensor object types were developed, the platform used Measures as its time series model. Measures are being deprecated; use sensor object types for similar workflows instead. For more information, see the following pages:
- Learn how to migrate to sensor object types from Measures.
- Learn how to store time series in the Ontology.
Can I set up time series using Code Repositories?
Review the Advanced setup documentation for more information.
Is time series data indexed?
Creating a time series sync on your time series dataset automatically creates a time series projection on that dataset. A time series projection is a materialized copy of the dataset that provides optimizations similar to those of a SQL database index.
When the time series sync builds, it generates metadata about the time series dataset transaction(s) from which it derives, informing Foundry's time series database of the data available to index.
When using time series, you read indexed data from the time series database. The time series database acts like a cache; data is only hydrated at read time, and the least recently hydrated series are evicted first once the database disk space is constrained.
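The cache-like behavior described above can be sketched as a toy model. This is not Foundry's implementation: the class name `ToySeriesCache`, the `load_series` callback, and the use of a series count in place of disk space are all assumptions made for illustration. The sketch shows two things from the text: data is hydrated only on read, and eviction follows hydration order rather than read order.

```python
from collections import OrderedDict


class ToySeriesCache:
    """Hypothetical read-through cache keyed by series ID."""

    def __init__(self, capacity, load_series):
        self.capacity = capacity        # stand-in for limited disk space
        self.load_series = load_series  # callable: series_id -> list of points
        self._store = OrderedDict()     # insertion order == hydration order
        self.hydrations = 0

    def read(self, series_id):
        if series_id not in self._store:
            # Cache miss: hydrate at read time from the backing dataset.
            if len(self._store) >= self.capacity:
                # Evict the series that was hydrated longest ago.
                self._store.popitem(last=False)
            self._store[series_id] = self.load_series(series_id)
            self.hydrations += 1
        # A hit does not reorder entries: eviction tracks hydration, not reads.
        return self._store[series_id]


backing = {"pump-1": [1.0, 1.5], "pump-2": [7.2], "pump-3": [0.3, 0.4]}
cache = ToySeriesCache(capacity=2, load_series=lambda sid: backing[sid])

cache.read("pump-1")  # miss: hydrates pump-1
cache.read("pump-2")  # miss: hydrates pump-2
cache.read("pump-1")  # hit: no hydration
cache.read("pump-3")  # miss: evicts pump-1, the least recently hydrated
cache.read("pump-1")  # miss again: pump-1 must be rehydrated
```

In this model, a series that is read often can still be evicted if it was hydrated long ago, which is why a previously fast series may occasionally need a full rehydration.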
What is the time series projection?
A projection is a materialized copy of a dataset that optimizes for certain queries. In the case of time series, the projection optimizes for the queries made when the time series dataset is read to hydrate data to the time series database. This process involves filtering the time series dataset to select the series IDs and time ranges that are being hydrated. In this way, the projection maintains good partitioning and a sort over the time series data, effectively indexing the time series dataset by series ID and timestamp.
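To illustrate why sorting by series ID and timestamp behaves like an index, here is a minimal sketch using plain Python lists. The row layout `(series_id, timestamp, value)`, the integer timestamps, and the function names are assumptions for illustration; the real projection operates on dataset files, not in-memory lists.

```python
from bisect import bisect_left, bisect_right

# Rows as they might arrive in the canonical dataset, in no useful order.
rows = [
    ("sensor-b", 30, 2.5),
    ("sensor-a", 10, 0.1),
    ("sensor-a", 30, 0.3),
    ("sensor-b", 10, 2.1),
    ("sensor-a", 20, 0.2),
    ("sensor-b", 20, 2.3),
]

# "Project" the rows: keep a copy sorted by (series_id, timestamp).
projected = sorted(rows, key=lambda r: (r[0], r[1]))
keys = [(r[0], r[1]) for r in projected]


def read_range(series_id, start, end):
    """Binary-search the sorted copy for start <= timestamp <= end."""
    lo = bisect_left(keys, (series_id, start))
    hi = bisect_right(keys, (series_id, end))
    return projected[lo:hi]


def scan_range(series_id, start, end):
    """Without the sorted copy, every row has to be inspected."""
    return [r for r in rows if r[0] == series_id and start <= r[1] <= end]


print(read_range("sensor-a", 15, 30))
# [('sensor-a', 20, 0.2), ('sensor-a', 30, 0.3)]
```

Both functions return the same rows, but `read_range` locates the slice in logarithmic time, while `scan_range` touches every row. The same lookup also shows why a time filter reduces the amount of data that must be hydrated.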
Why is my time series failing to load?
If an error states that ‘no time series data exists’, your series IDs may not be mapping correctly between your time series dataset/sync and time series object type backing dataset. The set of series IDs in each dataset should intersect and, ideally, be equal sets for the time series properties to correctly reference time series data. Be sure to also check that the series ID property on your time series object type is correctly configured.
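The series ID check described above amounts to comparing two sets. The sketch below assumes you have already extracted the series ID values from the object type backing dataset and from the time series dataset; the function name and the sample IDs are hypothetical.

```python
def compare_series_ids(object_ids, dataset_ids):
    """Compare series IDs referenced by objects with those present in the data."""
    object_ids, dataset_ids = set(object_ids), set(dataset_ids)
    return {
        # IDs that resolve correctly.
        "matched": object_ids & dataset_ids,
        # Objects pointing at a series with no data ("no time series data exists").
        "missing_data": object_ids - dataset_ids,
        # Series in the dataset that no object references.
        "unreferenced": dataset_ids - object_ids,
    }


report = compare_series_ids(
    object_ids=["pump-1", "pump-2", "Pump-3"],   # note the capital P
    dataset_ids=["pump-1", "pump-2", "pump-3"],
)
print(report["missing_data"])   # {'Pump-3'}
print(report["unreferenced"])   # {'pump-3'}
```

A non-empty `missing_data` set alongside a similar-looking entry in `unreferenced` often points to a formatting difference, such as casing or whitespace, between the two datasets.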
Hydrating data to the time series database can fail outright, particularly with large-scale time series. This may occur due to a failure in (or lack of) optimizations:
- If the time series projection on the time series dataset is outdated, unprojected transactions are read from the canonical dataset. This means the optimization of the projection is not applied; data partitions will be spread across many more files, and more rows will need to be scanned to hydrate the desired data. Built-in limits are configured to prevent this undesired access, as it is likely to cause your query to time out and to harm service health. Check that the time series projection’s schedule is running consistently and regularly. You can manually rebuild the projection from its dataset preview page.
- If the time series dataset is not correctly partitioned and sorted, this can, in extreme cases, lead to similar issues where too many rows must be scanned to index the desired data, and you will face built-in service limits. To help prevent this issue, the time series dataset will be correctly formatted for you when transforming data in Pipeline Builder and mapping it to a time series sync output. You can also resolve this issue with an updated time series projection or by manually adding the correct formatting to your pipeline.
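The effect of partitioning and sorting can be sketched with a small model that counts how many files must be opened to read one series. The hash-based layout below is an assumption chosen for illustration, not a description of the formatting that Pipeline Builder applies; all function names are hypothetical.

```python
import zlib
from collections import defaultdict


def write_unformatted(rows, num_files):
    """Spread rows across files in arrival order (no partitioning or sort)."""
    files = defaultdict(list)
    for i, row in enumerate(rows):
        files[i % num_files].append(row)
    return dict(files)


def write_formatted(rows, num_files):
    """Partition by series ID, then sort each file by (series_id, timestamp)."""
    files = defaultdict(list)
    for row in rows:
        # Stable hash so a given series ID always lands in the same file.
        files[zlib.crc32(row[0].encode()) % num_files].append(row)
    for part in files.values():
        part.sort(key=lambda r: (r[0], r[1]))
    return dict(files)


def files_touched(files, series_id):
    """Number of files containing at least one row of the series."""
    return sum(1 for part in files.values() if any(r[0] == series_id for r in part))


# Four series, eight timestamps each, interleaved as they arrive.
rows = [(f"series-{s}", t, float(t)) for t in range(8) for s in range(4)]

unformatted = write_unformatted(rows, 3)
formatted = write_formatted(rows, 3)

print(files_touched(unformatted, "series-0"))  # 3
print(files_touched(formatted, "series-0"))    # 1
```

With the unformatted layout, reading a single series opens every file; with the partitioned layout it opens exactly one, and the rows inside are already in timestamp order.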
Why is my time series taking a long time to load?
The most common reason for time series data loading slowly is that the data was not already indexed in the time series database. Index hydration occurs the first time a given time series (series ID) is queried, or after any subsequent snapshot transaction is synced by the time series sync. A synced snapshot transaction tells the time series database to hydrate a series from the full dataset view to its index. A snapshot hydration can also be triggered when the queried time series data has been evicted from the index; time series are evicted based on disk space requirements, with the least recently hydrated series being evicted first.
:::callout{theme="success"}
To improve hydration speed, add a time filter to your query that will only hydrate data points within the prescribed time range as opposed to the full series. You can create time filters using a filter time series card in Quiver or a time_range function in FoundryTS.
:::
After a time series is initially hydrated, queries should be much faster. If your pipeline is incrementally adding time series data, then the new data will be incrementally hydrated by the time series database and your time series should load quickly after the first snapshot hydration of the data.
:::callout{theme="warning"}
We recommend running incremental pipelines to improve subsequent indexing performance.
:::
If there is a lot of data to hydrate incrementally, a query can still take a long time to load. For example, load times will increase if an incremental transaction is very large, or if there are many incremental transactions that have not been hydrated due to the time series not being regularly queried.
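The relationship between snapshot hydration, incremental hydration, and query cost can be sketched as follows. This is a simplified model under stated assumptions: transactions are represented as `(kind, rows)` tuples, the cost of a query is the number of rows read to bring the index up to date, and the class `ToySeriesIndex` is hypothetical.

```python
class ToySeriesIndex:
    """Hypothetical index for one series that hydrates lazily at query time."""

    def __init__(self):
        self.points = []
        self.applied = 0  # number of transactions already hydrated

    def query(self, transactions):
        """Hydrate pending transactions; return how many rows were read."""
        pending = transactions[self.applied:]
        # A snapshot supersedes everything before it, so start from the last one.
        for i in range(len(pending) - 1, -1, -1):
            if pending[i][0] == "snapshot":
                pending = pending[i:]
                self.points = []
                break
        rows_read = 0
        for _kind, rows in pending:
            self.points.extend(rows)
            rows_read += len(rows)
        self.applied = len(transactions)
        return rows_read


transactions = [("snapshot", list(range(1000)))]
index = ToySeriesIndex()

first = index.query(transactions)    # 1000 rows: full snapshot hydration
transactions.append(("incremental", [1000, 1001]))
second = index.query(transactions)   # 2 rows: only the new transaction
idle = index.query(transactions)     # 0 rows: nothing new to hydrate

# Three incremental transactions pile up while nobody queries the series.
for _ in range(3):
    transactions.append(("incremental", [0] * 100))
backlog = index.query(transactions)  # 300 rows: the whole backlog at once
```

The model shows why the first query is the slowest, why regular incremental updates stay cheap, and why a series that is rarely queried can still be slow when its backlog is finally hydrated.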
In some extreme cases, both snapshot and incremental hydrations will be slow if the time series repartition or sort is not applied to the time series dataset, or if too many partitions are written for the volume of data, as either can require reading many files. This applies only to transactions that have not yet been projected by the time series projection. When transforming data in Pipeline Builder and mapping it to a time series sync output, the time series dataset is correctly formatted for you.
Why is my time series missing data?
For all your time series data to be indexed in the time series database, its time series sync must be up to date. This means that the sync must have been built after the most recent transaction on your time series dataset; otherwise, the data from any transactions newer than the sync cannot be hydrated.
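The freshness condition can be expressed as a simple comparison between the dataset's transaction history and the last transaction covered by the sync. The sketch below is illustrative only: the transaction IDs, the `synced_through` argument, and the function name are assumptions, and it does not call any Foundry API.

```python
def unsynced_transactions(dataset_transactions, synced_through):
    """Return the transactions that the sync has not yet covered.

    dataset_transactions: transaction IDs on the dataset, oldest first.
    synced_through: ID of the newest transaction included in the sync's
        last build, or None if the sync has never been built.
    """
    if synced_through is None:
        return list(dataset_transactions)
    position = dataset_transactions.index(synced_through)
    return dataset_transactions[position + 1:]


history = ["txn-1", "txn-2", "txn-3", "txn-4"]

print(unsynced_transactions(history, "txn-2"))  # ['txn-3', 'txn-4']
print(unsynced_transactions(history, "txn-4"))  # []
```

Any transaction returned by this check holds data that cannot yet appear in query results, so a non-empty result means the sync needs to be rebuilt.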
If your time series dataset is not stored in the Soho format, no unprojected data will be hydrated to the time series database. When transforming data in Pipeline Builder and mapping it to a time series sync output, the materialized time series dataset backing the sync is converted to the Soho format for you. You can also complete one of the following tasks to make more recent data available:
- Convert your time series dataset to the Soho format. This will require a snapshot to convert all the data.
- Schedule the time series projection to build on every update of the time series dataset. This will introduce some latency to queries for the latest data.