跳转至

Optimize indexes and schema design(优化索引与模式设计)

:::callout{theme="neutral"} The documentation on this page focuses on tuning indexes for optimal performance in Postgres and ElasticSearch. We generally recommend building Slate applications on top of Ontology capabilities, using features like Object Sets and Actions for reading and writing data. :::

A performant and maintainable application relies on well factored datasets and proper indexing.

Most aspects of schema design for Postgres are completely agnostic to Slate. These concepts are well-documented elsewhere and generic best practices for database design is outside the scope of this discussion. Google will rapidly take you down that rabbit hole if you're not already familiar with different schema patterns or guidance on choosing which columns to index.

Instead, we'll focus on some best practices that do relate to Slate and Foundry more specifically.

Push work as far upstream as possible

At every stage in the application development process, ask the question: “Where is the right place to do this work?” and always bias towards moving work as far upstream as possible. For instance, if you find yourself writing a complicated JavaScript function to aggregate your data or extract metrics, ask: “Can I do this instead in my query?”

If you're doing the same work in a query on every page load, for instance deriving a yearly total with a SUM() or creating a list with a DISTINCT(), ask yourself: “Can I do this in a derived dataset?”

In Foundry, distributed storage and compute are “cheaper” compared to work done in a query or in the users browser, so pre-compute as much as possible.

Postgate (Postgres)

Postgate wraps Postgres or RDS for straightforward read-only access to Foundry datasets through PostgreSQL queries. Remember that the limitations of your application queries are determined by the characteristics of relational databases (rather than Spark/HDFS in the rest of the platform) in general, and Postgres specifically.

As you develop queries to retrieve data, use EXPLAIN ANALYZE to profile your query and find bottlenecks. You can learn more about how Postgres interprets your query and how to interpret the results of an EXPLAIN ANALYZE request in this Thoughtbot blog post ↗. You can also read our more detailed explanation on optimizing Postgres queries for Slate.

You can build modular queries to prevent code re-use and better encapsulate shared logic using Partials. We recommend using partials and functions to trade off between keeping your queries streamlined and verbose enough to be readable.

Phonograph (ElasticSearch)

Phonograph provides a read and write datastore that uses ElasticSearch-style syntax for queries and aggregation, while also allowing CRUD operations on top of Foundry datasets. For much more on Phonograph practices, see the section Writeback Data from Slate to Foundry.


中文翻译


优化索引与模式设计

:::callout{theme="neutral"} 本文档主要介绍如何在 Postgres 和 ElasticSearch 中调整索引以实现最佳性能。我们通常建议基于 Ontology 功能构建 Slate 应用,使用 对象集操作 等特性来读写数据。 :::

一个高性能且易于维护的应用依赖于结构良好的数据集和合理的索引。

Postgres 的模式设计(schema design)大多与 Slate 无关。这些概念在其他地方已有详尽文档,而数据库设计的通用最佳实践不在本文讨论范围内。如果你不熟悉不同的模式设计模式或如何选择索引列,谷歌能迅速带你深入探索。

相反,我们将重点介绍一些与 Slate 和 Foundry 更相关的实践。

尽可能将工作推向上游

在应用开发的每个阶段,都要问自己:"这项工作应该在哪个环节完成?" 并始终倾向于将工作尽可能推向上游。例如,如果你发现自己正在编写复杂的 JavaScript 函数来聚合数据或提取指标,请思考:"我能否在查询中完成这项工作?"

如果你在每次页面加载时都在查询中执行相同的工作(例如使用 SUM() 计算年度总计,或使用 DISTINCT() 创建列表),请自问:"我能否在派生数据集(derived dataset)中完成这项工作?"

在 Foundry 中,分布式存储和计算比在查询或用户浏览器中执行工作更"廉价",因此请尽可能进行预计算。

Postgate (Postgres)

Postgate 封装了 Postgres 或 RDS,通过 PostgreSQL 查询提供对 Foundry 数据集的直接只读访问。请记住,应用查询的限制取决于关系型数据库(而非平台其他部分的 Spark/HDFS)的特性,尤其是 Postgres。

在开发数据检索查询时,请使用 EXPLAIN ANALYZE 分析查询并定位瓶颈。你可以通过这篇 Thoughtbot 博客文章 ↗ 了解 Postgres 如何解析查询以及如何解读 EXPLAIN ANALYZE 的结果。你也可以阅读我们关于 优化 Slate 的 Postgres 查询 的更详细说明。

你可以使用 部分查询 构建模块化查询,避免代码复用并更好地封装共享逻辑。我们建议使用部分查询和函数,在保持查询简洁与足够可读之间取得平衡。

Phonograph (ElasticSearch)

Phonograph 提供支持读写的数据存储,使用 ElasticSearch 风格的语法进行查询和聚合,同时允许在 Foundry 数据集上执行 CRUD 操作。有关 Phonograph 实践的更多内容,请参阅 从 Slate 向 Foundry 写回数据 章节。