Optimize queries in Postgres（优化 Postgres 中的查询）¶

:::callout{theme="neutral"} This page focuses on tuning queries for optimal performance in Postgres. We generally recommend building Slate applications on top of Ontology capabilities, using features like object sets and Actions for reading and writing data. :::

Designing a responsive application requires careful planning and consideration. The data model, query structure, and dependency graph each play an important role in application performance and usability. This guide is focused on performance tuning aspects of PostgreSQL queries that can be written in Slate and includes performance heuristics, methods for identifying tuning opportunities and suggestions for improving performance.

Query performance tuning is an iterative process. There are many different ways to write the same request in order to produce the "correct" answer.

Performance Heuristics¶

The target execution time for PostgreSQL queries is <= 500ms. This should be possible for the majority of use cases where the dataset in question is less than 10m records, though it requires careful planning and consideration when designing the data model.

When evaluating query performance, particularly with a new dataset, ensure that queries are run multiple times before collecting and analyzing statistics. This will help ensure the dataset is properly cached and the performance is better aligned with the expected usage. If you're interested in learning more about the PostgreSQL cache, see the following links:

EXPLAIN... explained¶

EXPLAIN is particularly useful command that Postgres provides to return the query execution plan. A query plan is created for every request that Postgres receives, which uses the query structure and properties of the data to determine the fastest way to service the request. We'll start with a quick review of the EXPLAIN command, as this will be referenced throughout the guide.

EXPLAIN¶

NODES¶

You can think of a node as a logical unit of work, or a step in the query evaluation. The nodes are returned as an inverted graph, which means the first line of the response is the last unit of work that is performed. Each node is preceded by ->.

-> Index Scan using event_type_idx on event (...)

Why this matters: The nodes will be used to identify the inefficient operations in a query execution plan and help prioritize performance tuning efforts.

COST¶

The first number is the start-up cost (time to retrieve the first record). The second number is the cost to process the entire node from start up to completion.

(**cost=86.83..4577.07** rows=2368 width=10)

Cost is an estimate that the Postgres query planner generates based on object (generally table) statistics. While this number does not represent the actual runtime, it should be directly correlated to the actual execution.

Cost is a combination of several work components: sequential fetch, non-sequential (random) fetch, processing of row, processing operator (function), and processing index entry. The cost represents I/O and CPU activity; the larger the number, the more work Postgres thinks it will need to do in order to complete the task. It's important to note that the Postgres query optimizer determines which execution plan to use based on the cost.

ROWS¶

Estimated number of rows that will be output by this plan node.

(cost=86.83..4577.07 **rows=2368** width=10)

Why this matters: ROWS can be used to identify nodes that are returning large volumes of data and/or not behaving as expected.

WIDTH¶

Estimated average size (in bytes) of rows output by the node.

(cost=86.83..4577.07 rows=2368 **width=10**)

Why this matters: WIDTH can be used to identify nodes that output rows with very large properties, or a large number of columns.

EXPLAIN ANALYZE¶

NODES¶

See above.

ACTUAL TIME¶

Similar to Cost, the first number is the actual time in milliseconds (ms) needed for start-up. The second number is the actual time to process the entire node from start up to completion.

(**actual time=10.313..12.530** rows=4857 loops=1)

As the name implies, actual time is captured by executing the statement. The keyword ANALYZE tells Postgres to execute the query along with displaying the execution plan. If you're having trouble with a query timing out, removing ANALYZE will return just the query plan, which should be significantly faster than executing the query.

Why this matters: This is the clearest indicator of which node(s) or operations are causing the performance issues.

ROWS¶

Estimated number of rows that output by the node.

(actual time=10.313..12.530 **rows=4857** loops=1)

Why this matters: ROWS can help provide context as to why a particular operation might be taking longer than expected.

LOOPS¶

Reports the total number of executions of the node. The actual time and row values shown are averages per execution. Multiply the LOOPS value by the actual time to get the total time spent in the node.

(actual time=10.313..12.530 rows=150 **loops=10**)

Understanding operations (plan nodes)¶

Scans¶

sequential scans (seq scan): The Seq Scan operation scans the entire relation (table) as stored on disk (like TABLE ACCESS FULL). It is always possible to perform a seq scans on a relation; regardless of the relation schema, size, constraints, and existence of index(es).
The following are characteristics of a seq scan:
- Fast to start up (sequential I/O is much faster than random access).
- Each block is read only once.
- Produces unordered output.
index scans: The Index Scan performs a B-tree traversal, walks through the leaf nodes to find all matching entries, and fetches the corresponding table data. It is like an INDEX RANGE SCAN followed by a TABLE ACCESS BY INDEX ROWID operation
The following are characteristics of an Index Scan:
- Random access is much slower than sequential I/O.
- Requires additional I/O to access index.
- Potentially reads the same block multiple times.
- Produces ordered output.
bitmap index/heap scans: A plain Index Scan fetches one tuple-pointer at a time from the index, and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order.
The following are characteristics of a bitmap index/heap scan:
- Sequential I/O with index selectivity.
- Slow to start up, as all index tuples are read and sorted.
- Often selected for IN and =ANY(array) operators, as well as low selectivity index scans.
- Can combine multiple indexes.
- Produces unordered output.
index only: The Index Only Scan performs a B-tree traversal and walks through the leaf nodes to find all matching entries. There is no table access needed because the index has all columns to satisfy the query (with the exception of MVCC visibility information).

Joins¶

Join operations typically process only two tables at a time. When a query involves joining more than two tables, the joins are executed sequentially: first with two tables, then the intermediate result with the next table, and so on. In the context of joins, the term “table” could therefore also mean “intermediate result”.

nested loop: Joins two tables by fetching the result from one table and querying the other table for each row from the first.
Generally the least performant form of join.
Fast to produce first record.
Negative performance possible if the second child is slow.
Only join capable of executing CROSS JOIN.
Only join capable of inequality join conditions.
merge join: The (sort) merge join combines two sorted lists like a zipper. Both sides of the join must be presorted.
Can only be used for equality join conditions.
Generally the most performant for large data sets.
Requires ordered inputs - which can require slow sorts or index scans.
Slow to start up, as all index tuples are read and sorted.
hash joins: The hash join loads the candidate records from one side of the join into a hash table (marked with Hash in the plan) which is then probed for each record from the other side of the join.
Can only be used for equality join conditions.
Generally the most performant for joining a large table against a small table.
Only for hashable data types.
Slow start due to hashing the smaller table.
Performance is negatively impacted if table stats out of date and incorrect.

Aggregates¶

GroupAggregate: Aggregates a presorted set according to the group by clause. This operation does not buffer large amounts of data.
HashAggregate: Uses a temporary hash table to group records. The HashAggregate operation does not require a presorted data set; instead, it uses large amounts of memory to materialize the intermediate result (not pipelined). The output is not ordered in any meaningful way.
Unique
WindowAgg: Indicates the use of window functions.

Miscellaneous¶

Sort
Occurs with ORDER BY, DISTINCT, GROUP BY, UNION, and merge joins.
Considerable startup time.
If sort fits in work_mem, then quicksort can be used.
If sort doesn't fit into memory, it will spill to disk and use temporary files, which can be very expensive.
Limit
Handles both LIMIT and OFFSET.
Can be used for min() and max() if theres no WHERE clause.
Records skipped for OFFSET are still generated/materialized but are discarded before results are returned.
The cost of child scan is still the full cost.
Sort combined with limit can use an optimized form of sort.

Query tuning best practices¶

SELECT¶

Include only the attributes that are needed for display, which will help limit the pages fetched to support the request.

DISTINCT¶

DISTINCT comes from an older part of the PostgreSQL code base and uses a less efficient method for identifying distinct records.
If possible, avoid the use of DISTINCT by using GROUP BY or a subquery.

JOIN¶

When joining tables, try to use a simple equality statement in the ON clause (such as a.id = b.person_id). Doing so allows more efficient join techniques to be used (such as Hash Join rather than Nested Loop Join).
Convert subqueries to JOIN statements when possible as this usually allows the optimizer to understand the intent and possibly chose a better plan.
Use JOINs properly: Are you using GROUP BY or DISTINCT just because you are getting duplicate results? This usually indicates improper JOIN usage and may result in a higher costs.
If the execution plan is using a Hash Join it can be very slow if table size estimates are wrong. Therefore, make sure your table statistics are accurate by reviewing your vacuuming strategy.
Avoid correlated subqueries where possible; they can significantly increase query cost.
Use EXISTS when checking for existence of rows based on criterion because it “short-circuits” (stops processing when it finds at least one match).

WHERE clause¶

Avoid LIKE if possible.
Avoid passing large lists into an IN() statements - instead, consider using a JOIN condition or adjusting logic to be exclusion-based.
Avoid function calls in the WHERE clause.

GROUP BY & GROUPING SET¶

See PostgreSQL GROUPING SETS documentation ↗.

UNION vs UNION ALL¶

UNION will eliminate duplicate records; this requires sorting the tables in question.

We strongly suggest avoiding the use of UNION given the cost associated with the request. There should be very few instances where this is required and, if so, it should be done in the Transforms layer before querying from Slate.

UNION ALL will not eliminate duplicates and will efficiently append the rows of one table to another.

See Common Mistakes: UNION VS. UNION ALL ↗.

Indexes¶

Considering adding indexes under the following conditions:

Eliminate sequential scans (seq scan), unless it's a small table and/or the query is fetching more than 5% of the rows.
If using a multi-column index, pay attention to the order in which you define the included columns.
Use indexes that are highly selective on frequently-used columns.

Identify missing indexes¶

SELECT
  relname,
  seq_scan - idx_scan AS too_much_seq,
  CASE
    WHEN seq_scan - coalesce(idx_scan, 0) > 0 THEN 'missing index?'
    ELSE 'OK'
  END,
  pg_relation_size(relname::regclass) AS rel_size, seq_scan, idx_scan
FROM pg_stat_all_tablesWHERE schemaname = 'public'AND pg_relation_size(relname::regclass) > 80000
ORDER BY too_much_seq DESC;

Identify unused indexes¶

SELECT
  indexrelid::regclass as index,
  relid::regclass as table,
  'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0AND indisunique is false;

Index-only scans (covering index)¶

Read more about the proper use and benefits of index-only scans ↗.

中文翻译¶

优化 Postgres 中的查询¶

:::callout{theme="neutral"} 本页重点介绍如何在 Postgres 中调整查询以获得最佳性能。我们通常建议在 Ontology 功能之上构建 Slate 应用程序，使用对象集和操作等功能来读取和写入数据。 :::

设计一个响应式应用程序需要仔细的规划和考量。数据模型、查询结构和依赖关系图在应用程序性能和可用性中都扮演着重要角色。本指南专注于可在 Slate 中编写的 PostgreSQL 查询的性能调优方面，包括性能启发式方法、识别调优机会的方法以及改进性能的建议。

查询性能调优是一个迭代过程。有许多不同的方式可以编写同一个请求以产生"正确"的答案。

性能启发式方法¶

PostgreSQL 查询的目标执行时间 <= 500ms。对于大多数数据集少于 1000 万条记录的使用场景，这应该是可以实现的，尽管在设计数据模型时需要仔细规划和考量。

在评估查询性能时，特别是对于新数据集，请确保在收集和分析统计数据之前多次运行查询。这将有助于确保数据集被正确缓存，并使性能更符合预期使用情况。如果您想了解更多关于 PostgreSQL 缓存的信息，请参阅以下链接：

EXPLAIN... 解释¶

EXPLAIN 是 Postgres 提供的一个特别有用的命令，用于返回查询执行计划。Postgres 接收到的每个请求都会创建一个查询计划，该计划使用查询结构和数据属性来确定服务该请求的最快方式。我们将从快速回顾 EXPLAIN 命令开始，因为本指南中会多次引用它。

EXPLAIN¶

节点 (NODES)¶

您可以将节点视为一个逻辑工作单元，或查询评估中的一个步骤。节点以倒置图的形式返回，这意味着响应的第一行是执行的最后一个工作单元。每个节点前面都有 ->。

-> Index Scan using event_type_idx on event (...)

为什么这很重要：节点将用于识别查询执行计划中的低效操作，并帮助确定性能调优工作的优先级。

成本 (COST)¶

第一个数字是启动成本（检索第一条记录的时间）。第二个数字是从启动到完成处理整个节点的成本。

(**cost=86.83..4577.07** rows=2368 width=10)

成本是 Postgres 查询规划器基于对象（通常是表）统计信息生成的估计值。虽然这个数字不代表实际运行时间，但它应该与实际执行直接相关。

成本是几个工作组件的组合：顺序读取、非顺序（随机）读取、行处理、操作符（函数）处理和索引条目处理。成本代表 I/O 和 CPU 活动；数字越大，Postgres 认为完成任务所需的工作就越多。需要注意的是，Postgres 查询优化器会根据成本来决定使用哪个执行计划。

行数 (ROWS)¶

此计划节点将输出的估计行数。

(cost=86.83..4577.07 **rows=2368** width=10)

为什么这很重要：ROWS 可用于识别返回大量数据和/或行为不符合预期的节点。

宽度 (WIDTH)¶

节点输出的行的估计平均大小（以字节为单位）。

(cost=86.83..4577.07 rows=2368 **width=10**)

为什么这很重要：WIDTH 可用于识别输出具有非常大属性或大量列的行的节点。

EXPLAIN ANALYZE¶

节点 (NODES)¶

见上文。

实际时间 (ACTUAL TIME)¶

类似于成本，第一个数字是启动所需的实际时间（以毫秒为单位）。第二个数字是从启动到完成处理整个节点的实际时间。

(**actual time=10.313..12.530** rows=4857 loops=1)

顾名思义，实际时间是通过执行语句来捕获的。关键字 ANALYZE 告诉 Postgres 在执行查询的同时显示执行计划。如果您遇到查询超时的问题，移除 ANALYZE 将只返回查询计划，这应该比执行查询快得多。

为什么这很重要：这是最清晰的指标，表明哪个节点或操作导致了性能问题。

行数 (ROWS)¶

节点输出的估计行数。

(actual time=10.313..12.530 **rows=4857** loops=1)

为什么这很重要：ROWS 可以帮助提供上下文，说明为什么某个特定操作可能比预期花费更长时间。

循环次数 (LOOPS)¶

报告节点的总执行次数。显示的实际时间和行值是每次执行的平均值。将 LOOPS 值乘以实际时间即可获得在该节点上花费的总时间。

(actual time=10.313..12.530 rows=150 **loops=10**)

理解操作（计划节点）¶

扫描 (Scans)¶

顺序扫描 (sequential scans) (seq scan)：Seq Scan 操作扫描存储在磁盘上的整个关系（表）（类似于 TABLE ACCESS FULL）。对关系执行顺序扫描总是可能的；无论关系的模式、大小、约束和索引是否存在。
顺序扫描的特点如下：
- 启动快（顺序 I/O 比随机访问快得多）。
- 每个块只读取一次。
- 产生无序输出。
索引扫描 (index scans)：Index Scan 执行 B 树遍历，遍历叶节点以查找所有匹配条目，并获取相应的表数据。它类似于 INDEX RANGE SCAN 后跟 TABLE ACCESS BY INDEX ROWID 操作。
索引扫描的特点如下：
- 随机访问比顺序 I/O 慢得多。
- 需要额外的 I/O 来访问索引。
- 可能多次读取同一个块。
- 产生有序输出。
位图索引/堆扫描 (bitmap index/heap scans)：普通的 Index Scan 一次从索引中获取一个元组指针，并立即访问表中的该元组。位图扫描一次性从索引中获取所有元组指针，使用内存中的"位图"数据结构对其进行排序，然后按物理元组位置顺序访问表元组。
位图索引/堆扫描的特点如下：
- 具有索引选择性的顺序 I/O。
- 启动慢，因为所有索引元组都被读取和排序。
- 通常为 IN 和 =ANY(array) 运算符以及低选择性索引扫描选择。
- 可以组合多个索引。
- 产生无序输出。
仅索引扫描 (index only)：Index Only Scan 执行 B 树遍历并遍历叶节点以查找所有匹配条目。不需要访问表，因为索引拥有满足查询所需的所有列（MVCC 可见性信息除外）。

连接 (Joins)¶

连接操作通常一次只处理两个表。当查询涉及连接两个以上的表时，连接会顺序执行：首先连接两个表，然后将中间结果与下一个表连接，依此类推。在连接的上下文中，"表"一词也可能指"中间结果"。

嵌套循环连接 (nested loop)：通过从一个表获取结果并为第一个表中的每一行查询另一个表来连接两个表。
通常是最低效的连接形式。
快速产生第一条记录。
如果第二个子节点慢，则可能出现负面性能。
唯一能够执行 CROSS JOIN 的连接。
唯一能够执行不等值连接条件的连接。
合并连接 (merge join)：（排序）合并连接像拉链一样组合两个已排序的列表。连接的两侧都必须预先排序。
只能用于等值连接条件。
通常对于大数据集性能最高。
需要有序输入——这可能需要慢速排序或索引扫描。
启动慢，因为所有索引元组都被读取和排序。
哈希连接 (hash joins)：哈希连接将连接一侧的候选记录加载到哈希表（在计划中标记为 Hash）中，然后为连接另一侧的每条记录探测该哈希表。
只能用于等值连接条件。
通常对于将大表与小表连接性能最高。
仅适用于可哈希的数据类型。
由于对较小的表进行哈希处理，启动较慢。
如果表统计信息过时且不正确，性能会受到负面影响。

聚合 (Aggregates)¶

GroupAggregate：根据 group by 子句聚合预排序的集合。此操作不会缓冲大量数据。
HashAggregate：使用临时哈希表对记录进行分组。HashAggregate 操作不需要预排序的数据集；相反，它使用大量内存来物化中间结果（非流水线）。输出没有任何有意义的顺序。
Unique
WindowAgg：表示使用了窗口函数。

其他 (Miscellaneous)¶

Sort
出现在 ORDER BY、DISTINCT、GROUP BY、UNION 和合并连接中。
启动时间相当长。
如果排序适合 work_mem，则可以使用快速排序。
如果排序不适合内存，则会溢出到磁盘并使用临时文件，这可能非常昂贵。
Limit
处理 LIMIT 和 OFFSET。
如果没有 WHERE 子句，可用于 min() 和 max()。
为 OFFSET 跳过的记录仍然会生成/物化，但在返回结果之前会被丢弃。
子扫描的成本仍然是全部成本。
与 limit 结合的 Sort 可以使用优化的排序形式。

查询调优最佳实践¶

SELECT¶

仅包含显示所需的属性，这将有助于限制为支持请求而获取的页面。

DISTINCT¶

DISTINCT 来自 PostgreSQL 代码库中较旧的部分，使用效率较低的方法来识别不同的记录。
如果可能，通过使用 GROUP BY 或子查询来避免使用 DISTINCT。

JOIN¶

连接表时，尽量在 ON 子句中使用简单的等值语句（例如 a.id = b.person_id）。这样可以允许使用更高效的连接技术（例如哈希连接而不是嵌套循环连接）。
尽可能将子查询转换为 JOIN 语句，因为这通常允许优化器理解意图并可能选择更好的计划。
正确使用 JOIN：您是否仅仅因为得到重复结果而使用 GROUP BY 或 DISTINCT？这通常表明 JOIN 使用不当，并可能导致更高的成本。
如果执行计划使用哈希连接，当表大小估计错误时，它可能会非常慢。因此，通过检查您的清理策略来确保表统计信息准确。
尽可能避免相关子查询；它们会显著增加查询成本。
在根据条件检查行是否存在时使用 EXISTS，因为它会"短路"（在找到至少一个匹配项时停止处理）。

WHERE 子句¶

如果可能，避免使用 LIKE。
避免将大型列表传递给 IN() 语句——相反，考虑使用 JOIN 条件或调整逻辑为基于排除。
避免在 WHERE 子句中使用函数调用。

GROUP BY 和 GROUPING SET¶

请参阅 PostgreSQL GROUPING SETS 文档 ↗。

UNION 与 UNION ALL¶

UNION 会消除重复记录；这需要对相关表进行排序。

鉴于与请求相关的成本，我们强烈建议避免使用 UNION。很少有情况需要这样做，如果需要，应该在从 Slate 查询之前在 Transforms 层完成。

UNION ALL 不会消除重复项，并且会高效地将一个表的行追加到另一个表。

请参阅常见错误：UNION VS. UNION ALL ↗。

索引¶

考虑在以下条件下添加索引：

消除顺序扫描（seq scan），除非是小表和/或查询获取超过 5% 的行。
如果使用多列索引，请注意定义包含列的顺序。
在频繁使用的列上使用高选择性的索引。

识别缺失的索引¶

SELECT
  relname,
  seq_scan - idx_scan AS too_much_seq,
  CASE
    WHEN seq_scan - coalesce(idx_scan, 0) > 0 THEN 'missing index?'
    ELSE 'OK'
  END,
  pg_relation_size(relname::regclass) AS rel_size, seq_scan, idx_scan
FROM pg_stat_all_tablesWHERE schemaname = 'public'AND pg_relation_size(relname::regclass) > 80000
ORDER BY too_much_seq DESC;

识别未使用的索引¶

SELECT
  indexrelid::regclass as index,
  relid::regclass as table,
  'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0AND indisunique is false;

仅索引扫描（覆盖索引）¶

阅读更多关于仅索引扫描 ↗的正确使用和好处。