Contents

Evolution of parallel query in PostgreSQL
Parallel-related parameters
Parallel tests

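Evolution of parallel query in PostgreSQL

Before version 9.6, parallel query was not supported, so a single SQL statement could not use multiple CPU cores to improve performance.

Version 9.6 introduced parallel query, but the supported scope was very limited.

Starting with version 10, parallel query was enhanced with parallel index scans, parallel index-only scans, parallel bitmap heap scans, and more.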

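Parallel-related parameters

Parameter	Default	Description
max_worker_processes	8	Maximum number of background processes the system supports. A change takes effect only after a restart. If there is a standby, its value must be >= the value on the primary.
max_parallel_workers	8	Maximum number of worker processes the system supports for parallel queries.
max_parallel_workers_per_gather	2	Maximum number of worker processes a single Gather node may start. Setting it to 0 disables parallel query.

These three parameters should satisfy the following relationship (the defaults 8, 8 and 2 already do):

max_worker_processes >= max_parallel_workers >= max_parallel_workers_per_gather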
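One way to check this relationship on a running server is to read the values from pg_settings. The sketch below assumes a superuser session; the numbers passed to ALTER SYSTEM are purely illustrative, not recommendations.

```sql
-- Current values of the three worker-related parameters.
-- "context" shows when a change takes effect (postmaster = restart required).
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather');

-- Illustrative values that respect the ordering above.
-- max_worker_processes is only picked up after a server restart.
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET max_parallel_workers = 8;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

-- Apply the settings that can be changed without a restart.
SELECT pg_reload_conf();
```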

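Parameter	Default	Description
parallel_setup_cost	1000	The planner's estimated cost of launching parallel worker processes.
parallel_tuple_cost	0.1	The planner's estimated cost of transferring one row from a parallel worker to another process.
min_parallel_table_scan_size	8MB	One of the conditions for parallelism: a table smaller than this value is not considered for a parallel scan.
min_parallel_index_scan_size	512kB	One of the conditions for parallelism. A parallel index scan does not read every block of the index, so what counts is the amount of index data the scan is expected to touch.
force_parallel_mode	off	Forces the use of parallel query; generally used only for testing.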
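For experiments, the cost and size thresholds can be changed for the current session only. This is a minimal sketch; setting everything to 0 is an assumption made to remove the planner's reluctance to parallelize small tables, and is not suitable for production.

```sql
-- Session-only changes; other connections are not affected.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;

-- ... run EXPLAIN on the query of interest here ...

-- Return to the server-wide values for this session.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
RESET min_parallel_table_scan_size;
RESET min_parallel_index_scan_size;
```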

Parallel tests

Create a test table with no indexes:

postgres=# create table tbig
postgres-# ( id int4 , 
postgres(#   name varchar(32), 
postgres(#   create_time timestamp without time zone default clock_timestamp());
CREATE TABLE


postgres=# insert into tbig (id , name )  select n , n||'_test'  from generate_series(1,500000) n;
INSERT 0 500000
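Before looking at plans, it can help to refresh the statistics and confirm that the table is larger than min_parallel_table_scan_size, since a smaller table would not be considered for a parallel scan. A small sketch:

```sql
-- Refresh planner statistics after the bulk insert.
ANALYZE tbig;

-- Compare the table's on-disk size with the parallel scan threshold.
SELECT pg_size_pretty(pg_relation_size('tbig'))         AS table_size,
       current_setting('min_parallel_table_scan_size')  AS parallel_threshold;
```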

Parallel sequential scan

Run the query directly:

postgres=# explain analyze  select * from tbig where name ='1_test';
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..6789.27 rows=1 width=23) (actual time=0.239..79.284 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbig  (cost=0.00..5789.17 rows=1 width=23) (actual time=42.197..67.470 rows=0 loops=3)
         Filter: ((name)::text = '1_test'::text)
         Rows Removed by Filter: 166666
 Planning Time: 0.076 ms
 Execution Time: 79.304 ms
(8 rows)

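Notes:

Workers Planned: 2 is the number of parallel workers the planner estimated when building the plan.
Workers Launched: 2 is the number of parallel workers actually started during execution.
-> Parallel Seq Scan on tbig shows that a parallel sequential scan was performed.
loops=3 on that node reflects the two workers plus the leader process; the rows and Rows Removed by Filter figures are per-process averages.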
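To see what parallelism buys for this query, the same statement can be run with parallel workers disabled for the session and the two execution times compared. A minimal sketch:

```sql
-- Disable parallel query for this session only.
SET max_parallel_workers_per_gather = 0;

-- The plan should now show a plain Seq Scan with no Gather node.
EXPLAIN ANALYZE SELECT * FROM tbig WHERE name = '1_test';

-- Re-enable parallel query.
RESET max_parallel_workers_per_gather;
```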

Parallel index scan

We create an ordinary btree index on the id column:

postgres=#  create index ind_tbig_id on tbig using btree(id);
CREATE INDEX

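Parallel index scan (Index Scan)

Note that in the run captured here the planner chose an ordinary Index Scan: the plan has no Gather node and no Parallel Index Scan node, so this query was not actually executed in parallel.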

postgres=# explain analyze  select count(name) from tbig where id < 100000;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3780.44..3780.45 rows=1 width=8) (actual time=74.085..74.085 rows=1 loops=1)
   ->  Index Scan using ind_tbig_id on tbig  (cost=0.42..3527.82 rows=101051 width=11) (actual time=0.015..42.598 rows=99999 loops=1)
         Index Cond: (id < 100000)
 Planning Time: 0.088 ms
 Execution Time: 74.118 ms
(5 rows)
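Whether the planner switches to a Parallel Index Scan for this query depends on its cost estimates. One way to experiment is to lower the parallel costs inside a transaction with SET LOCAL, so the changes disappear on ROLLBACK. The values are assumptions chosen to favor parallel plans, and the planner may still prefer a serial plan.

```sql
BEGIN;

-- Transaction-scoped settings that make parallel plans look cheaper.
SET LOCAL parallel_setup_cost = 0;
SET LOCAL parallel_tuple_cost = 0;
SET LOCAL min_parallel_index_scan_size = 0;

-- Look for a Gather node above a Parallel Index Scan in the output.
EXPLAIN SELECT count(name) FROM tbig WHERE id < 100000;

ROLLBACK;
```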

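Parallel index-only scan (Index Only Scan)

As with the previous query, the plan captured here is a non-parallel Index Only Scan with no Gather node.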

postgres=# explain analyze  select count(*) from tbig where id < 100000;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3780.44..3780.45 rows=1 width=8) (actual time=88.136..88.137 rows=1 loops=1)
   ->  Index Only Scan using ind_tbig_id on tbig  (cost=0.42..3527.82 rows=101051 width=0) (actual time=0.028..56.892 rows=99999 loops=1)
         Index Cond: (id < 100000)
         Heap Fetches: 99999
 Planning Time: 0.136 ms
 Execution Time: 88.162 ms
(6 rows)

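== The only difference between Index Scan and Index Only Scan! ==

It is whether the SQL can obtain all the data it needs from the index alone.
If it can, there is no need to go back to the table through the index to fetch the data.
In the plan above, Heap Fetches: 99999 shows that every row still required a visit to the table, which typically happens when the table has not been vacuumed and its visibility map is not yet set.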
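An index-only scan can skip the table only for pages marked all-visible in the visibility map, which VACUUM maintains. As a sketch, vacuuming the test table and repeating the query should let one check whether the Heap Fetches figure drops:

```sql
-- Update the visibility map and the planner statistics.
VACUUM (ANALYZE) tbig;

-- Re-run the index-only query and compare the "Heap Fetches" line.
EXPLAIN ANALYZE SELECT count(*) FROM tbig WHERE id < 100000;
```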

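Parallel Bitmap Heap Scan

Two concepts need to be explained first:

Bitmap Index Scan: when the WHERE clause of a query contains OR, a Bitmap Index Scan may be used.
Bitmap Heap Scan: when two Bitmap Index Scans have each collected index entries, the results need to be merged before going back to the table; at that point a Bitmap Heap Scan is performed on the table.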

== Learning the vocabulary helps with understanding database terms, haha ==


Here is an example (mostly taken from the web). It shows two Bitmap Index Scans whose results are then combined by a Bitmap Heap Scan:

postgres=# explain analyze  select count(name) from tbig where id =1 or id = 2;

[Image: screenshot of the EXPLAIN ANALYZE output]

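During a Bitmap Heap Scan, parallelism can also be used when the amount of data is large: this is the Parallel Bitmap Heap Scan. In that case only the heap scan is split among the workers; the bitmap itself is built by a single process.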
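The two-row query above is too small to need workers. A query whose OR conditions match many rows is a more likely candidate; the ranges below are assumptions picked for the 500000-row test table, and the planner may still choose a different plan.

```sql
-- Two wide ranges combined with OR; look for BitmapOr and,
-- if the planner goes parallel, Parallel Bitmap Heap Scan under Gather.
EXPLAIN
SELECT count(name)
FROM tbig
WHERE id < 50000 OR id > 450000;
```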

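Parallel aggregation

Aggregation here means the aggregate functions we usually talk about.

In other words, when we use aggregate functions such as sum(), count() and avg(), parallelism can be used as well.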

postgres=# explain analyze  select sum(id) from tbig;

[Image: screenshot of the EXPLAIN ANALYZE output]

== The part circled in yellow: the aggregate function is processed with parallelism enabled, so this is parallel aggregation ==
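Since the screenshot is not reproduced here, the plan shape can be inspected directly. In a parallel aggregate plan each worker computes a Partial Aggregate over its share of the rows, and the leader combines the partial results in a Finalize Aggregate above the Gather node. A minimal sketch:

```sql
-- COSTS OFF keeps the output short so the node names stand out.
EXPLAIN (COSTS OFF) SELECT sum(id) FROM tbig;

-- The same check works for other aggregates.
EXPLAIN (COSTS OFF) SELECT count(*), avg(id) FROM tbig;
```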

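Multi-table joins

When a multi-table join uses parallelism, it does not mean the join operation itself is parallelized, but rather that the data retrieval feeding the join can be processed in parallel. Common join methods:

Nested Loop: loops over the rows of one table and looks up the matching rows in the other.
Merge Join: sorts both tables on the join columns first, then matches them.
Hash Join: builds a hash table from one table and probes it with the other; often chosen when the join columns have no usable index.

Efficiency comparison

The original ranking is Nested Loop > Merge Join > Hash Join, but it is a rough rule of thumb at best: which method is fastest depends on table sizes, available indexes and sort order, and the planner picks the method with the lowest estimated cost.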
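The join methods can be compared on the test data by steering the planner with its enable_* settings. In this sketch tsmall is a hypothetical helper table created only for the comparison, and the timings will vary by machine.

```sql
-- Hypothetical second table holding a small subset of tbig.
CREATE TABLE tsmall AS
SELECT id, name FROM tbig WHERE id <= 1000;
ANALYZE tsmall;

-- Let the planner choose the join method freely.
EXPLAIN ANALYZE
SELECT count(*) FROM tbig b JOIN tsmall s ON b.id = s.id;

-- Discourage hash and merge joins to see the Nested Loop plan instead.
BEGIN;
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT count(*) FROM tbig b JOIN tsmall s ON b.id = s.id;
ROLLBACK;
```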

Original link: https://blog.csdn.net/strawberry1019/article/details/104778824