Using PostgreSQL CLUSTER to greatly reduce random I/O in nested loop joins

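Background

For a large table, when a join on an indexed column produces a small result set, a nested loop join is usually a good choice.

The problem with a nested loop is random (scattered) I/O. It cannot be avoided, and performance can be poor, especially on hardware with weak I/O capability.

What can be done to optimize this?

PostgreSQL provides a command, CLUSTER, that rewrites a table's physical storage order. It is the key to reducing random I/O here.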

Example

Create two tables

postgres=# create unlogged table test01(id int primary key, info text);
CREATE TABLE
postgres=# create unlogged table test02(id int primary key, info text);
CREATE TABLE

Generate data with randomly scattered primary keys

postgres=# insert into test01 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test01_pkey do nothing;
INSERT 0 6322422
postgres=# insert into test02 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test02_pkey do nothing;
INSERT 0 6320836

Analyze the tables

postgres=# analyze test01;
postgres=# analyze test02;

Clear the OS cache and restart the database (the drop_caches step is run as root)

$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start

First run: heavy random I/O, execution time 18.490 ms. (This machine has an SSD with decent IOPS; on weaker hardware it would take longer.)

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.25..7532.97 rows=623 width=74) (actual time=0.465..17.221 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=1929 read=1039 dirtied=188
   ->  Bitmap Heap Scan on public.test01 t1  (cost=18.82..2306.39 rows=623 width=37) (actual time=0.416..8.019 rows=640 loops=1)
         Output: t1.id, t1.info
         Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Heap Blocks: exact=637
         Buffers: shared hit=5 read=637 dirtied=123
         ->  Bitmap Index Scan on test01_pkey  (cost=0.00..18.66 rows=623 width=0) (actual time=0.254..0.254 rows=640 loops=1)
               Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
               Buffers: shared hit=4 read=1
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.38 rows=1 width=37) (actual time=0.013..0.013 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=1924 read=402 dirtied=65
 Planning time: 26.668 ms
 Execution time: 18.490 ms
(17 rows)

Second run: everything is served from cache, 5.4 ms

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.25..7532.97 rows=623 width=74) (actual time=0.392..5.150 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2968
   ->  Bitmap Heap Scan on public.test01 t1  (cost=18.82..2306.39 rows=623 width=37) (actual time=0.373..1.760 rows=640 loops=1)
         Output: t1.id, t1.info
         Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Heap Blocks: exact=637
         Buffers: shared hit=642
         ->  Bitmap Index Scan on test01_pkey  (cost=0.00..18.66 rows=623 width=0) (actual time=0.218..0.218 rows=640 loops=1)
               Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
               Buffers: shared hit=5
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.38 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2326
 Planning time: 0.956 ms
 Execution time: 5.434 ms
(17 rows)

Reorder each table physically by its indexed column to reduce random I/O.

postgres=# cluster test01 using test01_pkey;
CLUSTER
postgres=# cluster test02 using test02_pkey;
CLUSTER
postgres=# analyze test01;
postgres=# analyze test02;
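
One way to check that the reorder took effect, offered as a sketch rather than as part of the original test: the pg_stats view has a correlation column that estimates how closely the physical row order follows the column's value order, on a scale from -1 to +1. Running the query below before CLUSTER, and again after CLUSTER plus ANALYZE, should show the value for id moving from near 0 toward 1. The schema filter assumes the tables live in public, as the plans above indicate.

```sql
-- Physical-vs-logical ordering of the join column, as estimated by ANALYZE.
-- Assumption: both tables are in the public schema.
select schemaname, tablename, attname, correlation
from pg_stats
where schemaname = 'public'
  and tablename in ('test01', 'test02')
  and attname = 'id'
order by tablename;
```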

Clear the OS cache and restart the database again

$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start

First run after CLUSTER: execution time drops to 5.4 ms

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.86..5618.07 rows=668 width=74) (actual time=0.069..4.072 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2323 read=12
   ->  Index Scan using test01_pkey on public.test01 t1  (cost=0.43..30.79 rows=668 width=37) (actual time=0.040..0.557 rows=640 loops=1)
         Output: t1.id, t1.info
         Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Buffers: shared hit=5 read=6
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.35 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2318 read=6
         -- Note: after CLUSTER, shared hit has not dropped, because the inner scan still loops many times.
         -- Performance is nevertheless much better than before CLUSTER, because far fewer heap pages
         -- have to be read and the OS cache can serve them almost immediately.
 Planning time: 42.356 ms
 Execution time: 5.426 ms
(13 rows)

Second run: 3.6 ms

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.86..5618.07 rows=668 width=74) (actual time=0.055..3.414 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2335
   ->  Index Scan using test01_pkey on public.test01 t1  (cost=0.43..30.79 rows=668 width=37) (actual time=0.037..0.374 rows=640 loops=1)
         Output: t1.id, t1.info
         Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Buffers: shared hit=11
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.35 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2324
 Planning time: 1.042 ms
 Execution time: 3.620 ms
(13 rows)

Summary

CLUSTER aligns a table's physical order with an index. When the queried values are contiguous, a nested loop join then incurs far less random I/O, and the query gets markedly faster.
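
The alignment can also be inspected directly. As a sketch: the ctid system column gives each row's physical location as (heap block, offset within the block), so on a freshly clustered table, rows with neighbouring id values should sit in the same or adjacent blocks. The second query counts the distinct heap blocks behind the range used in this test, via the common ctid-to-point cast trick; before CLUSTER the count should be comparable to the "Heap Blocks" figure in the bitmap plan. The limit of 20 is arbitrary.

```sql
-- ctid = (heap block, line pointer); compare the block numbers of neighbouring ids.
select ctid, id
from test01
where id between 1 and 1000
order by id
limit 20;

-- Number of distinct heap blocks holding the rows in the tested range.
-- (ctid::text::point)[0] extracts the block number from the ctid.
select count(distinct (ctid::text::point)[0]) as heap_blocks,
       count(*) as row_count
from test01
where id between 1 and 1000;
```

CLUSTER is a one-time reorder: rows written afterwards are not kept in index order, so the check is worth repeating after heavy updates.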

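If the queried values are scattered rather than contiguous, this method does not help. Fortunately PostgreSQL also has the bitmap index scan: before reading any heap tuples it orders the matching ctids, then fetches the heap tuples in that physical order, which also reduces random I/O.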
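
To see the bitmap path on scattered keys, one could run something like the following sketch. It is not part of the original test: the multiplier 9973 is an arbitrary value chosen only to spread 1000 keys across the id range, and enable_indexscan is switched off just for this transaction to nudge the planner toward a bitmap scan. The planner may still pick a different plan depending on version and statistics.

```sql
begin;

-- Experiment-only setting; "set local" is undone when the transaction ends.
set local enable_indexscan = off;

-- 1000 widely spaced ids (hypothetical spacing), not a contiguous range.
explain (analyze, buffers)
select *
from test01
where id = any (array(select g * 9973 from generate_series(1, 1000) as g));

rollback;
```

If a Bitmap Heap Scan appears in the plan, its "Heap Blocks" line shows how many heap pages were visited.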