作者:记者刘方斌_304 | 来源:互联网 | 2023-07-17 20:41
postgresql大表join优化一、背景1、数据量:表名数据量f_invoice87346130f_invoice_item975358672、索引:表:f_invoice_i
postgresql大表join优化
一、背景
1、数据量:
表名 | 数据量 |
---|
f_invoice | 87346130 |
f_invoice_item | 97535867 |
2、索引:
表:f_invoice_item
CREATE INDEX f_invoice_item_order_item_id_idx ON ins_dw_prd12.f_invoice_item USING btree (order_item_id)
CREATE INDEX f_invoice_item_invoice_id_idx ON ins_dw_prd12.f_invoice_item USING btree (invoice_id) WITH (fillfactor='100')
表:f_invoice
CREATE INDEX idx_f_invoice_gin ON ins_dw_prd12.f_invoice USING gin (source_type, invoice_type, invoice_status, invoice_title, invoice_date, seller_taxer_code, shop_id, create_time)
CREATE INDEX idx_f_invoice_invoice_date ON ins_dw_prd12.f_invoice USING btree (invoice_date) WITH (fillfactor='100')
CREATE INDEX idx_f_invoice_seller_taxer_code ON ins_dw_prd12.f_invoice USING btree (seller_taxer_code) WITH (fillfactor='100')
CREATE INDEX idx_invoice_createtime_btree ON ins_dw_prd12.f_invoice USING btree (create_time) WITH (fillfactor='100')
二 、优化前
sql:
explain(analyse, timing)
SELECT count(*)
from (SELECT fi.invoice_id
FROM ins_dw_prd12.f_invoice fi
WHERE (fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94') and
fi.create_time >= '2020-01-01 00:00:00' and fi.create_time <= '2020-01-31 00:00:00')) AS mm
INNER JOIN ins_dw_prd12.f_invoice_item fit ON fit.invoice_id = mm.invoice_id
inner join ins_dw_prd12.f_invoice m on m.invoice_id = mm.invoice_id
执行计划:
Finalize Aggregate (cost=3083416.86..3083416.87 rows=1 2020-01-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-01-31 00:00:00'::timestamp without time zone))
Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
Rows Removed by Filter: 455651
-> Index Only Scan using f_invoice_pkey on f_invoice m (cost=0.57..2.48 rows=1 91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
and fi.create_time >= '2020-03-01 00:00:00'
and fi.create_time <= '2020-03-31 00:00:00') m
INNER JOIN (select *
from ins_dw_prd12.f_invoice_item
where invoice_id in (SELECT fi.invoice_id
FROM ins_dw_prd12.f_invoice fi
WHERE fi.seller_taxer_code in
('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
and fi.create_time >= '2020-03-01 00:00:00'
and fi.create_time <= '2020-03-31 00:00:00')) fit
ON fit.invoice_id = m.invoice_id
执行计划:
Finalize Aggregate (cost=428280.97..428280.98 rows=1 2020-03-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-03-31 00:00:00'::timestamp without time zone))
Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
Rows Removed by Filter: 601517
-> Parallel Hash (cost=202088.56..202088.56 rows=80840 2020-03-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-03-31 00:00:00'::timestamp without time zone))
Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
Rows Removed by Filter: 601517
-> Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item (cost=0.57..70.85 rows=233 width=8) (actual time=0.011..0.011 rows=1 loops=237290)
Index Cond: (invoice_id = fi_1.invoice_id)
Heap Fetches: 264945
Planning Time: 0.591 ms
Execution Time: 2432.666 ms
效果
从优化前85秒到优化后2.4秒,性能提升接近40倍。