热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

PostgreSQL中的函数索引

今天一位同事又问到了这个问题,函数索引.这是一个老问题了.PostgreSQL中函数有三种状态,不稳定,稳定,非常稳定.三种状态分别代表的意思如下:如果一个时间字段上要创建to_char(timestampwithouttimezone,text)的函数索引.是没有办法创建的,报错

今天一位同事又问到了这个问题, 函数索引. 这是一个老问题了.

PostgreSQL中函数有三种状态, 不稳定, 稳定, 非常稳定.

三种状态分别代表的意思如下 : 

如果一个时间字段上要创建to_char(timestamp without time zone, text)的函数索引.
是没有办法创建的, 报错

digoal=> create table t1 (crt_time
timestamp);

CREATE TABLE

digoal=> \set VERBOSITY verbose

digoal=> create index t1_1 on t1
(to_char(crt_time,'yyyy-mm-dd'));

ERROR:  42P17: functions in index expression must be
marked IMMUTABLE

LOCATION:  ComputeIndexAttrs,
indexcmds.c:909

为什么呢?

 看看to_char函数的稳定性状态 : 

digoal=> select proname,provolatile,proargtypes from pg_proc where prOname='to_char';
proname | provolatile | proargtypes
---------+-------------+-------------
to_char | s | 20 25
to_char | s | 23 25
to_char | s | 700 25
to_char | s | 701 25
to_char | s | 1114 25
to_char | s | 1184 25
to_char | s | 1186 25
to_char | s | 1700 25
(8 rows)
digoal=> select oid,typname from pg_type where oid in (20,25,23,700,701,1114,1184,1186,1700);
oid | typname
------+-------------
20 | int8
23 | int4
25 | text
700 | float4
701 | float8
1114 | timestamp
1184 | timestamptz
1186 | interval
1700 | numeric
(9 rows)

为什么函数索引一定要immutable的函数呢?

函数索引一旦建立后, 如果执行计划走索引扫描, PostgreSQL必须确保在任何情况下,
函数得到的结果和建立函数索引时得到的结果是一致的, 否则就会出现走索引检索到的和走全表扫描检索到的结果不一致.
这可是灾难性的BUG.

测试如下 : 

digoal=> create table test (id int,crt_time timestamp(0) with time zone);
CREATE TABLE

-- 正常情况下创建这个函数索引将报错

digoal=> create index idx_test on test (to_char(crt_time,'yyyymmddhh24'));
ERROR: 42P17: functions in index expression must be marked IMMUTABLE
LOCATION: ComputeIndexAttrs, indexcmds.c:909

-- 手工修改了to_char(timestamp with time zone,text)的strict
immutable.

digoal=> \c digoal postgres
You are now connected to database "digoal" as user "postgres".
digoal=# alter function to_char(timestamp with time zone,text) strict immutable;
ALTER FUNCTION

-- 新建函数索引成功

digoal=# \c digoal digoal

You are now connected to database "digoal" as user
"digoal".

digoal=> create index idx_test on test
(to_char(crt_time,'yyyymmddhh24'));

CREATE INDEX

-- 接下来查看当前的TimeZone

digoal=> show TimeZone;
TimeZone
----------
PRC
(1 row)

-- 插入一条测试数据

digoal=> insert into test values (1,'2012-01-01 12:00:00');
INSERT 0 1

-- 在当前的TimeZone下查看to_char(crt_time,'yyyymmddhh24') =
2012010112

digoal=> select to_char(crt_time,'yyyymmddhh24') from test;
to_char
------------
2012010112
(1 row)

-- 把TimeZone改成了GMT

digoal=> set TimeZOne='GMT';
SET

-- 查看to_char(crt_time,'yyyymmddhh24') =
2012010104

-- 同样的参数, 得到的结果不一致. 

digoal=> select to_char(crt_time,'yyyymmddhh24') from test;
to_char
------------
2012010104
(1 row)

-- 接下来在执行计划是走索引时的查询结果. 因为建立索引时,
这个to_char(crt_time,'yyyymmddhh24')是等于2012010112的, 

-- 所以走索引扫描将返回一条记录.

digoal=> explain select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using idx_test on test (cost=0.00..8.43 rows=10 width=12)
Index Cond: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)
(2 rows)
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
id | crt_time
----+------------------------
1 | 2012-01-01 04:00:00+00
(1 row)

-- 接下来严重BUG发生了, 走全表扫描的话同样的SQL, 查询结果没了.

-- 因为走全表扫描的话, 这个值
to_char(crt_time,'yyyymmddhh24') 将被重新运算,
结果是2012010104

digoal=> set enable_indexscan=off;
SET
digoal=> set enable_bitmapscan=off;
SET
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
id | crt_time
----+----------
(0 rows)
digoal=> explain analyze select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..39.10 rows=10 width=12) (actual time=0.031..0.031 rows=0 loops=1)
Filter: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)
Total runtime: 0.066 ms
(3 rows)

-- 查询2012010104则有结果, 返回一条记录

digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010104';
id | crt_time
----+------------------------
1 | 2012-01-01 04:00:00+00
(1 row)

-- 所以在使用函数索引时需要慎重, 创建函数是strict的选择也要慎重.

-- 如果一定要在时间类型上有类似的请求怎么办, 第一使用timestamp without time zone ,
使用immutable的date_trunc函数代替to_char

digoal=> select proname,provolatile,proargtypes from pg_proc where prOname='date_trunc';
proname | provolatile | proargtypes
------------+-------------+-------------
date_trunc | s | 25 1184
date_trunc | i | 25 1186
date_trunc | i | 25 1114
(3 rows)
digoal=> select oid,typname from pg_type where oid in (25,1184,1186,1114);
oid | typname
------+-------------
25 | text
1114 | timestamp
1184 | timestamptz
1186 | interval
(4 rows)
【其他】

1. Thinking PostgreSQL Function's Volatility Categories

仅仅适用timestamp without time zone, 带时区的都会有以上类似的问题.

2.1 以下是个折中方案, 不必修改系统函数to_char的strict, 但是必须符合一定条件才可以冒这个险去使用它.

this is intended behaviour as to_char depends on the LC_MESSAGES
setting

也就是说在使用环境中不会涉及LC_MESSAGES的变动(或者变动后不会影响to_char的结果.)时方可使用.

CREATE OR REPLACE FUNCTION my_to_char(some_time timestamp)
RETURNS text
AS
$BODY$
select to_char($1, 'yyyy-mm-dd');
$BODY$
LANGUAGE sql
IMMUTABLE;

2.2

cast(crt_time as date)

crt_time必须是不带时区的. 否则会有类似问题

digoal=> set TimeZOne='PRC';
SET
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);
date
------------
2012-01-01
(1 row)
digoal=> set TimeZOne='GMT';
SET
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);
date
------------
2011-12-31
(1 row)
3. 还有一种方法是增加一列, 用来存储需要建立的函数索引to_char()的结果值. 这样就不需要使用函数索引了,
也不需要考虑函数的strict问题.

推荐阅读
author-avatar
福田汽车-唐山万联
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有