1.1首先回顾下动态采样(dynamic sampling)的相关知识点 这个特性,使数据库随机的扫描表中少量的block,用来增强数据库的统计信息。 1.1.1 目的 动态采样增加了那些丢失的或者不足的优化器统计信息。使用动态采样可以让优化器更好的选择谓词。动态采样能够
这个特性,使数据库随机的扫描表中少量的block,用来增强数据库的统计信息。
动态采样增加了那些丢失的或者不足的优化器统计信息。使用动态采样可以让优化器更好的选择谓词。动态采样能够补充类似表中block个数,相关的索引block个数,表的集势(ronded个数),相关的连接列的统计信息(提供extendedstatistics的功能)。
动态采样默认为启动状态,可以设置OPTIMIZER_DYNAMIC_SAMPLING=0来禁用掉这一特性。
OPTIMIZER_DYNAMIC_SAMPLING也是和动态采样最重要的参数,它控制着动态采样级别。
OPTIMIZER_DYNAMIC_SAMPLING
Property |
Description |
Parameter type |
Integer |
Default value |
If If If |
Modifiable |
|
Range of values |
|
Dynamic Sampling Levels
10g以上oracle database 的OPTIMIZER_DYNAMIC_SAMPLING参数默认值为2。
下面列举3个动态采样的典型应用场景
当查询中的一个或者多个表没有统计信息,那么优化器就会收集关于表的基本信息用来执行优化操作。
dexter@DAVID> create table tuning8_tab1nologging as
2 select level as id , 'name'|| level as name
3 from dual
4 connect by level <= 10000 ;
Table created.
_dexter@DAVID> create indexidx_tuning8_tab1_id on tuning8_tab1(id) ;
Index created.
先看一下参数的设置
_dexter@DAVID> show parameter samp
NAME TYPE VALUE
----------------------------------------------- ------------------------------
optimizer_dynamic_sampling integer 2
_dexter@DAVID> select * from tuning8_tab1where id=2 ;
ID NAME
---------- --------------------------------------------
2 name2
Execution Plan
----------------------------------------------------------
Plan hash value: 3712969662
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB1 | 1 | 37 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TUNING8_TAB1_ID | 1 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -access("ID"=2)
Note
-----
-dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
3 physical reads
0 redo size
596 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
动态采样已经起作用了。
1.2.2 缺失extendedstatistics
oracle提供相关性统计信息,这里简单演示了一种情况,说明动态采样可以使oracle数据库更加智能的选择查询计划,进而提升性能。(其实使用dbms_stats设置特定的参数一样可以达到这种目的)
create table tect9_tab2 nologging
as select decode( mod(rownum,2), 0, 'D', 'R' )col1,
decode( mod(rownum,2), 0, 'R', 'D' ) col2, t.*
from all_objects t
create index idx_comp_tect9_tab2_col1_col2 ontect9_tab2(col1,col2) ;
execdbms_stats.gather_table_stats(user,'tect9_tab2',method_opt=>'for all indexedcolumns size 254' );
dexter@STARTREK> execdbms_stats.gather_table_stats(user,'tect9_tab2',method_opt=>'for all indexedcolumns size 254' );
PL/SQL procedure successfully completed.
数据总量为72566
dexter@STARTREK> select count(*) fromtect9_tab2 ;
COUNT(*)
----------
72566
已经收集了统计信息,所以常规的查询都可以得到正确的执行计划。
dexter@STARTREK> select * from tect9_tab2where col1='D' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 3262335044
--------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 35484 | 3499K| 302 (1)| 00:00:01 |
|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 35484| 3499K| 302 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
1 -filter("COL1"='D')
这里也是正确的
dexter@STARTREK> select * from tect9_tab2where col1='D' and col2='R' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 3262335044
--------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 17348 | 1711K| 301 (1)| 00:00:01 |
|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 17348| 1711K| 301 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
1 -filter("COL2"='R' AND "COL1"='D')
这里按照常理来说是正确的,但是在本环境中,oracle其实有更智能的一面:
因为col1和col2 之间有关联,col1=’D’的时候col2=’D’的记录是没有的,但是这里预计的条数为18133条,预计错误,oracle是否有更智能的选择呢?
dexter@STARTREK> select * from tect9_tab2where col1='D' and col2='D' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 3262335044
--------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 18133 | 1788K| 302 (1)| 00:00:01 |
|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 18133| 1788K| 302 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
1 -filter("COL1"='D' AND "COL2"='D')
可以看到,使用动态采样,虽然预计得到的条数为但是已经比较接近真实环境了,这里选择索引扫描作为access path。
select /*&#43; dynamic_sampling(t 2) */ * fromtect9_tab2 t where col1='D' and col2='D' ;
dexter@STARTREK> select /*&#43;dynamic_sampling(t 2) */ * from tect9_tab2 t where col1='D' and col2='D' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 1707542874
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 505 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TECT9_TAB2 | 5 | 505 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN |IDX_COMP_TECT9_TAB2_COL1_COL2 | 5 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -access("COL1"='D' AND "COL2"='D')
Note
-----
-dynamic sampling used for this statement (level=2)
如果一个表中的数据,经常执行大批量的加载或者更新,那么在每一次加载或者更新操作后都需要手动的执行收集统计信息的操作。
因为如果不手动的收集,因为统计信息过旧,优化器可能会选择效率较差的access path,影响效率。
其实还有另外一种做法可以避免这样的事情发生,那就是使用动态采样特性。
当一个表没有统计信息的时候,oracle数据库就会使用动态采样来协助优化器。所以对于上述的环境中,就可以删除、并且锁定表的统计信息。这样每次执行查询操作的时候,动态采样的特性都会被使用。
下面给出示例:
_dexter@DAVID> create table tuning8_tab3nologging
2 as
3 select level as id , 'name'||level as name
4 from dual
5 connect by level <= 10000;
Table created.
_dexter@DAVID> create indexidx_tuning8_tab3_id on tuning8_tab3(id) ;
Index created.
_dexter@DAVID> @gather_tab
Enter value for tbname: tuning8_tab3
PL/SQL procedure successfully completed.
_dexter@DAVID>
_dexter@DAVID>
_dexter@DAVID> set autotrace on
_dexter@DAVID> select * from tuning8_tab3where id=1 ;
ID NAME
------------------------------------------------------
1 name1
Execution Plan
----------------------------------------------------------
Plan hash value: 1647674320
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 1 | 13 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 1 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -access("ID"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
596 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
非常正常,优化器选择了最优的accesspath 。
_dexter@DAVID> insert into tuning8_tab3select 10 , 'namess' from dual connect by level <= 50000 ;
50000 rows created.
_dexter@DAVID> commit ;
Commit complete.
可以看到,优化器选择了索引扫描作为accesspath。
_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;
COUNT(NAME)
-----------
50001
Execution Plan
----------------------------------------------------------
Plan hash value: 2464990924
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
3 -access("ID"=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
414 consistent gets
0 physical reads
11588 redo size
531 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
但是其实全表扫描才是最优的
_dexter@DAVID> select /*&#43;full(tuning8_tab3) */ count(name) from tuning8_tab3 where id=10 ;
COUNT(NAME)
-----------
50001
Execution Plan
----------------------------------------------------------
Plan hash value: 926892911
-----------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 10 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 1 | 13 | 10 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=10)
Statistics
----------------------------------------------------------
1 recursive calls
1 db block gets
204 consistent gets
0 physical reads
176 redo size
531 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
问题已经出现了,当然在每次大规模加载数据后,可以收集一次表的统计信息。
_dexter@DAVID> @gather_tab
Enter value for tbname: tuning8_tab3
PL/SQL procedure successfully completed.
_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;
COUNT(NAME)
-----------
50001
Execution Plan
----------------------------------------------------------
Plan hash value: 926892911
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 69 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 50079| 489K| 69 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
268 consistent gets
0 physical reads
0 redo size
531 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
当然还有另外一种方法,那就是今天要说的,使用动态采样的特性。
BEGIN
DBMS_STATS.DELETE_TABLE_STATS(user,'tuning8_tab3');
DBMS_STATS.LOCK_TABLE_STATS(user,'tuning8_tab3');
END;
/
删除统计信息,并且锁定表的统计信息
_dexter@DAVID> BEGIN
2 DBMS_STATS.DELETE_TABLE_STATS(user,'tuning8_tab3');
DBMS_STATS.LOCK_TABLE_STATS(user,'tuning8_tab3');
4 END;
5 /
PL/SQL procedure successfully completed.
这样优化器通常情况下都会产生较正确的查询计划。
_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;
COUNT(NAME)
-----------
50001
Execution Plan
----------------------------------------------------------
Plan hash value: 926892911
-----------------------------------------------------------------------------------
| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 70 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 37 | | |
|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 57162| 2065K| 70 (2)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=10)
Note
-----
-dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
52 recursive calls
0 db block gets
278 consistent gets
0 physical reads
0 redo size
531 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
就算再次加载数据。
_dexter@DAVID> insert into tuning8_tab3select 9 , 'namess' from dual connect by level <= 100000 ;
100000 rows created.
_dexter@DAVID> commit ;
Commit complete.
依然会产生最优的查询计划,保证了效率。
_dexter@DAVID> select count(name) fromtuning8_tab3 where id=9 ;
COUNT(NAME)
-----------
100001
Execution Plan
----------------------------------------------------------
Plan hash value: 926892911
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 105 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 37 | | |
|* 2 | TABLE ACCESS FULL|TUNING8_TAB3 | 135K| 4879K| 105 (2)| 00:00:02 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=9)
Note
-----
-dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
8 recursive calls
1 db block gets
498 consistent gets
0 physical reads
880 redo size
531 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
可以看到,全表扫描的效率明显低于优化器选择的索引扫描的效率。
_dexter@DAVID> select /*&#43;index(tuning8_tab3 idx_tuning8_tab3_id)*/ count(name) from tuning8_tab3 where id=9 ;
COUNT(NAME)
-----------
100001
Execution Plan
----------------------------------------------------------
Plan hash value: 2464990924
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 826 (0)| 00:00:10 |
| 1 | SORT AGGREGATE | | 1 | 37 | | |
| 2 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 135K| 4879K| 826 (0)| 00:00:10 |
|* 3 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 135K| | 363 (0)| 00:00:05 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
3 -access("ID"=9)
Note
-----
-dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
972 consistent gets
0 physical reads
25252 redo size
531 bytes sent via SQL*Net to client
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
当然对于一些对查询计划要求比较高的地方,不建议使用这种方法,特别是olap系统,因为动态采样只是一个折中的收集统计信息的方法,并不像使用dbms_stats包那样准确,也有可能生成错误的执行计划,就像下面的1.3示例一样。
表分析(使用dbms_Stats)可以提供较高品质的统计信息,而动态采样则是一个折中方案,只能提供粗粒度的统计信息,所以对于一些对统计信息要求较高的查询中,因为统计信息缺失而使用动态采样是无法保证优化器选择最优的执行计划。
如下例所示:
_dexter@DAVID> create table tuning8_tab2nologging as
2 select level as id , 'name'|| level asname
3 from dual
4 connect by level <= 100000 ;
Table created.
_dexter@DAVID> create indexidx_tuning8_tab2_id on tuning8_tab2 (id) ;
Index created.
_dexter@DAVID> update tuning8_tab2 setid=10 where id > 40000 and id <70000 ;
29999 rows updated.
_dexter@DAVID> commit ;
Commit complete.
默认没有统计信息,使用动态采样,此时选择的是全表扫描,cOnsistantgets=356。
但是其实最优的access path 应该是索引扫描。
_dexter@DAVID> select count(*) fromtuning8_tab2 where id=10 ;
COUNT(*)
----------
30000
Execution Plan
----------------------------------------------------------
Plan hash value: 2380771210
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 81 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TUNING8_TAB2 | 33120 | 420K| 81 (2)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=10)
Note
-----
- dynamic samplingused for this statement (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
356 consistent gets
0 physical reads
0 redo size
526 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
_dexter@DAVID> @gather_tab
Enter value for tbname: tuning8_tab2
PL/SQL procedure successfully completed.
再次查询,此时使用的是索引快速扫描,consistent gets为297。
_dexter@DAVID> select count(*) fromtuning8_tab2 where id=10 ;
COUNT(*)
----------
30000
Execution Plan
----------------------------------------------------------
Plan hash value: 2673526969
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 73 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
|* 2 | INDEX FAST FULL SCAN|IDX_TUNING8_TAB2_ID | 29921 | 146K| 73 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operationid):
---------------------------------------------------
2 -filter("ID"=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
297 consistent gets
0 physical reads
0 redo size
526 bytes sent via SQL*Net toclient
523 bytes received via SQL*Netfrom client
2 SQL*Net roundtrips to/fromclient
0 sorts (memory)
0 sorts (disk)
1 rows processed
上面只是一个比较粗劣的实验,查询也比较简单。对于像olap这种对查询计划要求比较高,查询计划非常复杂的数据库,尽量保持统计信息的实时性是非常有必要的。