下面的案例来至于一位兄弟的环境,10G RAC,CRS被意外删除,数据库被crash了。通过将RAC更改为单机直接启动,报下面的错误
Errors in file /oracle/10.2/admin/gzora/udump/gzora_ora_8355.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 3 with name "_SYSSMU3$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 8355
ORA-1092 signalled during: alter database open…
1,配置10046、errorstack
oradebug setmypid
oradebug event 10046 trace name context forever,level 12;
oradebug event 1555 trace name errorstack level 2;
recover database using backup controlfile until cancel;
alter database open resetlogs;
查看报错的详细信息
1.1 查看block信息
d:\>grep -E "^Block he|^0x0" gzora_ora_8897.trc
Block header dump: 0x0040006a
0x01 0x0000.046.00000029 0x00400011.0037.1d –U- 1 fsc 0x0000.02a1d124
这里发现ITL 01的状态是U,并且锁定1行的记录
查看块的dump信息,以0x0040006a就可以搜索到
BH (0x160f96ad8) file#: 1 rdba: 0x0040006a (1/106) class: 1 ba: 0x1603d8000
set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
dbwrid: 0 obj: 15 objn: 15 tsn: 0 afn: 1
hash: [187c9e220,187c9e220] lru: [160f96c68,1868cb9f8]
ckptq: [NULL] fileq: [NULL] objq: [183facdb0,183facdb0]
st: XCURRENT md: NULL tch: 1
flags:
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 0 rdba: 0x0040006a (1/106)
scn: 0x0000.02a1d124 seq: 0x01 flg: 0x06 tail: 0xd1240601
frmt: 0x02 chkval: 0x21bd type: 0x06=trans data
Block header dump: 0x0040006a
Object id on Block? Y
seg/obj: 0xf csc: 0x00.2a1d123 itc: 1 flg: O typ: 1 – DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.046.00000029 0x00400011.0037.1d –U- 1 fsc 0x0000.02a1d124
这里可以看到块的对象,csc,块号等信息
下面查看一下锁定行的记录
tab 0, row 10, @0x143d
tl: 60 fb: –H-FL– lb: 0x1 cc: 17
col 0: [ 2] c1 0b
col 1: [10] 5f 53 59 53 53 4d 55 31 30 24
col 2: [ 2] c1 02
col 3: [ 2] c1 03
col 4: [ 3] c2 02 36
col 5: [ 5] c4 2d 10 5d 19
col 6: [ 1] 80
col 7: [ 4] c3 03 09 0c
col 8: [ 4] c3 02 1a 04
col 9: [ 1] 80
col 10: [ 2] c1 04
col 11: [ 2] c1 02
col 12: *NULL*
col 13: *NULL*
col 14: *NULL*
col 15: *NULL*
col 16: [ 2] c1 02
tab 0, row 11, @0x1831
1.2 查看SQL与会话信息
查看当前的信息信息,以CURRENT SESSION搜索
===================================================
CURRENT SESSION’S INSTANTIATION STATE
————————————-
current session=0x1888972b8
KGI STATE DUMP for session=0x1888972b8
KGI STATE DUMP DONE for session=0x1888972b8
******************** Session Cursor Dump **********************
Current cursor: 5, pgadep: 1
Open cursors(pls, sys, hwm, max): 5(0, 4, 64, 300)
NULL 0 SYNTAX 0 PARSE 3 BOUND 1 FETCH 1 ROW 0
Cached frame pages(total, free):
4k(4, 2), 8k(0, 0), 16k(0, 0), 32k(0, 0)
pgactx: 0x185fffd20 ctxcbk: (nil) ctxqbc: (nil) ctxrws: 0x185ffd940
Explain plan:
Dumping stream from 0 to 78
———————————-
0000: 143 216 224 255 133 1 0 0 0 143 119 1 0 38 40 0 0 3 1 2 ……….w..&(…..
0020: 18 3 2 14 0 0 0 54 7 3 5 12 8 106 108 109 224 152 228 164 …….6…..jlm….
0040: 107 111 3 52 3 3 2 3 1 2 4 34 0 0 143 120 229 255 133 1 ko.4……."…x….
0060: 0 0 0 134 119 2 23 13 0 0 3 1 5 36 3 2 1 142 ….w……..$….
============
Plan Table
============
———————————————–+———————————–+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
———————————————–+———————————–+
| 0 | SELECT STATEMENT | | | | | |
| 1 | TABLE ACCESS BY INDEX ROWID | OBJ$ | | | | |
| 2 | INDEX UNIQUE SCAN | I_OBJ1 | | | | |
———————————————–+———————————–+
Content of other_xml column
===========================
db_version : 10.2.0.4
parse_schema : SYS
plan_hash : 3317232865
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(‘10.2.0.4’)
RBO_OUTLINE
OUTLINE_LEAF(@"SEL$1")
INDEX_RS_ASC(@"SEL$1" "OBJ$"@"SEL$1" ("OBJ$"."OBJ#"))
END_OUTLINE_DATA
*/
以PARSING IN CURSOR #5来搜索
以Cursor#5来搜索
—————————————-
Cursor#5(0x2aaaac891988) state=FETCH curiob=0x2aaaac8a8408
curflg=7 fl2=0 par=0x2aaaac8917e8 ses=0x1888972b8
sqltxt(0x186f42730)=select ctime, mtime, stime from obj$ where obj# = :1这里可以看到SQL的内容
hash=fa0bd3f60d6ee4f2495f9af8199b75b9
parent=0x186f42378 maxchild=01 plk=0x183f7a478 ppn=n
cursor instantiation=0x2aaaac8a8408 used=1411388199
child#0(0x186f41800) pcs=0x186f41f88
clk=0x183f7ae88 ci=0x186f41448 pn=0x1823bd088 ctx=0x185fffd20
kgsccflg=0 llk[0x2aaaac8a8410,0x2aaaac8a8410] idx=0
xscflg=e01414f6 fl2=45000401 fl3=4022210c fl4=100
Bind bytecodes
Opcode = 1 Unoptimized
Offsi = 48, Offsi = 0
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2aaaac8a7fc0 bln=22 avl=02 flg=05
value=20这里可以看到绑定变量的值
Frames pfr 0x2aaaac8a8388 siz=2920 efr 0x2aaaac8a80a8 siz=2720
Cursor frame dump
enxt: 4.0×00000010 enxt: 3.0x000000c0 enxt: 2.0×00000020 enxt: 1.0x000009c0
pnxt: 2.0×00000008 pnxt: 1.0x000000c0
kxscphp 0x2aaaac89a1a8 siz=1992 inu=1160 nps=1112
kxscdfhp 0x2aaaac8a0808 siz=984 inu=88 nps=0
kxscbhp 0x2aaaac8a0628 siz=984 inu=168 nps=48
kxscwhp 0x2aaaac8a0718 siz=4056 inu=360 nps=0
其实到这里我们还没有用到10046的数据,下面再看看10046的trace文件
=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1378308789118580 hv=429618617 ad=’86f42730′
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=282,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1378308789118577
BINDS #5:
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2aaaac8a7fc0 bln=22 avl=02 flg=05
value=20
EXEC #5:c=999,e=477,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1378308789119125
WAIT #5: nam=’db file sequential read’ ela= 1826 file#=1 block#=218 blocks=1 obj#=-1 tim=1378308789121036
WAIT #5: nam=’db file sequential read’ ela= 3930 file#=1 block#=219 blocks=1 obj#=-1 tim=1378308789125031
WAIT #5: nam=’db file sequential read’ ela= 4146 file#=1 block#=122 blocks=1 obj#=-1 tim=1378308789129225
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=1378308789129622 hv=361892850 ad=’85ffc6d8′
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=1000,e=339,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=1378308789129620
BINDS #6:
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2aaaacb2dbd0 bln=22 avl=02 flg=05
value=8
EXEC #6:c=0,e=505,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=1378308789130223
WAIT #6: nam=’db file sequential read’ ela= 3262 file#=1 block#=202 blocks=1 obj#=-1 tim=1378308789133526
WAIT #6: nam=’db file sequential read’ ela= 1110 file#=1 block#=106 blocks=1 obj#=-1 tim=1378308789134677
FETCH #6:c=0,e=4472,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=1378308789134719
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op=’TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=4466 us)’
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op=’INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=3304 us)’
WAIT #5: nam=’db file sequential read’ ela= 2661 file#=2 block#=1152 blocks=1 obj#=-1 tim=1378308789137472
这里可以看到在出故障的SQL下面还有很多SQL语法正常执行了,所以10046最后一条语句并不是报错的SQL语句
2 故障原因
通过上面的trace信息,大概知道原因,在访问obj$表的时,需要访问undo$表,undo$表中发现一条itl为U,在做延迟块清处过程中,在UNDO段中没有找到此ITL的信息,触发ORA-01555的报错。
3,故障处理
SQL> select checkpoint_change# from v$datafile_header; 远程的工作完成,剩下的一些收尾的工作由现成工程师完成。
CHECKPOINT_CHANGE#
——————
44279281
44279281
44279281
44279281
44279281
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 44279281 generated at 09/22/2014 20:27:54 needed for thread 1
ORA-00289: suggestion : +ORADATA
ORA-00280: change 44279281 for thread 1 is in sequence #1
Specify log: {
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘+ORADATA/gzora/datafile/system.258.757644911’
ORA-01112: media recovery not started
这里手动增加SCN的值就可以。
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [0600122B0, 0600122E0) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 60011F90 00000000
SQL> select to_char(54279281,’xxxxxxxxxxxxxxxx’) from dual;
TO_CHAR(54279281,
—————–
33c3c71
SQL> oradebug poke 0x0600122B0 4 0x33c3c71;
BEFORE: [0600122B0, 0600122B4) = 00000000
AFTER: [0600122B0, 0600122B4) = 033C3C71
SQL> alter database open resetlogs;
Database altered.
SQL> create undo tablespace undotbs2 datafile ‘+oradata’ size 10g;
Tablespace created.
切换到新的UNDO表空间