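A customer's database had a routine maintenance window this week, so we used the downtime to change the PARALLEL_EXECUTION_MESSAGE_SIZE parameter. During the restart that followed the change, we hit an ORA-00600 [KGEADE_IS_0] error.

First, why change PARALLEL_EXECUTION_MESSAGE_SIZE at all? On a freshly installed 10g database the default value is 2152 (or possibly 2048), and Oracle best practice recommends raising it to 8192. In 11g the default is already 16K, which is adequate for most workloads. The parameter sets the size of the messages exchanged during parallel execution. The larger the value, the more shared pool is needed: performance improves, but memory consumption rises with it. In addition, for parallel recovery or standby recovery, raising the value to 4096 or higher can improve recovery speed by at least 20%.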
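Before changing anything, it helps to see what each instance is currently running with. The query below is a minimal sketch using the standard V$PARAMETER view; run it on each instance in turn. It is not taken from the original write-up.

```sql
-- Show the value this instance is running with.
-- ISSYS_MODIFIABLE = 'FALSE' means the parameter is static:
-- it can only be changed in the spfile and takes effect after a restart.
SELECT name,
       value,
       isdefault,
       issys_modifiable
FROM   v$parameter
WHERE  name = 'parallel_execution_message_size';
```

V$PARAMETER is used here rather than GV$PARAMETER because GV$ queries are served by parallel slaves on the other instances, which is exactly the code path that fails in the incident described in this post.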
Now for the error itself. We changed the parameter for one node only and then restarted that node straight away.
Sun Jul 13 16:57:58 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:57:59 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:00 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:00 CST 2014
Trace dumping is performing id=[cdmp_20140713165800]
Sun Jul 13 16:58:01 CST 2014
Trace dumping is performing id=[cdmp_20140713165801]
Sun Jul 13 16:58:07 CST 2014
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Sun Jul 13 16:58:07 CST 2014
Trace dumping is performing id=[cdmp_20140713165807]

*** 2014-07-13 16:57:58.781
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
select tablespace_id, rfno, allocated_space, file_size, file_maxsize, changescn_base, changescn_wrap, flag
  from GV$FILESPACE_USAGE
 where inst_id != :inst
   and (changescn_wrap >= :w or (changescn_wrap = :w and changescn_base >= :b))

*** 2014-07-13 16:57:59.274
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INSTANCE_NAME, HOST_NAME, NVL(GVI_STARTUP_TIME, SYSTIMESTAMP) - INTERVAL '1' SECOND AS SHUTDOWN_TIME
  FROM (SELECT RRI.INSTANCE_NAME AS INSTANCE_NAME, RRI.HOST_NAME AS HOST_NAME,
               FROM_TZ(RRI.STARTUP_TIME , '+00:00') AS RRI_STARTUP_TIME,
               DBMS_HA_ALERTS_PRVT.INSTANCE_STARTUP_TIMESTAMP_TZ(GVI.STARTUP_TIME) AS GVI_STARTUP_TIME
          FROM RECENT_RESOURCE_INCARNATIONS$ RRI
          LEFT OUTER JOIN GV$INSTANCE GVI ON GVI.INSTANCE_NAME = RRI.RESOURCE_NAME
         WHERE RRI.RESOURCE_TYPE = 'INSTANCE' AND :B2 = RRI.DB_UNIQUE_NAME AND :B1 = RRI.DB_DOMAIN)
 WHERE GVI_STARTUP_TIME IS NULL OR GVI_STARTUP_TIME > RRI_STARTUP_TIME
 GROUP BY INSTANCE_NAME, HOST_NAME, GVI_STARTUP_TIME
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7de705a8     301  package body SYS.DBMS_HA_ALERTS_PRVT
0x7de64740       1  anonymous block
As you can see, every failure occurred in a statement querying a GV$ view. Next, let's look at the call stack at the time of the error.
ksedst()+31                  call ksedst1()              000000000 ? 000000001 ? 7FFF778810B0 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ?
ksedmp()+610                 call ksedst()               000000000 ? 000000001 ? 7FFF778810B0 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ?
ksfdmp()+63                  call ksedmp()               000000003 ? 000000001 ? 7FFF778810B0 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ?
kgerinv()+161                call ksfdmp()               006AE9A20 ? 000000003 ? 7FFF778810B0 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ?
kgeasnmierr()+163            call kgerinv()              006AE9A20 ? 2B763E0B0040 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ? 000000000 ?
kgeade()+501                 call kgeasnmierr()          006AE9A20 ? 2B763E0B0040 ? 7FFF77881110 ? 7FFF77881050 ? 000000000 ? 000000000 ?
kgerev()+58                  call kgeade()               2B763E0B0040 ? 006AE9A20 ? 2B763E0B0040 ? 000000000 ? 000000000 ? 000000000 ?
kserec0()+186                call kgerev()               006AE9A20 ? 2B763E0B0040 ? 000000000 ? 000000000 ? 7FFF778821A0 ? 000000000 ?
kxfpg1sg()+2014              call kserec0()              006AE9A20 ? 000000001 ? 000000029 ? 7FFF77881F40 ? 000000000 ? 388B519840 ?
kxfpgsg()+2098               call kxfpg1sg()             08364D278 ? 000000001 ? 7FFF778822B0 ? 7FFF77881F40 ? 08364CC48 ? 2B7600000001 ?
kxfrAllocSlaves()+351        call kxfpgsg()              000000005 ? 000000001 ? 000000001 ? 000000001 ? 3E0A254800000001 ? 2B763E0A2548 ?
kxfrialo()+2111              call kxfrAllocSlaves()      00005322E ? 2B763E5726C0 ? 000000001 ? 7FFF00000001 ? 7FFF00000001 ? 000000001 ?
kxfralo()+313                call kxfrialo()             00005322E ? 2B763E5726C0 ? 000000001 ? 07DAA7230 ? 2B763E572768 ? 7FFF77880000 ?
qerpx_rowsrc_start()+3892    call kxfralo()              00005322E ? 2B763E5726C0 ? 000000001 ? 07DAA7230 ? 2B763E572768 ? 000000000 ?
qerpxStart()+234             call qerpx_rowsrc_start()   7FFF77883280 ? 000000001 ? 000000001 ? 07DAA8910 ? 100000001 ? 000000000 ?
selexe()+667                 call qerpxStart()           000000001 ? 000003F60 ? 000000001 ? 07DAA8910 ? 100000001 ? 000000000 ?
opiexe()+4687                call selexe()               07DACBB38 ? 7FFF77883F60 ? 7FFF77883F60 ? 07DACBB38 ? 100000001 ? 000000000 ?
kpoal8()+2295                call opiexe()               000000049 ? 000000003 ? 7FFF77884428 ? 000000003 ? 100000001 ? 000000000 ?
opiodr()+1184                call kpoal8()               00000005E ? 000000000 ? 7FFF77887EF8 ? 000000003 ? 83B7000000000001 ? 000000000 ?
kpoodrc()+38                 call opiodr()               00000005E ? 000000000 ? 7FFF77887EF8 ? 000000000 ? 005BEBDF0 ? 000000000 ?
rpiswu2()+409                call kpoodrc()              7FFF77885440 ? 000000000 ? 7FFF77887EF8 ? 000000000 ? 005BEBDF0 ? 000000000 ?
kpoodr()+554                 call rpiswu2()              083B7ABF0 ? 000000000 ? 2B763E0F0CBC ? 000000002 ? 2B763E0F0CFC ? 000000000 ?
upirtrc()+2101               call kpoodr()               2B763E342E20 ? 00000005E ? 7FFF77887EF8 ? 000000000 ? 2B763E0F0CFC ? 000000000 ?
kpurcsc()+125                call upirtrc()              2B763E342E20 ? 00000005E ? 7FFF77887EF8 ? 7FFF77888060 ? 7FFF77888FD0 ? 003C558C6 ?
kpuexecv8()+1705             call kpurcsc()              7FFF778897D0 ? 00000005E ? 7FFF77887EF8 ? 7FFF77888060 ? 7FFF77888FD0 ? 003C558C6 ?
kpuexec()+2643               call kpuexecv8()            2B763E0FE958 ? 2B763E33F4C0 ? 2B763E33F540 ? 000000000 ? 000000000 ? 7FFF7788A8C4 ?
OCIStmtExecute()+41          call kpuexec()              000000001 ? 2B763E33F4C0 ? 2B763E342DB0 ? 000000001 ? 000000000 ? 000000000 ?
ktte_aggregate_finfo()+3133  call OCIStmtExecute()       000000001 ? 2B763E33F4C0 ? 2B763E342DB0 ? 000000001 ? 000000000 ? 000000000 ?
ktte_monitor_tsth()+788      call ktte_aggregate_finfo() 7FFF7788B780 ? 000000001 ? 000000009 ? 000000001 ? 000000000 ? 000000000 ?
ktte_threshold_slave()+183   call ktte_monitor_tsth()    7FFF7788B780 ? 000000001 ? 000000009 ? 000000001 ? 000000000 ? 000000000 ?
kebm_slave_main()+221        call ktte_threshold_slave() 07F63B200 ? 000000001 ? 000000000 ? 000000001 ? 000000000 ? 000000000 ?
ksvrdp()+1159                call kebm_slave_main()      07F63B200 ? 07F63B200 ? 000000000 ? 000000001 ? 000000000 ? 000000000 ?
opirip()+748                 call ksvrdp()               07F63B200 ? 07F63B200 ? 000000000 ? 000000001 ? 000000000 ? 000000000 ?
opidrv()+583                 call opirip()               000000032 ? 000000004 ? 7FFF7788D298 ? 000000001 ? 000000000 ? 000000000 ?
sou2o()+114                  call opidrv()               000000032 ? 000000004 ? 7FFF7788D298 ? 000000001 ? 000000000 ? 000000000 ?
opimai_real()+317            call sou2o()                7FFF7788D270 ? 000000032 ? 000000004 ? 7FFF7788D298 ? 000000000 ? 000000000 ?
main()+116                   call opimai_real()          000000003 ? 7FFF7788D300 ? 000000004 ? 7FFF7788D298 ? 000000000 ? 000000000 ?
__libc_start_main()+244      call main()                 000000003 ? 7FFF7788D300 ? 000000004 ? 7FFF7788D298 ? 000000000 ? 000000000 ?
_start()+41                  call __libc_start_main()    00072D108 ? 000000001 ? 7FFF7788D458 ? 000000000 ? 000000000 ? 000000003 ?
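According to the note "ORA-600 [kgeade_is_0] In A Real Application Cluster (RAC) Environment" (Doc ID 797182.1), any trace file whose call stack resembles "kxfpg1sg kxfpgsg kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start" matches bug 8592375. The fix is simple: shut down both instances, set the parameter to the same value on both, and start them again. In our case one instance was still running with the old value while the restarted instance came up with the new one, and that mismatch is what triggers the problem. The other option is to apply the patch, although it appears to be aimed at standby databases: 8592375: PHSB: READABLE STANDBY REPORTED ORA-00700:[KGEADE_IS_0].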
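The fix described above might look roughly like the following. This is a sketch under assumptions, not the exact commands run on the customer system: it assumes the database uses an spfile shared by both instances, that 8192 is the target value, and that the database is named racdb (inferred from the bdump path in the alert log).

```sql
-- 1. Write the same value for every instance into the spfile.
--    The parameter is static, so SCOPE = SPFILE is required.
ALTER SYSTEM SET parallel_execution_message_size = 8192
  SCOPE = SPFILE SID = '*';

-- 2. Check that no instance-specific entry in the spfile overrides
--    the '*' setting; ideally only the SID = '*' row is returned.
SELECT sid, name, value
FROM   v$spparameter
WHERE  name = 'parallel_execution_message_size'
AND    isspecified = 'TRUE';

-- 3. Stop ALL instances before starting any of them again,
--    for example with: srvctl stop database -d racdb
--    followed by:      srvctl start database -d racdb
--    (racdb is an assumed database name.)

-- 4. Once every instance is back up, confirm the values match.
SELECT inst_id, value
FROM   gv$parameter
WHERE  name = 'parallel_execution_message_size';
```

The key point is step 3: a rolling restart leaves a period in which the instances disagree on the message size, and cross-instance parallel queries such as those against GV$ views can fail during that period.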
Reference: ORA-600 [kgeade_is_0] In A Real Application Cluster (RAC) Environment (Doc ID 797182.1)
Original article: "Handling ORA-600 [kgeade_is_0] Triggered by Changing a Parallel Execution Parameter". Thanks to the original author for sharing.