热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

oracle10g关闭异步io,rac修改异步IO导致挂死

今天遇到个rac的异步IO的问题,环境是oracle10201的双节点rac,存储使用ASM,机器是HP的PA的机器,操作系

今天遇到个rac的异步IO的问题,环境是oracle 10201的双节点rac,存储使用ASM,机器是HP 的PA的机器,操作系统是11.23.

在udump下的日志中有大量的报错:

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump> cat gzxnh1_ora_19552.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh1_ora_19552.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP and Data Mining options

ORACLE_HOME = /oracle/oracle/product/10.2.0/db_2

System name: HP-UX

Node name: rp8440a

Release: B.11.23

Version: U

Machine: 9000/800

Instance name: gzxnh1

Redo thread mounted by this instance: 1

Oracle process number: 0

Unix process pid: 19552, image: oracle@rp8440a

Ioctl ASYNC_CONFIG error, errno = 1

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>catgzxnh1_ora_19552.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh1_ora_19552.trc

OracleDatabase10gEnterpriseEditionRelease10.2.0.1.0-64bitProduction

WiththePartitioning,RealApplicationClusters,OLAPandDataMiningoptions

ORACLE_HOME=/oracle/oracle/product/10.2.0/db_2

Systemname:HP-UX

Nodename:rp8440a

Release:B.11.23

Version:U

Machine:9000/800

Instancename:gzxnh1

Redothreadmountedbythisinstance:1

Oracleprocessnumber:0

Unixprocesspid:19552,image:oracle@rp8440a

IoctlASYNC_CONFIGerror,errno=1

看到这个报错,觉得很轻松,之前也遇到过,是oracle用户的dba组在操作系统上没有MLOCK的权限,在这里的文章,我也写到过。由于hp的11.23的系统,默认是开asyncdsk的。于是就用root给oracle用户的dba组加MLOCK权限了。

可是,等加完之后,问题就出现了:登录的提示就是不完整的,登录之后select所有信息,都会报没有连接数据库:

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump> sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Dec 3 10:28:47 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected.

SQL> select * from dual;

select * from dual

*

ERROR at line 1:

ORA-01012: not logged on

SQL> conn /a as sysdba

Connected to an idle instance.

SQL>

SQL>

SQL> exit

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump>sqlplus"/ as sysdba"

SQL*Plus:Release10.2.0.1.0-ProductiononThuDec310:28:472009

Copyright(c)1982,2005,Oracle.Allrightsreserved.

Connected.

SQL>select *fromdual;

select *fromdual

*

ERRORatline1:

ORA-01012:notloggedon

SQL>conn/aassysdba

Connectedtoanidleinstance.

SQL>

SQL>

SQL>exit

检查环境变量的正常的,确实登录的是gzxnh1的实例,而且,oracle的进程都是在的:

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump> ps -ef |grep ora _

oracle 24562 13823 0 10:28:41 pts/1 0:00 grep ora_

oracle 26691 1 0 00:37:55 ? 0:00 ora_q002_gzxnh1

oracle 18836 1 0 Nov 28 ? 4:42 ora_cjq0_gzxnh1

oracle 18865 1 0 Nov 28 ? 4:15 ora_mmon_gzxnh1

oracle 19044 1 0 Nov 28 ? 0:09 ora_rbal_gzxnh1

oracle 18614 1 0 Nov 28 ? 1:13 ora_pmon_gzxnh1

oracle 19598 1 0 Nov 28 ? 273:22 ora_j000_gzxnh1

oracle 18800 1 0 Nov 28 ? 2:03 ora_ckpt_gzxnh1

oracle 18927 1 0 Nov 28 ? 13:13 ora_lck0_gzxnh1

oracle 18643 1 0 Nov 28 ? 3:21 ora_diag_gzxnh1

oracle 18805 1 0 Nov 28 ? 1:42 ora_smon_gzxnh1

oracle 18672 1 0 Nov 28 ? 0:10 ora_psp0_gzxnh1

oracle 18674 1 0 Nov 28 ? 22:01 ora_lmon_gzxnh1

oracle 18771 1 0 Nov 28 ? 2:56 ora_lgwr_gzxnh1

oracle 19013 1 0 Nov 28 ? 0:11 ora_asmb_gzxnh1

oracle 18834 1 0 Nov 28 ? 0:00 ora_reco_gzxnh1

oracle 18867 1 0 Nov 28 ? 7:25 ora_mmnl_gzxnh1

oracle 19835 1 0 Nov 28 ? 0:14 ora_q000_gzxnh1

oracle 18898 1 0 Nov 28 ? 0:00 ora_s000_gzxnh1

oracle 18896 1 0 Nov 28 ? 0:00 ora_d000_gzxnh1

oracle 18707 1 0 Nov 28 ? 10:52 ora_lms0_gzxnh1

oracle 18703 1 0 Nov 28 ? 18:31 ora_lmd0_gzxnh1

oracle 18736 1 0 Nov 28 ? 8:54 ora_lms1_gzxnh1

oracle 19461 1 0 22:00:01 ? 0:05 ora_j001_gzxnh1

oracle 18738 1 0 Nov 28 ? 0:25 ora_mman_gzxnh1

oracle 18769 1 0 Nov 28 ? 3:22 ora_dbw0_gzxnh1

oracle 27708 1 0 10:54:35 ? 1:47 ora_pz99_gzxnh1

oracle 15114 1 0 Dec 1 ? 0:00 ora_o002_gzxnh1

oracle 19541 1 0 Nov 28 ? 0:14 ora_qmnc_gzxnh1

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump>ps-ef|grepora_

oracle2456213823010:28:41pts/10:00grepora_

oracle266911000:37:55?0:00ora_q002_gzxnh1

oracle1883610Nov28?4:42ora_cjq0_gzxnh1

oracle1886510Nov28?4:15ora_mmon_gzxnh1

oracle1904410Nov28?0:09ora_rbal_gzxnh1

oracle1861410Nov28?1:13ora_pmon_gzxnh1

oracle1959810Nov28?273:22ora_j000_gzxnh1

oracle1880010Nov28?2:03ora_ckpt_gzxnh1

oracle1892710Nov28?13:13ora_lck0_gzxnh1

oracle1864310Nov28?3:21ora_diag_gzxnh1

oracle1880510Nov28?1:42ora_smon_gzxnh1

oracle1867210Nov28?0:10ora_psp0_gzxnh1

oracle1867410Nov28?22:01ora_lmon_gzxnh1

oracle1877110Nov28?2:56ora_lgwr_gzxnh1

oracle1901310Nov28?0:11ora_asmb_gzxnh1

oracle1883410Nov28?0:00ora_reco_gzxnh1

oracle1886710Nov28?7:25ora_mmnl_gzxnh1

oracle1983510Nov28?0:14ora_q000_gzxnh1

oracle1889810Nov28?0:00ora_s000_gzxnh1

oracle1889610Nov28?0:00ora_d000_gzxnh1

oracle1870710Nov28?10:52ora_lms0_gzxnh1

oracle1870310Nov28?18:31ora_lmd0_gzxnh1

oracle1873610Nov28?8:54ora_lms1_gzxnh1

oracle194611022:00:01?0:05ora_j001_gzxnh1

oracle1873810Nov28?0:25ora_mman_gzxnh1

oracle1876910Nov28?3:22ora_dbw0_gzxnh1

oracle277081010:54:35?1:47ora_pz99_gzxnh1

oracle1511410Dec1?0:00ora_o002_gzxnh1

oracle1954110Nov28?0:14ora_qmnc_gzxnh1

oracle@rp8440a:/oracle/oracle/admin/gzxnh/udump>

怎么回事?为何数据库登录有问题?登录上去报错没有连接,再次登录就报错connect to an idle instance?

进一步检查oracle的alertlog:

Thread 1 advanced to log sequence 332

Current log# 6 seq# 332 mem# 0: +DATA1/gzxnh/onlinelog/group_6.592.704141411

Thu Dec 3 10:24:47 2009

Process PZ98 died, see its trace file

Thu Dec 3 10:32:55 2009

Process PZ99 died, see its trace file

Process PZ99 died, see its trace file

Thu Dec 3 10:33:57 2009

Process PZ99 died, see its trace file

Thu Dec 3 10:34:26 2009

Process PZ99 died, see its trace file

Process PZ99 died, see its trace file

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>

1

2

3

4

5

6

7

8

9

10

11

12

13

Thread1advancedtologsequence332

Currentlog# 6 seq# 332 mem# 0: +DATA1/gzxnh/onlinelog/group_6.592.704141411

ThuDec310:24:472009

ProcessPZ98died,seeitstracefile

ThuDec310:32:552009

ProcessPZ99died,seeitstracefile

ProcessPZ99died,seeitstracefile

ThuDec310:33:572009

ProcessPZ99died,seeitstracefile

ThuDec310:34:262009

ProcessPZ99died,seeitstracefile

ProcessPZ99died,seeitstracefile

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>

PZ99的进程死了?虽然在alertlog中没看到trace file指的是哪个,但是通过bdump下的最新一个trace文件可以看到:

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump> cat gzxnh1 _ora _19598.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh1_ora_19598.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP and Data Mining options

ORACLE_HOME = /oracle/oracle/product/10.2.0/db_2

System name: HP-UX

Node name: rp8440a

Release: B.11.23

Version: U

Machine: 9000/800

Instance name: gzxnh1

Redo thread mounted by this instance: 1

Oracle process number: 0

Unix process pid: 19598, image: oracle@rp8440a

……

*** 2009-12-03 10:34:26.934

*** ACTION NAME:() 2009-12-03 10:34:26.934

*** MODULE NAME:() 2009-12-03 10:34:26.934

*** SERVICE NAME:(SYS$USERS) 2009-12-03 10:34:26.934

*** CLIENT ID:() 2009-12-03 10:34:26.934

*** SESSION ID:(3251.32206) 2009-12-03 10:34:26.934

kxfpg1srv

could not start local PZ99

kxfpg1sg

GV$ query failed to get slave on inst 1

kxfpg1srv

could not start PZ99, inst 2

kxfpg1sg

GV$ query failed to get slave on inst 2

kxfpg1srv

could not start local PZ99

kxfpg1sg

GV$ query failed to get slave on inst 1

kxfpg1srv

could not start PZ99, inst 2

kxfpg1sg

GV$ query failed to get slave on inst 2

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>catgzxnh1_ora_19598.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh1_ora_19598.trc

OracleDatabase10gEnterpriseEditionRelease10.2.0.1.0-64bitProduction

WiththePartitioning,RealApplicationClusters,OLAPandDataMiningoptions

ORACLE_HOME=/oracle/oracle/product/10.2.0/db_2

Systemname:HP-UX

Nodename:rp8440a

Release:B.11.23

Version:U

Machine:9000/800

Instancename:gzxnh1

Redothreadmountedbythisinstance:1

Oracleprocessnumber:0

Unixprocesspid:19598,image:oracle@rp8440a

……

***2009-12-0310:34:26.934

***ACTIONNAME:()2009-12-0310:34:26.934

***MODULENAME:()2009-12-0310:34:26.934

***SERVICENAME:(SYS$USERS)2009-12-0310:34:26.934

***CLIENTID:()2009-12-0310:34:26.934

***SESSIONID:(3251.32206)2009-12-0310:34:26.934

kxfpg1srv

couldnotstartlocalPZ99

kxfpg1sg

GV$queryfailedtogetslaveoninst1

kxfpg1srv

couldnotstartPZ99,inst2

kxfpg1sg

GV$queryfailedtogetslaveoninst2

kxfpg1srv

couldnotstartlocalPZ99

kxfpg1sg

GV$queryfailedtogetslaveoninst1

kxfpg1srv

couldnotstartPZ99,inst2

kxfpg1sg

GV$queryfailedtogetslaveoninst2

oracle@rp8440a:/oracle/oracle/admin/gzxnh/bdump>

在rac2上看:

oracle@rp8440b:/oracle/oracle/admin/gzxnh/bdump> cat gzxnh2_ora_10642.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh2_ora_10642.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP and Data Mining options

ORACLE_HOME = /oracle/oracle/product/10.2.0/db_2

System name: HP-UX

Node name: rp8440b

Release: B.11.23

Version: U

Machine: 9000/800

Instance name: gzxnh2

Redo thread mounted by this instance: 2

Oracle process number: 0

Unix process pid: 10642, image: oracle@rp8440b

Died during process startup with error 27061 (seq=596)

OPIRIP: Uncaught error 27061. Error stack:

ORA-27061: waiting for async I/Os failed

oracle@rp8440b:/oracle/oracle/admin/gzxnh/bdump

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

oracle@rp8440b:/oracle/oracle/admin/gzxnh/bdump>catgzxnh2_ora_10642.trc

/oracle/oracle/admin/gzxnh/bdump/gzxnh2_ora_10642.trc

OracleDatabase10gEnterpriseEditionRelease10.2.0.1.0-64bitProduction

WiththePartitioning,RealApplicationClusters,OLAPandDataMiningoptions

ORACLE_HOME=/oracle/oracle/product/10.2.0/db_2

Systemname:HP-UX

Nodename:rp8440b

Release:B.11.23

Version:U

Machine:9000/800

Instancename:gzxnh2

Redothreadmountedbythisinstance:2

Oracleprocessnumber:0

Unixprocesspid:10642,image:oracle@rp8440b

Diedduringprocessstartupwitherror27061(seq=596)

OPIRIP:Uncaughterror27061.Errorstack:

ORA-27061:waitingforasyncI/Osfailed

oracle@rp8440b:/oracle/oracle/admin/gzxnh/bdump

看来确实和rac上刚刚开启的异步IO有关系了,既然这样,赶紧还是把异步IO关闭了吧:在操作系统级别用sam将asyncdsk改成unused,将/etc/privgroup文件删除,重启主机后,数据库启动,恢复正常。

PZ99是rac中并发进程的slave进程(可查看这里),查询Gv$视图需要该进程,目前我猜测是由于修改成了异步IO,导致原来正在执行的pz99进程挂死,sqlplus登录时无法查询gv$视图。如果此时kill掉PZ99的进程,说不定不用重启主机,关闭异步IO,就能解决故障了。

不过,还是想不通PZ99进程和异步IO间的关系,哪位如果也遇到这个情况,能说说当时你是怎么处理的么?



推荐阅读
  • r2dbc配置多数据源
    R2dbc配置多数据源问题根据官网配置r2dbc连接mysql多数据源所遇到的问题pom配置可以参考官网,不过我这样配置会报错我并没有这样配置将以下内容添加到pom.xml文件d ... [详细]
  • 安装mysqlclient失败解决办法
    本文介绍了在MAC系统中,使用django使用mysql数据库报错的解决办法。通过源码安装mysqlclient或将mysql_config添加到系统环境变量中,可以解决安装mysqlclient失败的问题。同时,还介绍了查看mysql安装路径和使配置文件生效的方法。 ... [详细]
  • 本文介绍了数据库的存储结构及其重要性,强调了关系数据库范例中将逻辑存储与物理存储分开的必要性。通过逻辑结构和物理结构的分离,可以实现对物理存储的重新组织和数据库的迁移,而应用程序不会察觉到任何更改。文章还展示了Oracle数据库的逻辑结构和物理结构,并介绍了表空间的概念和作用。 ... [详细]
  • 本文介绍了Oracle数据库中tnsnames.ora文件的作用和配置方法。tnsnames.ora文件在数据库启动过程中会被读取,用于解析LOCAL_LISTENER,并且与侦听无关。文章还提供了配置LOCAL_LISTENER和1522端口的示例,并展示了listener.ora文件的内容。 ... [详细]
  • 图解redis的持久化存储机制RDB和AOF的原理和优缺点
    本文通过图解的方式介绍了redis的持久化存储机制RDB和AOF的原理和优缺点。RDB是将redis内存中的数据保存为快照文件,恢复速度较快但不支持拉链式快照。AOF是将操作日志保存到磁盘,实时存储数据但恢复速度较慢。文章详细分析了两种机制的优缺点,帮助读者更好地理解redis的持久化存储策略。 ... [详细]
  • 解决VS写C#项目导入MySQL数据源报错“You have a usable connection already”问题的正确方法
    本文介绍了在VS写C#项目导入MySQL数据源时出现报错“You have a usable connection already”的问题,并给出了正确的解决方法。详细描述了问题的出现情况和报错信息,并提供了解决该问题的步骤和注意事项。 ... [详细]
  • 本文讨论了在数据库打开和关闭状态下,重新命名或移动数据文件和日志文件的情况。针对性能和维护原因,需要将数据库文件移动到不同的磁盘上或重新分配到新的磁盘上的情况,以及在操作系统级别移动或重命名数据文件但未在数据库层进行重命名导致报错的情况。通过三个方面进行讨论。 ... [详细]
  • CentOS 6.5安装VMware Tools及共享文件夹显示问题解决方法
    本文介绍了在CentOS 6.5上安装VMware Tools及解决共享文件夹显示问题的方法。包括清空CD/DVD使用的ISO镜像文件、创建挂载目录、改变光驱设备的读写权限等步骤。最后给出了拷贝解压VMware Tools的操作。 ... [详细]
  • MongoDB用户验证auth的权限设置及角色说明
    本文介绍了MongoDB用户验证auth的权限设置,包括readAnyDatabase、readWriteAnyDatabase、userAdminAnyDatabase、dbAdminAnyDatabase、cluster相关的权限以及root权限等角色的说明和使用方法。 ... [详细]
  • mac php错误日志配置方法及错误级别修改
    本文介绍了在mac环境下配置php错误日志的方法,包括修改php.ini文件和httpd.conf文件的操作步骤。同时还介绍了如何修改错误级别,以及相应的错误级别参考链接。 ... [详细]
  • 使用Ubuntu中的Python获取浏览器历史记录原文: ... [详细]
  • 关于我们EMQ是一家全球领先的开源物联网基础设施软件供应商,服务新产业周期的IoT&5G、边缘计算与云计算市场,交付全球领先的开源物联网消息服务器和流处理数据 ... [详细]
  • 本文介绍了在Mac上搭建php环境后无法使用localhost连接mysql的问题,并通过将localhost替换为127.0.0.1或本机IP解决了该问题。文章解释了localhost和127.0.0.1的区别,指出了使用socket方式连接导致连接失败的原因。此外,还提供了相关链接供读者深入了解。 ... [详细]
  • FeatureRequestIsyourfeaturerequestrelatedtoaproblem?Please ... [详细]
  • 海马s5近光灯能否直接更换为H7?
    本文主要介绍了海马s5车型的近光灯是否可以直接更换为H7灯泡,并提供了完整的教程下载地址。此外,还详细讲解了DSP功能函数中的数据拷贝、数据填充和浮点数转换为定点数的相关内容。 ... [详细]
author-avatar
诺汐衹湜忲傻
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有