
Oracle 11gR2 RAC node crash: failure analysis (/etc/oracle/lastgasp)


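Environment: AIX 7100

Oracle 11gR2 RAC

Exact version: 11.2.0.4

Symptom:

CRS on node 2 hung and crsctl commands did not respond at all. After the CRS processes were killed directly, the host rebooted, but the VIP did not fail over to node 1.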

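Analysis approach:

1. Check the DB alert log and the related trace files.

2. Review the "errpt -a" output on all nodes.

3. Review the GI logs on all nodes from the time of the problem: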

/log//alert*.log

/log//crsd/crsd.log

/log//cssd/ocssd.log

/log//agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log

/log//agent/ohasd/oracssdagent_root/oracssdagent_root.log

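/etc/oracle/lastgasp/*, or /var/opt/oracle/lastgasp/* (if present)

Note: if CRS initiated the host reboot, a record is added to a file under /etc/oracle/lastgasp/.

4. Review the LMON, LMS*, and LMD0 trace files on all nodes from the time of the problem.

5. Review all OSW output on all nodes from the time of the problem.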
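
The note under step 3 can be checked quickly with a small script. The following is a minimal sketch, assuming only the two directory names given above; the function name list_lastgasp is hypothetical, and the script makes no assumption about the contents of the files. It lists each file with its size and modification time so that a file changed around the time of the reboot stands out.

```python
import os
import time

# Directories named in the text; the second exists only on some platforms.
LASTGASP_DIRS = ["/etc/oracle/lastgasp", "/var/opt/oracle/lastgasp"]


def list_lastgasp(dirs=LASTGASP_DIRS):
    """Return (path, size_bytes, mtime_string) for every regular file found."""
    found = []
    for d in dirs:
        if not os.path.isdir(d):
            continue  # directory is absent on this host
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            if os.path.isfile(path):
                st = os.stat(path)
                stamp = time.strftime("%Y-%m-%d %H:%M:%S",
                                      time.localtime(st.st_mtime))
                found.append((path, st.st_size, stamp))
    return found


if __name__ == "__main__":
    for path, size, stamp in list_lastgasp():
        print(f"{stamp}  {size:>8}  {path}")
```

A file whose modification time matches the reboot is a hint that CRS initiated it; an unchanged directory suggests looking elsewhere (for example in the errpt output).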


The detailed analysis follows:

Node 1 DB alert log:

Tue Mar 25 12:59:07 2014

Thread 1 advanced to log sequence 245 (LGWR switch)

Current log# 2 seq# 245 mem# 0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709

Current log# 2 seq# 245 mem# 1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727

Tue Mar 25 12:59:20 2014

Archived Log entry 315 added for thread 1 sequence 244 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:14:54 2014

IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]

Receiver: inst 2 binc 291585594 ospid 11010320

IPC Send timeout to 2.1 inc 50 for msg type 65518 from opid 12

Tue Mar 25 13:14:59 2014

Communications reconfiguration: instance_number 2

Tue Mar 25 13:15:01 2014

IPC Send timeout detected. Sender: ospid 12452050 [oracle@dbrac1 (LMS1)]

Receiver: inst 2 binc 291585600 ospid 11534636

IPC Send timeout to 2.2 inc 50 for msg type 65518 from opid 13

Tue Mar 25 13:15:22 2014

IPC Send timeout detected. Sender: ospid 10682630 [oracle@dbrac1 (TNS V1-V3)]

Receiver: inst 2 binc 50 ospid 6095056

Tue Mar 25 13:15:25 2014

Detected an inconsistent instance membership by instance 1

Evicting instance 2 from cluster

Waiting for instances to leave: 2

Tue Mar 25 13:15:26 2014

Dumping diagnostic data in directory=[cdmp_20140325131526], requested by (instance=2, osid=8192018 (LMD0)), summary=[abnormal instance termination].

Tue Mar 25 13:15:42 2014

Reconfiguration started (old inc 50, new inc 54)

List of instances:

1 (myinst: 1)

...

Tue Mar 25 13:15:52 2014

Archived Log entry 316 added for thread 2 sequence 114 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:15:53 2014

ARC3: Archiving disabled thread 2 sequence 115

Archived Log entry 317 added for thread 2 sequence 115 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:16:37 2014

Thread 1 advanced to log sequence 246 (LGWR switch)

Current log# 3 seq# 246 mem# 0: +SYSDG/dbracdb/onlinelog/group_3.266.840562735

Current log# 3 seq# 246 mem# 1: +SYSDG/dbracdb/onlinelog/group_3.267.840562747

Tue Mar 25 13:16:46 2014

Decreasing number of real time LMS from 2 to 0

Tue Mar 25 13:16:51 2014

Archived Log entry 318 added for thread 1 sequence 245 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:20:50 2014

IPC Send timeout detected. Sender: ospid 9306248 [oracle@dbrac1 (PING)]

Receiver: inst 2 binc 291585377 ospid 2687058

Tue Mar 25 13:30:08 2014

Thread 1 advanced to log sequence 247 (LGWR switch)

Current log# 1 seq# 247 mem# 0: +SYSDG/dbracdb/onlinelog/group_1.262.840562653

Current log# 1 seq# 247 mem# 1: +SYSDG/dbracdb/onlinelog/group_1.263.840562689

Tue Mar 25 13:30:20 2014

Archived Log entry 319 added for thread 1 sequence 246 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:45:23 2014

Thread 1 advanced to log sequence 248 (LGWR switch)

Current log# 2 seq# 248 mem# 0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709

Current log# 2 seq# 248 mem# 1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727

Node 2 DB alert log:

Tue Mar 25 12:07:15 2014

Archived Log entry 309 added for thread 2 sequence 112 ID 0xffffffff82080958 dest 1:

Tue Mar 25 12:22:22 2014

Dumping diagnostic data in directory=[cdmp_20140325122222], requested by (instance=1, osid=7012828), summary=[incident=384673].

Tue Mar 25 12:45:21 2014

Thread 2 advanced to log sequence 114 (LGWR switch)

Current log# 6 seq# 114 mem# 0: +SYSDG/dbracdb/onlinelog/group_6.274.840563009

Current log# 6 seq# 114 mem# 1: +SYSDG/dbracdb/onlinelog/group_6.275.840563017

Tue Mar 25 12:45:22 2014

Archived Log entry 313 added for thread 2 sequence 113 ID 0xffffffff82080958 dest 1:

Tue Mar 25 13:14:57 2014

IPC Send timeout detected. Receiver ospid 11010320

Tue Mar 25 13:14:57 2014

Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms0_11010320.trc:

IPC Send timeout detected. Receiver ospid 11534636 [

Tue Mar 25 13:15:01 2014

Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms1_11534636.trc:

Tue Mar 25 13:15:25 2014

LMS0 (ospid: 11010320) has detected no messaging activity from instance 1

LMS0 (ospid: 11010320) issues an IMR to resolve the situation

Please check LMS0 trace file for more detail.

Tue Mar 25 13:15:25 2014

Suppressed nested communications reconfiguration: instance_number 1

Detected an inconsistent instance membership by instance 1

Tue Mar 25 13:15:25 2014

Received an instance abort message from instance 1

Please check instance 1 alert and LMON trace files for detail.

LMD0 (ospid: 8192018): terminating the instance due to error 481

Tue Mar 25 13:15:26 2014

ORA-1092 : opitsk aborting process

Tue Mar 25 13:15:29 2014

System state dump requested by (instance=2, osid=8192018 (LMD0)), summary=[abnormal instance termination].

System State dumped to trace file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_diag_9699724_20140325131529.trc

Instance terminated by LMD0, pid = 8192018
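
Reading the two alert logs side by side is easier once their entries are merged into one timeline. The following is a minimal sketch, assuming the timestamp format seen in the logs above ("Tue Mar 25 13:14:54 2014") and an English (C) time locale; the function names and the keyword list are illustrative choices, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Alert-log timestamp lines look like "Tue Mar 25 13:14:54 2014".
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")

# Messages worth correlating across nodes (an illustrative selection).
KEYWORDS = ("IPC Send timeout", "Evicting instance",
            "terminating the instance", "Reconfiguration started")


def parse_alert(text, node):
    """Yield (timestamp, node, message) for each line under a timestamp."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS_RE.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif current is not None:
            yield current, node, line


def timeline(logs, keywords=KEYWORDS):
    """Merge several alert logs ({node: text}) into one sorted event list."""
    events = [event
              for node, text in logs.items()
              for event in parse_alert(text, node)
              if any(k in event[2] for k in keywords)]
    return sorted(events, key=lambda e: (e[0], e[1]))


if __name__ == "__main__":
    node1 = ("Tue Mar 25 13:14:54 2014\n"
             "IPC Send timeout detected. Sender: ospid 6160700 "
             "[oracle@dbrac1 (LMS0)]\n"
             "Tue Mar 25 13:15:25 2014\n"
             "Evicting instance 2 from cluster\n")
    node2 = ("Tue Mar 25 13:14:57 2014\n"
             "IPC Send timeout detected. Receiver ospid 11010320\n"
             "Tue Mar 25 13:15:25 2014\n"
             "LMD0 (ospid: 8192018): terminating the instance "
             "due to error 481\n")
    for ts, node, msg in timeline({"node1": node1, "node2": node2}):
        print(ts.strftime("%H:%M:%S"), node, msg)
```

Applied to the excerpts above, the merged view puts the first IPC send timeout on node 1 at 13:14:54, the matching entry on node 2 at 13:14:57, and the eviction and instance termination at 13:15:25.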

Node 1 OSW PRVTNET log:

zzz ***Tue Mar 25 13:12:19 BEIST 2014

trying to get source for 192.168.100.1

source should be 192.168.100.1

traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms

trying to get source for 192.168.100.2

source should be 192.168.100.1

traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac2-priv (192.168.100.2)  1 ms  0 ms *

zzz ***Warning. Traceroute response is spanning snapshot intervals.

zzz ***Tue Mar 25 13:12:31 BEIST 2014

trying to get source for 192.168.100.1

source should be 192.168.100.1

traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms

trying to get source for 192.168.100.2

source should be 192.168.100.1

traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  * * *

2  * * *

3  * dbrac2-priv (192.168.100.2)  0 ms *

zzz ***Warning. Traceroute response is spanning snapshot intervals.

zzz ***Tue Mar 25 13:13:17 BEIST 2014

trying to get source for 192.168.100.1

source should be 192.168.100.1

traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms

trying to get source for 192.168.100.2

source should be 192.168.100.1

traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  * * *

2  * * *

3  dbrac2-priv (192.168.100.2)  0 ms * *

zzz ***Warning. Traceroute response is spanning snapshot intervals.

zzz ***Tue Mar 25 13:14:04 BEIST 2014

trying to get source for 192.168.100.1

source should be 192.168.100.1

traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms

trying to get source for 192.168.100.2

source should be 192.168.100.1

traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  * * * <=============================== Note: * means a traceroute probe got no reply; three * means three probes were sent

2  * * *

3  * * *

4  * * *

5  * * *

6  * * *

7  * * *

8  dbrac2-priv (192.168.100.2)  0 ms  0 ms *

zzz ***Warning. Traceroute response is spanning snapshot intervals.

zzz ***Tue Mar 25 13:16:01 BEIST 2014  <==================================== This snapshot was taken about 2 minutes after the previous one; an OSW gap occurred.

trying to get source for 192.168.100.1

source should be 192.168.100.1

traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms

trying to get source for 192.168.100.2

source should be 192.168.100.1

traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max

outgoing MTU = 1500

1  * dbrac2-priv (192.168.100.2)  0 ms  0 ms
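
The two signals pointed out in the OSW output above, lost traceroute probes and a gap between snapshots, can be pulled out of a long PRVTNET file automatically. The following is a minimal sketch, assuming the snapshot header format shown above ("zzz ***Tue Mar 25 13:12:19 BEIST 2014"); the function names and the 60-second threshold are hypothetical and should be adjusted to the configured OSW snapshot interval.

```python
import re
from datetime import datetime

# Snapshot headers look like "zzz ***Tue Mar 25 13:12:19 BEIST 2014".
# The time zone token (BEIST here) is skipped because strptime cannot
# parse arbitrary zone abbreviations.
SNAP_RE = re.compile(
    r"^zzz \*\*\*(\w{3} \w{3} [ \d]?\d \d{2}:\d{2}:\d{2}) \S+ (\d{4})")
# Hop lines start with the hop number, e.g. "3  * dbrac2-priv (...)  0 ms *".
HOP_RE = re.compile(r"^\d+\s+\S")


def parse_snapshots(text):
    """Return a list of (timestamp, lost_probes), one entry per snapshot."""
    snaps = []
    for raw in text.splitlines():
        line = raw.strip()
        m = SNAP_RE.match(line)
        if m:
            ts = datetime.strptime(f"{m.group(1)} {m.group(2)}",
                                   "%a %b %d %H:%M:%S %Y")
            snaps.append([ts, 0])
        elif snaps and HOP_RE.match(line):
            # Drop any trailing "<=== ..." annotation before counting.
            probes = line.split("<", 1)[0].split()
            snaps[-1][1] += probes.count("*")
    return [(ts, lost) for ts, lost in snaps]


def find_gaps(snaps, max_gap_seconds=60):
    """Return (previous, current, seconds) where snapshots are too far apart."""
    gaps = []
    for (prev, _), (cur, _) in zip(snaps, snaps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > max_gap_seconds:
            gaps.append((prev, cur, delta))
    return gaps


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:
        snapshots = parse_snapshots(fh.read())
    for ts, lost in snapshots:
        print(ts, "lost probes:", lost)
    for prev, cur, secs in find_gaps(snapshots):
        print(f"gap of {secs:.0f}s between {prev} and {cur}")
```

On the excerpt above, the only interval longer than 60 seconds is the one between the 13:14:04 and 13:16:01 snapshots (117 seconds), which brackets the IPC send timeouts and the eviction of instance 2 recorded in the alert logs.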
