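Oracle RAC keeps the cluster's core configuration in the OCR (Oracle Cluster Registry) and the OLR (Oracle Local Registry). The OLR is local to each node.
Location of the OLR file: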
[root@exn1 ~]# ls /etc/oracle/olr.loc
/etc/oracle/olr.loc
[root@exn1 ~]# cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.2/cdata/exn1.olr
crs_home=/u01/app/12.2
[root@exn1 ~]# ls /u01/app/12.2/cdata/exn1.olr
/u01/app/12.2/cdata/exn1.olr
[root@exn1 ~]#
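The lookup above can be scripted. The following is a minimal bash sketch that assumes the `key=value` layout of `/etc/oracle/olr.loc` shown in the transcript. The helper names `olr_path` and `olr_exists` are hypothetical and are not Oracle tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Oracle Grid Infrastructure).

# Print the OLR location recorded in olr.loc.
# Assumes the olrconfig_loc=<path> line shown in the transcript above.
olr_path() {
  local loc_file="${1:-/etc/oracle/olr.loc}"
  sed -n 's/^olrconfig_loc=//p' "$loc_file"
}

# Succeed only if the OLR file named in olr.loc exists as a regular file.
olr_exists() {
  local path
  path="$(olr_path "${1:-/etc/oracle/olr.loc}")"
  [ -n "$path" ] && [ -f "$path" ]
}
```

On exn1, `olr_path` would print `/u01/app/12.2/cdata/exn1.olr`, and `olr_exists || echo "OLR missing"` gives a quick health check before looking at clusterware traces.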
Viewing the OLR backups and dumping the OLR contents:
[grid@exn1 ~]$ ocrconfig -local -showbackup
exn1 2019/04/17 14:23:19 /u01/app/12.2/cdata/exn1/backup_20190417_142319.olr 0
[grid@exn1 ~]$ echo $ORACLE_HOME/bin
/u01/app/12.2/bin
[grid@exn1 ~]$ exit
[root@exn1 ~]# cd /u01/app/12.2/bin
[root@exn1 bin]# ./ocrconfig -local -manualbackup
exn1 2019/10/25 04:55:35 /u01/app/12.2/cdata/exn1/backup_20191025_045535.olr 304112466
exn1 2019/04/17 14:23:19 /u01/app/12.2/cdata/exn1/backup_20190417_142319.olr 0
[root@exn1 bin]# ./ocrdump -local -backupfile /u01/app/12.2/cdata/exn1/backup_20191025_045535.olr
When the dump command finishes, it writes a file named OCRDUMPFILE to the current directory, which you can page through with more.
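When several backups exist, you can pick the newest one from the `ocrconfig -local -showbackup` output. Below is a minimal sketch. It assumes the five-column layout seen above (node, date, time, path, ID), and `latest_olr_backup` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ocrconfig -local -showbackup` output on stdin
# and print the path of the most recent OLR backup.
# Assumed columns: node  YYYY/MM/DD  HH:MM:SS  path  id
latest_olr_backup() {
  awk 'NF >= 4 && $4 ~ /\.olr$/ {
         key = $2 " " $3   # zero-padded date/time compares correctly as a string
         if (path == "" || key > best) { best = key; path = $4 }
       }
       END { if (path != "") print path }'
}
```

As root in `$GRID_HOME/bin`, this could be combined as `./ocrdump -local -backupfile "$(./ocrconfig -local -showbackup | latest_olr_backup)"`. Comparing date and time as strings avoids any dependence on `date` parsing, because both fields are fixed-width.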
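Production systems can fail in many different ways. This article shows how to recover when the OLR file is corrupted or lost.
Environment:
Database: 4-node RAC, version 12.2.0.1
OS: CentOS 7.6
Simulating loss of the OLR on node 1:
[root@exn1 ~]# cat /etc/oracle/olr.loc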
olrconfig_loc=/u01/app/12.2/cdata/exn1.olr
crs_home=/u01/app/12.2
[root@exn1 bin]# mv /u01/app/12.2/cdata/exn1.olr /tmp/exn1.olr
[root@exn1 bin]# reboot
Symptoms after the reboot:
[grid@exn1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@exn1 ~]$ crsctl stat res -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[grid@exn1 ~]$ crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
[grid@exn1 ~]$ ps -ef | grep ohas | grep -v grep
root 9927 1 0 05:03 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
[grid@exn1 ~]$ ps -ef | grep crs | grep -v grep
[grid@exn1 ~]$
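As shown above, only the init.ohasd wrapper script is running. The OHAS daemon itself cannot be contacted (CRS-4639) and no crsd process exists, so the clusterware stack fails to start.
Check the most recently updated trace files: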
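This distinction matters during triage. On 12c the HAS daemon normally appears in `ps` as `$GRID_HOME/bin/ohasd.bin`, while the line above is only the `init.ohasd run` wrapper started by the OS. Below is a minimal sketch that classifies `ps -ef` output along these lines. The helper `ohasd_state` and its three labels are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` output on stdin and print
#   daemon        if ohasd.bin is running,
#   wrapper-only  if only the init.ohasd wrapper is running,
#   down          otherwise.
ohasd_state() {
  local ps_out
  ps_out="$(cat)"
  if grep -q 'ohasd\.bin' <<<"$ps_out"; then
    echo daemon
  elif grep -q 'init\.ohasd run' <<<"$ps_out"; then
    echo wrapper-only
  else
    echo down
  fi
}
```

For the node in this article, `ps -ef | ohasd_state` would report `wrapper-only`, which points at a daemon start-up failure rather than a missing init script.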
[grid@exn1 ~]$ cd /u01/app/grid/diag/crs/exn1/crs/trace
[grid@exn1 trace]$ ls -lrt
total 201
...
-rw-rw---- 1 root oinstall 921 Oct 25 05:55 crsctl_9787.trm
-rw-rw---- 1 root oinstall 658 Oct 25 05:55 crsctl_9787.trc
[grid@exn1 trace]$ cat crsctl_9787.trc
Trace file /u01/app/grid/diag/crs/exn1/crs/trace/crsctl_9787.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.
2019-10-25 05:55:46.950 : OCROSD:1888452736: utopen:6m': failed in stat OCR file/disk /u01/app/12.2/cdata/exn1.olr, errno=2, os err string=No such file or directory
2019-10-25 05:55:46.950 : OCROSD:1888452736: utopen:7: failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2019-10-25 05:55:46.950 : OCRRAW:1888452736: proprinit: Could not open raw device
2019-10-25 05:55:46.
The trace shows that the file exn1.olr is missing.
With the cause identified, start the repair:
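You can extract the relevant message with grep instead of reading each trace by hand. Note that the trace reports "OCR file/disk" even though the path is the OLR. Below is a minimal sketch that assumes the `failed in stat OCR file/disk <path>, errno=...` wording above. `missing_ocr_files` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read clusterware trace text on stdin and print every
# file named in an OCROSD "failed in stat OCR file/disk <path>," message.
# The OLR is reported with the same "OCR file/disk" wording (see trace above).
missing_ocr_files() {
  sed -n 's/.*failed in stat OCR file\/disk \([^,]*\),.*/\1/p' | sort -u
}
```

Usage on this node: `grep -h 'failed in stat OCR' /u01/app/grid/diag/crs/exn1/crs/trace/*.trc | missing_ocr_files`.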
[root@exn1 ~]# cd /u01/app/12.2/bin
[root@exn1 bin]# ./crsctl stop has
[root@exn1 bin]# ./ocrconfig -local -restore /u01/app/12.2/cdata/exn1/backup_20191025_045535.olr
PROTL-35: The configured OLR location is not accessible
[root@exn1 bin]# cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.2/cdata/exn1.olr
crs_home=/u01/app/12.2
[root@exn1 bin]# ls -lrt /u01/app/12.2/cdata/
total 0
drwxr-xr-x 2 grid oinstall 6 Jan 27 2017 localhost
drwxrwxr-x 2 grid oinstall 6 Apr 17 2019 kevin
drwxr-xr-x 2 grid oinstall 108 Oct 25 04:55 exn1
[root@exn1 bin]# touch /u01/app/12.2/cdata/exn1.olr
[root@exn1 bin]# chmod 600 /u01/app/12.2/cdata/exn1.olr
[root@exn1 bin]# chown grid:oinstall /u01/app/12.2/cdata/exn1.olr
[root@exn1 bin]# ./ocrconfig -local -restore /u01/app/12.2/cdata/exn1/backup_20191025_045535.olr
[root@exn1 bin]# ./ocrcheck -local
Status of Oracle Local Registry is as follows :
Version : 4
Total space (kbytes) : 409568
Used space (kbytes) : 1092
Available space (kbytes) : 408476
ID : 1740932837
Device/File Name : /u01/app/12.2/cdata/exn1.olr
Device/File integrity check succeeded
Local registry integrity check succeeded
Logical corruption check succeeded
[root@exn1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@exn1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@exn1 bin]#
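Two details in this transcript deserve a closer look. First, the initial `ocrconfig -local -restore` failed with PROTL-35 because the location recorded in `olr.loc` no longer existed. Once an empty file with the expected owner and mode (`grid:oinstall`, 600, as above) was in place, the restore succeeded. Second, `ocrcheck -local` should afterwards report all three checks as succeeded. Below is a minimal bash sketch of both steps. `precreate_olr` and `olr_check_ok` are hypothetical helpers, and the owner string comes from this article's environment.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the repair step shown above (run as root).

# Create an empty placeholder at the OLR location so that
# `ocrconfig -local -restore` has an accessible target.
# $1 = OLR path from olr.loc; $2 = optional owner such as grid:oinstall.
precreate_olr() {
  local path="${1:-}" owner="${2:-}"
  if [ -z "$path" ]; then
    echo "usage: precreate_olr PATH [OWNER]" >&2
    return 2
  fi
  if [ -e "$path" ]; then
    # Do not change ownership or mode of an OLR that is already there.
    echo "already exists, not touching: $path" >&2
    return 1
  fi
  touch "$path" || return 1
  chmod 600 "$path" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$path" || return 1
  fi
}

# Read `ocrcheck -local` output on stdin; succeed only if all three
# integrity checks from the transcript above report "succeeded".
olr_check_ok() {
  local out msg
  out="$(cat)"
  for msg in "Device/File integrity check succeeded" \
             "Local registry integrity check succeeded" \
             "Logical corruption check succeeded"; do
    case "$out" in
      *"$msg"*) ;;
      *) echo "missing: $msg" >&2; return 1 ;;
    esac
  done
}
```

For example: `precreate_olr /u01/app/12.2/cdata/exn1.olr grid:oinstall`, then the restore, then `./ocrcheck -local | olr_check_ok && ./crsctl start crs`. `precreate_olr` deliberately refuses to touch an existing file, so it cannot alter a healthy OLR by mistake.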
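The fix is to restore from an OLR backup. Note that the OLR is node-specific, so never copy another node's OLR to the affected node. After the restore completes, restart the clusterware on that node.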
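Taken together, the repair on the affected node can be sketched as a single script. This is a hedged outline, not an Oracle-supplied procedure. `GRID_HOME`, `OLR_FILE` and `BACKUP` default to the values from this article's environment, and the backup must be one taken on the same node. The script only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch of the node-local OLR restore performed above. Run as root on the
# affected node only. Defaults are this article's paths (assumptions).
set -euo pipefail

GRID_HOME="${GRID_HOME:-/u01/app/12.2}"
OLR_FILE="${OLR_FILE:-/u01/app/12.2/cdata/exn1.olr}"
BACKUP="${BACKUP:-/u01/app/12.2/cdata/exn1/backup_20191025_045535.olr}"
DRY_RUN="${DRY_RUN:-1}"   # 1 (default): only print commands; 0: execute them

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

restore_olr() {
  if [ "$DRY_RUN" = "0" ] && [ ! -f "$BACKUP" ]; then
    echo "backup not found: $BACKUP" >&2
    return 1
  fi
  # HAS may already be down after the failed start, so tolerate a failure here.
  run "$GRID_HOME/bin/crsctl" stop has || true
  if [ ! -e "$OLR_FILE" ]; then
    # Without a file at the configured location, the restore failed with PROTL-35.
    run touch "$OLR_FILE"
    run chmod 600 "$OLR_FILE"
    run chown grid:oinstall "$OLR_FILE"
  fi
  run "$GRID_HOME/bin/ocrconfig" -local -restore "$BACKUP"
  run "$GRID_HOME/bin/ocrcheck" -local
  run "$GRID_HOME/bin/crsctl" start crs
  run "$GRID_HOME/bin/crsctl" check crs
}

restore_olr
```

Run it once as-is to review the printed commands, then with `DRY_RUN=0` after confirming that the paths match the node being repaired.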
References:
1. Oracle 12c documentation: Oracle Clusterware Administration and Deployment Guide, chapter "Managing Oracle Cluster Registry and Voting Files"
2. MOS: How to backup or restore OLR in 11.2/12c Grid Infrastructure (Doc ID 1193643.1)