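TSM Troubleshooting
I. Background
After a power outage in the machine room, the tape library and the backup server were restarted, but backups still failed. The same failure can also occur after the SAN network has been changed and reconfigured.
The RMAN backup log shows the following errors: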
channel t2: starting piece 1 at 2011.08.24 01:00:19
RMAN-03009: failure of backup command on t1 channel at 08/24/2011 01:00:26
ORA-19502: write error on file "oracle_full_SBDB_1137363504_1279_1_759978018_20110824", blockno 1 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS1312E (RC12) Server media mount not possible
channel t1 disabled, job failed on it will be run on another channel
released channel: t1
released channel: t2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on t2 channel at 08/24/2011 01:00:26
ORA-19502: write error on file "oracle_full_SBDB_1137363504_1280_1_759978018_20110824", blockno 1 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS1312E (RC12) Server media mount not possible
Recovery Manager complete.
Wed Aug 24 01:00:29 BEIST 2011
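In a log like the one above, the ORA- and RMAN- lines only wrap the real cause, which is the TSM client message (ANS1312E, server media mount not possible). A minimal Python sketch that pulls the distinct error codes out of the log text so the media manager message stands out; the function names are hypothetical and reading the log file is left to the caller.

```python
import re

# Oracle (ORA-nnnnn), RMAN (RMAN-nnnnn) and TSM client (ANSnnnnX) message codes.
ERROR_RE = re.compile(r"\b(ORA-\d{5}|RMAN-\d{5}|ANS\d{4}[A-Z])\b")


def error_codes(log_text: str) -> list[str]:
    """Return the distinct error codes in order of first appearance."""
    seen: list[str] = []
    for code in ERROR_RE.findall(log_text):
        if code not in seen:
            seen.append(code)
    return seen


def media_manager_errors(log_text: str) -> list[str]:
    """Return only the TSM client (ANS*) codes, which name the underlying cause."""
    return [code for code in error_codes(log_text) if code.startswith("ANS")]
```

Applied to the log above, media_manager_errors returns ['ANS1312E'], which points at the tape mount rather than at Oracle.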
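II. Resolution
Analysis: After a SAN network change or an unexpected power outage, the operating system and the TSM device driver often fail to recognize the tape library devices, or assign them new device names that no longer match the names configured in TSM. TSM then cannot operate the tape library, so the backup fails.
Procedure:
1 Open the TSM administrative command line and check the error messages: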
C:\Program Files\Tivoli\TSM\server\tsmdiag\dsmadmc.exe
TSMSERVER> q actlog begintime=01:00:00 search=failed
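The output contains tape library initialization errors and tape mount errors.
2 Check the device status in the operating system and in the TSM Management Console, and compare the device names with those in the TSM configuration.
2.1 Confirm that Device Manager in the operating system lists the tape drives and the medium changer, and that their status is normal.
2.2 In the TSM Management Console, confirm that TSM Device Driver lists the library and the tape drives: lb1.1.0.3, mt0.0.0.3 and mt1.0.0.3.
2.3 Compare these names with the previously configured device names.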
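The activity log query in step 1 can also be run non-interactively. A minimal Python sketch, assuming your administrative client supports the dsmadmc options -id, -password and -dataonly=yes (check the documentation for your version). It keeps only activity log lines whose ANR message ID ends in the error (E) or severe (S) severity letter. On Windows, pass the full dsmadmc.exe path from step 1 as dsmadmc. The helper names are hypothetical.

```python
import re
import subprocess

# TSM server message IDs end in a severity letter (for example I, W, E, S);
# keep only error (E) and severe (S) messages.
ANR_ERROR_RE = re.compile(r"\bANR\d{4}[ES]\b")


def actlog_command(admin: str, password: str, begintime: str = "01:00:00",
                   dsmadmc: str = "dsmadmc") -> list[str]:
    """Build a non-interactive dsmadmc call for the query used in step 1."""
    return [
        dsmadmc,
        f"-id={admin}",
        f"-password={password}",
        "-dataonly=yes",  # assumed option: suppress banner and headers
        f"q actlog begintime={begintime} search=failed",
    ]


def error_lines(actlog_text: str) -> list[str]:
    """Return the activity log lines that carry an ANR...E or ANR...S message."""
    return [line for line in actlog_text.splitlines() if ANR_ERROR_RE.search(line)]


def failed_actlog_errors(admin: str, password: str,
                         dsmadmc: str = "dsmadmc") -> list[str]:
    """Run the query and return only the error lines."""
    result = subprocess.run(actlog_command(admin, password, dsmadmc=dsmadmc),
                            capture_output=True, text=True, check=False)
    return error_lines(result.stdout)
```

Passing the password on the command line makes it visible in process listings, so this is better suited to a one-off check than to a scheduled job.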
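In the original screenshot, the devices configured in TSM before the fix were mt1.0.0.3, mt2.0.0.3 and lb2.1.0.3 (the original TSM configuration).
These do not match the names lb1.1.0.3, mt0.0.0.3 and mt1.0.0.3 shown in the TSM Management Console.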
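The comparison in step 2.3 is a set difference between the device names in the TSM configuration (for example the Device field of q path f=d) and the names listed by TSM Device Driver. A minimal Python sketch using the names from this case; the function name is hypothetical.

```python
def compare_device_names(configured: set[str],
                         discovered: set[str]) -> tuple[list[str], list[str]]:
    """Return (names configured in TSM but no longer present,
    names present now but not configured in TSM), both sorted."""
    missing = sorted(configured - discovered)
    new = sorted(discovered - configured)
    return missing, new


configured = {"lb2.1.0.3", "mt1.0.0.3", "mt2.0.0.3"}  # original TSM configuration
discovered = {"lb1.1.0.3", "mt0.0.0.3", "mt1.0.0.3"}  # TSM Device Driver listing
missing, new = compare_device_names(configured, discovered)
print("missing:", missing)  # missing: ['lb2.1.0.3', 'mt2.0.0.3']
print("new:", new)          # new: ['lb1.1.0.3', 'mt0.0.0.3']
```

Note that mt1.0.0.3 appears in both sets, but that does not prove it still refers to the same physical drive; this is why the procedure below deletes and redefines every device instead of patching only the missing names.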
3 Reconfigure the devices in TSM
3.1 Delete the old definitions
Delete the old drive paths:
delete path tsmserver drive1 srctype=server desttype=drive library=ts3100lib
delete path tsmserver drive2 srctype=server desttype=drive library=ts3100lib
Delete the old drives:
delete drive ts3100lib drive1
delete drive ts3100lib drive2
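If deleting the old drives fails, run q path. If a path to the drive is still listed with On-Line = Yes (in this example, the paths from P570A_AGENT), delete those paths first, then delete the old drives again: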
delete path P570A_AGENT drive1 srctype=server desttype=drive library=ts3100lib
delete path P570A_AGENT drive2 srctype=server desttype=drive library=ts3100lib
Delete the old library path:
delete path tsmserver ts3100lib srctype=server desttype=library
Delete the old library:
delete library ts3100lib
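Because the delete commands must run in a fixed order (drive paths, drives, library path, library), they can be generated rather than typed. A minimal Python sketch; the function name is hypothetical and the command text follows the commands above. Unlike the manual procedure, which deletes the storage agent paths only if delete drive fails, this sketch emits the paths of every listed source up front; pass agents=() to get only the server paths.

```python
def delete_commands(library: str, drives: list[str],
                    server: str = "tsmserver", agents: tuple[str, ...] = ()) -> list[str]:
    """Return TSM commands that remove a library definition, drive paths first."""
    cmds = []
    # 1. Paths from the TSM server and from any storage agents to each drive.
    for source in (server, *agents):
        for drive in drives:
            cmds.append(f"delete path {source} {drive} srctype=server "
                        f"desttype=drive library={library}")
    # 2. The drives themselves.
    cmds += [f"delete drive {library} {drive}" for drive in drives]
    # 3. The library path, then the library.
    cmds.append(f"delete path {server} {library} srctype=server desttype=library")
    cmds.append(f"delete library {library}")
    return cmds


for cmd in delete_commands("ts3100lib", ["drive1", "drive2"], agents=("P570A_AGENT",)):
    print(cmd)
```

Review the printed commands before pasting them into dsmadmc.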
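3.2 Redefine the devices
Redefine the library and its path (in the reverse order of deletion):
TSM Device Driver in the TSM Management Console lists the library and drives as lb1.1.0.3, mt0.0.0.3 and mt1.0.0.3; the definitions below must use exactly these names.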
define library ts3100lib libtype=scsi shared=yes
define path tsmserver ts3100lib srctype=server desttype=library device=lb1.1.0.3
Redefine the drives and their paths:
define drive ts3100lib drive1
define path tsmserver drive1 srctype=server desttype=drive library=ts3100lib device=mt0.0.0.3
define drive ts3100lib drive2
define path tsmserver drive2 srctype=server desttype=drive library=ts3100lib device=mt1.0.0.3
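The define commands can be generated the same way, from the device names currently listed by TSM Device Driver. A minimal Python sketch; the function name is hypothetical, and the drive-to-device mapping is the one used in this case (drive1 to mt0.0.0.3, drive2 to mt1.0.0.3).

```python
def define_commands(library: str, library_device: str,
                    drive_devices: dict[str, str],
                    server: str = "tsmserver") -> list[str]:
    """Return TSM commands that define a shared SCSI library, its drives and paths."""
    cmds = [
        f"define library {library} libtype=scsi shared=yes",
        f"define path {server} {library} srctype=server desttype=library "
        f"device={library_device}",
    ]
    # Each drive is defined first, then its path to the current device name.
    for drive, device in drive_devices.items():
        cmds.append(f"define drive {library} {drive}")
        cmds.append(f"define path {server} {drive} srctype=server desttype=drive "
                    f"library={library} device={device}")
    return cmds


for cmd in define_commands("ts3100lib", "lb1.1.0.3",
                           {"drive1": "mt0.0.0.3", "drive2": "mt1.0.0.3"}):
    print(cmd)
```

With these arguments the output is exactly the command sequence listed above.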
4 Check the library volumes and check the volumes in
q libvol
checkin libvolume ts3100lib search=yes checklabel=barcode status=private
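(Final step: if q libvol does not list the volumes, run the checkin libvolume command above to check them in, then run q libvol again.)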
==============================================================
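I. Description:
On Windows, the tape library device names often change after the machine reboots. The TSM configuration then no longer matches, and backups fail.
For example, a device originally named mt0.0.0.3
becomes mt1.0.1.3 after a reboot.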
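II. Cause:
TSM names devices differently on different platforms. On AIX the naming is fairly robust.
On Windows, however, the tape driver can change device names automatically after a reboot. (I don't know exactly why, so don't ask me.)
III. Solution: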
Under the registry key
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ibmtp2k3
create a DWORD value named PersistentNaming and set it to 1.
Reboot the machine.
After the reboot, check the latest WWN entries under the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ibmtp2k3 subkey.
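The registry change can be scripted. A minimal Python sketch, assuming that the setting is a DWORD value named PersistentNaming directly under the ibmtp2k3 service key, as described above; other IBM tape driver versions may use a different service name, so check the driver documentation. set_persistent_naming uses the standard winreg module (Windows only, run as Administrator); persistent_naming_reg renders an equivalent .reg file for machines where you prefer to import it with regedit. All function names are hypothetical.

```python
import codecs

SERVICES = "System\\CurrentControlSet\\Services\\"


def persistent_naming_reg(driver: str = "ibmtp2k3", enabled: bool = True) -> str:
    """Render a .reg file that sets PersistentNaming under the driver's service key."""
    key = "HKEY_LOCAL_MACHINE\\" + SERVICES + driver
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"PersistentNaming"=dword:{int(enabled):08x}\r\n'
    )


def write_reg_file(path: str, driver: str = "ibmtp2k3") -> None:
    """Write the .reg file as UTF-16 LE with a BOM, the encoding regedit exports."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + persistent_naming_reg(driver).encode("utf-16-le"))


def set_persistent_naming(driver: str = "ibmtp2k3") -> None:
    """Set PersistentNaming=1 directly (Windows only, needs Administrator rights)."""
    import winreg  # imported here so the rest of the module also loads on non-Windows hosts

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, SERVICES + driver,
                            0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "PersistentNaming", 0, winreg.REG_DWORD, 1)
```

Either way, the setting only takes effect after the reboot described above.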
===========================================================
TSM issue on AIX
=================================================================================================
tsm: TSM>QUERY DRIVE 3584LIB
Library Name     Drive Name       Device Type     On-Line
------------     ------------     -----------     -------------------
3584LIB          DRIVE3           LTO             Yes
3584LIB          DRIVE4           LTO             Yes
tsm: TSM>QUERY DRIVE 3584LIB F=D
Library Name: 3584LIB
Drive Name: DRIVE3
Device Type: LTO
On-Line: Yes
Read Formats: ULTRIUM4C,ULTRIUM4,ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2
Write Formats: ULTRIUM4C,ULTRIUM4,ULTRIUM3C,ULTRIUM3
Element: 258
Drive State: LOADED
Volume Name: A00123L4
Allocated to: STA_MISDB1
WWN: 500507630F1C6602
Serial Number: 0007866410
Last Update by (administrator): ADMIN
Last Update Date/Time: 10/18/11 18:02:16
Cleaning Frequency (Gigabytes/ASNEEDED/NONE): ASNEEDED
Library Name: 3584LIB
Drive Name: DRIVE4
Device Type: LTO
On-Line: Yes
Read Formats: ULTRIUM4C,ULTRIUM4,ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2
Write Formats: ULTRIUM4C,ULTRIUM4,ULTRIUM3C,ULTRIUM3
Element: 257
Drive State: LOADED
Volume Name: A00124L4
Allocated to: STA_MISDB1
WWN: 500507630F1C6601
Serial Number: 0007866210
Last Update by (administrator): ADMIN
Last Update Date/Time: 10/18/11 18:02:28
Cleaning Frequency (Gigabytes/ASNEEDED/NONE): ASNEEDED
tsm: TSM>q path f=d
Source Name: STA_MISDB1
Source Type: SERVER
Destination Name: DRIVE3
Destination Type: DRIVE
Library: 3584LIB
Node Name:
Device: /dev/rmt2
External Manager:
LUN:
Initiator: 0
Directory:
On-Line: Yes
Last Update by (administrator): ADMIN
Last Update Date/Time: 10/19/11 11:49:07
Source Name: STA_MISDB1
Source Type: SERVER
Destination Name: DRIVE4
Destination Type: DRIVE
Library: 3584LIB
Node Name:
Device: /dev/rmt0
External Manager:
LUN:
Initiator: 0
Directory:
On-Line: Yes
Last Update by (administrator): ADMIN
Last Update Date/Time: 10/19/11 11:49:27
# lscfg -vl rmt0
rmt0 U787B.001.DNWDA9F-P1-C3-T1-W500507630F5C6601-L0 IBM 3580 Ultrium Tape Drive (FCP)
Manufacturer................IBM
Machine Type and Model......ULT3580-TD4
Serial Number...............0007866210
Device Specific.(FW)........7A31
# lscfg -vl rmt2
rmt2 U787B.001.DNWDA9F-P1-C4-T1-W500507630F5C6602-L0 IBM 3580 Ultrium Tape Drive (FCP)
Manufacturer................IBM
Machine Type and Model......ULT3580-TD4
Serial Number...............0007866410
Device Specific.(FW)........7A31
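Problem: After a SAN network cutover, the rmt device names on the host changed.
Solution: Update the paths with the new device names using the commands below. Each new rmt device name must correspond to the WWN of its drive.
Note that DRIVE3 corresponds to rmt2,
and DRIVE4 corresponds to rmt0.
The mapping can be found from the WWN and serial number fields in the output above: DRIVE3 and rmt2 both report serial number 0007866410 and a WWN ending in 6602, while DRIVE4 and rmt0 both report serial number 0007866210 and a WWN ending in 6601. If the mapping is wrong, query path shows the paths as offline when a backup starts. The q path output above is for the storage agent STA_MISDB1, while the commands below update the paths of STA_MISAPP1; use the source name of whichever storage agent is affected.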
UPDATE PATH STA_MISAPP1 DRIVE3 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=3584LIB DEVICE=/dev/rmt2
UPDATE PATH STA_MISAPP1 DRIVE4 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=3584LIB DEVICE=/dev/rmt0
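A minimal Python sketch that automates the matching described above: it reads the query drive ... f=d output and the lscfg -vl output (for example, saved from running lscfg -vl for each rmt device), pairs drives and rmt devices by serial number, and prints the UPDATE PATH commands. It matches on serial number rather than WWN because, in the output above, TSM reports DRIVE3's WWN as 500507630F1C6602 while lscfg shows W500507630F5C6602 in the rmt2 location code, so the two strings are not equal. The function names are hypothetical; review the generated commands before running them.

```python
import re


def drive_serials(q_drive_fd: str) -> dict[str, str]:
    """Map TSM drive name -> serial number from 'query drive ... f=d' output."""
    result: dict[str, str] = {}
    current = None
    for line in q_drive_fd.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Drive Name":
            current = value
        elif key == "Serial Number" and current:
            result[current] = value
    return result


def rmt_by_serial(lscfg_text: str) -> dict[str, str]:
    """Map serial number -> /dev/rmtN from concatenated 'lscfg -vl rmtN' output."""
    result: dict[str, str] = {}
    device = None
    for line in lscfg_text.splitlines():
        line = line.strip()
        header = re.match(r"(rmt\d+)\s", line)  # e.g. "rmt0 U787B.001..."
        if header:
            device = header.group(1)
            continue
        serial = re.match(r"Serial Number\.+(\S+)", line)
        if serial and device:
            result[serial.group(1)] = "/dev/" + device
    return result


def update_path_commands(source: str, library: str,
                         q_drive_fd: str, lscfg_text: str) -> list[str]:
    """Build one UPDATE PATH command per drive, using the rmt device with the same serial."""
    by_serial = rmt_by_serial(lscfg_text)
    cmds = []
    for drive, serial in drive_serials(q_drive_fd).items():
        device = by_serial.get(serial)
        if device is None:
            raise ValueError(f"no rmt device with serial {serial} for {drive}")
        cmds.append(f"UPDATE PATH {source} {drive} SRCTYPE=SERVER DESTTYPE=DRIVE "
                    f"LIBRARY={library} DEVICE={device}")
    return cmds
```

Fed the query drive and lscfg output from this case, update_path_commands("STA_MISAPP1", "3584LIB", ...) produces the two UPDATE PATH commands shown above.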