标签归档:asm 恢复

asm磁盘头全部损坏数据0丢失恢复

接到朋友反馈说他们公司的10.2.0.4(无磁盘头备份)asm 磁盘头都损坏了,确定是被人恶意dd掉了磁盘头的1k,他通过查询发过来结果如下
asm_disk_header


分析alert日志,确定磁盘组和磁盘信息
asm mount data磁盘组报错

Sun Apr 16 21:39:31 2017
NOTE: cache dismounting group 2/0x3F94036B (DATA) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Sun Apr 16 21:39:31 2017
ERROR: no PST quorum in group 3: required 2, found 0

data磁盘组和磁盘信息

Mon Mar 20 16:21:59 2017
NOTE: Hbeat: instance not first (grp 3)
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/raw/raw21
Mon Mar 20 16:21:59 2017
NOTE: F1X0 found on disk 2 fcn 0.47624333
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/raw/raw22
NOTE: cache opening disk 4 of grp 2: DATA_0004 path:/dev/raw/raw23
NOTE: cache opening disk 5 of grp 2: DATA_0005 path:/dev/raw/raw24
NOTE: F1X0 found on disk 5 fcn 0.47624333
NOTE: cache opening disk 6 of grp 2: DATA_0006 path:/dev/raw/raw26
NOTE: cache opening disk 7 of grp 2: DATA_0007 path:/dev/raw/raw25
NOTE: F1X0 found on disk 7 fcn 0.47624333
NOTE: cache mounting (not first) group 2/0x01B869DC (DATA)
Mon Mar 20 16:21:59 2017
kjbdomatt send to node 1
Mon Mar 20 16:21:59 2017
NOTE: attached to recovery domain 2
Mon Mar 20 16:21:59 2017
NOTE: opening chunk 2 at fcn 0.201560874 ABA 
NOTE: seq=614 blk=4144 
Mon Mar 20 16:21:59 2017
NOTE: cache mounting group 2/0x01B869DC (DATA) succeeded
SUCCESS: diskgroup DATA was mounted

最后一次正常mount是使用了raw21-raw26的裸设备为data磁盘组,但是这里从DATA_002开始,表明很可能最初了两个asm disk被删除,继续分析alert日志

Mon Oct 15 01:53:16 2012
 CREATE DISKGROUP DATA Normal REDUNDANCY  DISK '/dev/raw/raw6' SIZE 1144409M ,
'/dev/raw/raw7' SIZE 1144409M   

Sat Dec 27 22:41:39 2014
 alter diskgroup data add disk '/dev/raw/raw21' 

Sat Dec 27 22:41:54 2014
alter diskgroup data add disk '/dev/raw/raw22' 

Sat Dec 27 22:42:14 2014
 alter diskgroup data add disk '/dev/raw/raw23' 

Sat Dec 27 22:42:31 2014
 alter diskgroup data add disk '/dev/raw/raw24' 


Sat Dec 27 22:42:51 2014
 alter diskgroup data add disk '/dev/raw/raw26' 

Sat Dec 27 22:43:10 2014
alter diskgroup data add disk '/dev/raw/raw25' 

Mon Dec 29 17:55:07 2014
 alter diskgroup data drop disk 'DATA_0000' 

Tue Dec 30 03:14:42 2014
 alter diskgroup data drop disk 'DATA_0001'  

kfed确认磁盘头损坏情况
通过kfed分析dd出来的磁盘头发现每个磁盘头都一样,第一个block损坏

[oracle@rac1 xifenfei]$ kfed read raw22
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F21AF427400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[oracle@rac1 xifenfei]$ kfed read raw22 blkn=2|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                  2184525105 ; 0x00c: 0x82353531
kfbh.fcn.base:                 47625389 ; 0x010: 0x02d6b4ad
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                    448 ; 0x004: 0x01c0
kfdatb10.ub2pad:                      0 ; 0x006: 0x0000
kfdatb10.auinfo[0].link.next:         8 ; 0x008: 0x0008
kfdatb10.auinfo[0].link.prev:         8 ; 0x00a: 0x0008
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:           448 ; 0x00e: 0x01c0
kfdatb10.auinfo[1].link.next:        16 ; 0x010: 0x0010
kfdatb10.auinfo[1].link.prev:        16 ; 0x012: 0x0010
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:        24 ; 0x018: 0x0018
kfdatb10.auinfo[2].link.prev:        24 ; 0x01a: 0x0018
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:        32 ; 0x020: 0x0020
kfdatb10.auinfo[3].link.prev:        32 ; 0x022: 0x0020

恢复思路
确定磁盘是只被干掉了第一个block,但是由于asm 是10.2.0.4的,没有asm disk header的备份,因此也只能自己去人工kfed修复.但是考虑到该case中所有的asm disk header 全部丢失,无任何参考,完全修复比较麻烦,另外这个库也比较小,考虑修复asm disk header 关键部位,然后通过工具直接拷贝出来数据文件,在文件系统中open库的思路.主要需要恢复磁盘头基本信息(diskname,disksize,disknum,ausize,blocksize,file directory等)

通过kfed找出来file directory

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  2363360058 ; 0x00c: 0x8cde033a
kfbh.fcn.base:                 48245591 ; 0x010: 0x02e02b57
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:                   1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                 1048576 ; 0x010: 0x00100000
kfffdb.xtntcnt:                       3 ; 0x014: 0x00000003
kfffdb.xtnteof:                       3 ; 0x018: 0x00000003
kfffdb.blkSize:                    4096 ; 0x01c: 0x00001000
kfffdb.flags:                        65 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=1 A=0
kfffdb.fileType:                     15 ; 0x021: 0x0f
kfffdb.dXrs:                         19 ; 0x022: SCHE=0x1 NUMB=0x3

通过kfed找出来disk directory

kfffde[0].xptr.au:                    4 ; 0x4a0: 0x00000004
kfffde[0].xptr.disk:                  7 ; 0x4a4: 0x0007
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  41 ; 0x4a7: 0x29
kfffde[1].xptr.au:                17405 ; 0x4a8: 0x000043fd
kfffde[1].xptr.disk:                  6 ; 0x4ac: 0x0006
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk:                 146 ; 0x4af: 0x92
kfffde[2].xptr.au:               330031 ; 0x4b0: 0x0005092f
kfffde[2].xptr.disk:                  4 ; 0x4b4: 0x0004
kfffde[2].xptr.flags:                 0 ; 0x4b6: L=0 E=0 D=0 S=0
kfffde[2].xptr.chk:                  13 ; 0x4b7: 0x0d


kfddde[2].entry.incarn:               1 ; 0x3a4: A=1 NUMM=0x0
kfddde[2].entry.hash:                 2 ; 0x3a8: 0x00000002
kfddde[2].entry.refer.number:4294967295 ; 0x3ac: 0xffffffff
kfddde[2].entry.refer.incarn:         0 ; 0x3b0: A=0 NUMM=0x0
kfddde[2].dsknum:                     2 ; 0x3b4: 0x0002
kfddde[2].state:                      2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ddchgfl:                    0 ; 0x3b7: 0x00
kfddde[2].dskname:            DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname:             DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi:          33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[2].crestmp.lo:        2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfddde[2].failstmp.hi:                0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo:                0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[2].timer:                      0 ; 0x408: 0x00000000
kfddde[2].size:                 1258291 ; 0x40c: 0x00133333
kfddde[2].srRloc.super.hiStart:       0 ; 0x410: 0x00000000
kfddde[2].srRloc.super.loStart:       0 ; 0x414: 0x00000000
kfddde[2].srRloc.super.length:        0 ; 0x418: 0x00000000
kfddde[2].srRloc.incarn:              0 ; 0x41c: 0x00000000
kfddde[2].dskrprtm:                   0 ; 0x420: 0x00000000
kfddde[2].start0:                     0 ; 0x424: 0x00000000
kfddde[2].size0:                1258291 ; 0x428: 0x00133333
kfddde[2].used0:                1258229 ; 0x42c: 0x001332f5
kfddde[2].slot:                       0 ; 0x430: 0x00000000


kfddde[3].entry.incarn:               1 ; 0x564: A=1 NUMM=0x0
kfddde[3].entry.hash:                 3 ; 0x568: 0x00000003
kfddde[3].entry.refer.number:4294967295 ; 0x56c: 0xffffffff
kfddde[3].entry.refer.incarn:         0 ; 0x570: A=0 NUMM=0x0
kfddde[3].dsknum:                     3 ; 0x574: 0x0003
kfddde[3].state:                      2 ; 0x576: KFDSTA_NORMAL
kfddde[3].ddchgfl:                    0 ; 0x577: 0x00
kfddde[3].dskname:            DATA_0003 ; 0x578: length=9
kfddde[3].fgname:             DATA_0003 ; 0x598: length=9
kfddde[3].crestmp.hi:          33010550 ; 0x5b8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[3].crestmp.lo:        2811397120 ; 0x5bc: USEC=0x0 MSEC=0xa1 SECS=0x39 MINS=0x29
kfddde[3].failstmp.hi:                0 ; 0x5c0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[3].failstmp.lo:                0 ; 0x5c4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[3].timer:                      0 ; 0x5c8: 0x00000000
kfddde[3].size:                 1258291 ; 0x5cc: 0x00133333
kfddde[3].srRloc.super.hiStart:       0 ; 0x5d0: 0x00000000
kfddde[3].srRloc.super.loStart:       0 ; 0x5d4: 0x00000000
kfddde[3].srRloc.super.length:        0 ; 0x5d8: 0x00000000
kfddde[3].srRloc.incarn:              0 ; 0x5dc: 0x00000000
kfddde[3].dskrprtm:                   0 ; 0x5e0: 0x00000000
kfddde[3].start0:                     0 ; 0x5e4: 0x00000000
kfddde[3].size0:                1258291 ; 0x5e8: 0x00133333
kfddde[3].used0:                1258128 ; 0x5ec: 0x00133290
kfddde[3].slot:                       0 ; 0x5f0: 0x00000000

kfddde[4].entry.incarn:               1 ; 0x724: A=1 NUMM=0x0
kfddde[4].entry.hash:                 4 ; 0x728: 0x00000004
kfddde[4].entry.refer.number:4294967295 ; 0x72c: 0xffffffff
kfddde[4].entry.refer.incarn:         0 ; 0x730: A=0 NUMM=0x0
kfddde[4].dsknum:                     4 ; 0x734: 0x0004
kfddde[4].state:                      2 ; 0x736: KFDSTA_NORMAL
kfddde[4].ddchgfl:                    0 ; 0x737: 0x00
kfddde[4].dskname:            DATA_0004 ; 0x738: length=9
kfddde[4].fgname:             DATA_0004 ; 0x758: length=9
kfddde[4].crestmp.hi:          33010550 ; 0x778: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[4].crestmp.lo:        2834565120 ; 0x77c: USEC=0x0 MSEC=0x102 SECS=0xf MINS=0x2a
kfddde[4].failstmp.hi:                0 ; 0x780: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[4].failstmp.lo:                0 ; 0x784: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[4].timer:                      0 ; 0x788: 0x00000000
kfddde[4].size:                 1258291 ; 0x78c: 0x00133333
kfddde[4].srRloc.super.hiStart:       0 ; 0x790: 0x00000000
kfddde[4].srRloc.super.loStart:       0 ; 0x794: 0x00000000
kfddde[4].srRloc.super.length:        0 ; 0x798: 0x00000000
kfddde[4].srRloc.incarn:              0 ; 0x79c: 0x00000000
kfddde[4].dskrprtm:                   0 ; 0x7a0: 0x00000000
kfddde[4].start0:                     0 ; 0x7a4: 0x00000000
kfddde[4].size0:                1258291 ; 0x7a8: 0x00133333
kfddde[4].used0:                1258291 ; 0x7ac: 0x00133333
kfddde[4].slot:                       0 ; 0x7b0: 0x00000000


kfddde[5].entry.incarn:               1 ; 0x8e4: A=1 NUMM=0x0
kfddde[5].entry.hash:                 5 ; 0x8e8: 0x00000005
kfddde[5].entry.refer.number:4294967295 ; 0x8ec: 0xffffffff
kfddde[5].entry.refer.incarn:         0 ; 0x8f0: A=0 NUMM=0x0
kfddde[5].dsknum:                     5 ; 0x8f4: 0x0005
kfddde[5].state:                      2 ; 0x8f6: KFDSTA_NORMAL
kfddde[5].ddchgfl:                    0 ; 0x8f7: 0x00
kfddde[5].dskname:            DATA_0005 ; 0x8f8: length=9
kfddde[5].fgname:             DATA_0005 ; 0x918: length=9
kfddde[5].crestmp.hi:          33010550 ; 0x938: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[5].crestmp.lo:        2853560320 ; 0x93c: USEC=0x0 MSEC=0x178 SECS=0x21 MINS=0x2a
kfddde[5].failstmp.hi:                0 ; 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[5].failstmp.lo:                0 ; 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[5].timer:                      0 ; 0x948: 0x00000000
kfddde[5].size:                 1258291 ; 0x94c: 0x00133333
kfddde[5].srRloc.super.hiStart:       0 ; 0x950: 0x00000000
kfddde[5].srRloc.super.loStart:       0 ; 0x954: 0x00000000
kfddde[5].srRloc.super.length:        0 ; 0x958: 0x00000000
kfddde[5].srRloc.incarn:              0 ; 0x95c: 0x00000000
kfddde[5].dskrprtm:                   0 ; 0x960: 0x00000000
kfddde[5].start0:                     0 ; 0x964: 0x00000000
kfddde[5].size0:                1258291 ; 0x968: 0x00133333
kfddde[5].used0:                1258255 ; 0x96c: 0x0013330f
kfddde[5].slot:                       0 ; 0x970: 0x00000000


kfddde[6].entry.incarn:               1 ; 0xaa4: A=1 NUMM=0x0
kfddde[6].entry.hash:                 6 ; 0xaa8: 0x00000006
kfddde[6].entry.refer.number:4294967295 ; 0xaac: 0xffffffff
kfddde[6].entry.refer.incarn:         0 ; 0xab0: A=0 NUMM=0x0
kfddde[6].dsknum:                     6 ; 0xab4: 0x0006
kfddde[6].state:                      2 ; 0xab6: KFDSTA_NORMAL
kfddde[6].ddchgfl:                    0 ; 0xab7: 0x00
kfddde[6].dskname:            DATA_0006 ; 0xab8: length=9
kfddde[6].fgname:             DATA_0006 ; 0xad8: length=9
kfddde[6].crestmp.hi:          33010550 ; 0xaf8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[6].crestmp.lo:        2875645952 ; 0xafc: USEC=0x0 MSEC=0x1b8 SECS=0x36 MINS=0x2a
kfddde[6].failstmp.hi:                0 ; 0xb00: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[6].failstmp.lo:                0 ; 0xb04: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[6].timer:                      0 ; 0xb08: 0x00000000
kfddde[6].size:                 1258291 ; 0xb0c: 0x00133333
kfddde[6].srRloc.super.hiStart:       0 ; 0xb10: 0x00000000
kfddde[6].srRloc.super.loStart:       0 ; 0xb14: 0x00000000
kfddde[6].srRloc.super.length:        0 ; 0xb18: 0x00000000
kfddde[6].srRloc.incarn:              0 ; 0xb1c: 0x00000000
kfddde[6].dskrprtm:                   0 ; 0xb20: 0x00000000
kfddde[6].start0:                     0 ; 0xb24: 0x00000000
kfddde[6].size0:                1258291 ; 0xb28: 0x00133333
kfddde[6].used0:                1258247 ; 0xb2c: 0x00133307
kfddde[6].slot:                       0 ; 0xb30: 0x00000000

kfddde[7].entry.incarn:               1 ; 0xc64: A=1 NUMM=0x0
kfddde[7].entry.hash:                 7 ; 0xc68: 0x00000007
kfddde[7].entry.refer.number:4294967295 ; 0xc6c: 0xffffffff
kfddde[7].entry.refer.incarn:         0 ; 0xc70: A=0 NUMM=0x0
kfddde[7].dsknum:                     7 ; 0xc74: 0x0007
kfddde[7].state:                      2 ; 0xc76: KFDSTA_NORMAL
kfddde[7].ddchgfl:                    0 ; 0xc77: 0x00
kfddde[7].dskname:            DATA_0007 ; 0xc78: length=9
kfddde[7].fgname:             DATA_0007 ; 0xc98: length=9
kfddde[7].crestmp.hi:          33010550 ; 0xcb8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[7].crestmp.lo:        2898849792 ; 0xcbc: USEC=0x0 MSEC=0x23c SECS=0xc MINS=0x2b
kfddde[7].failstmp.hi:                0 ; 0xcc0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[7].failstmp.lo:                0 ; 0xcc4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[7].timer:                      0 ; 0xcc8: 0x00000000
kfddde[7].size:                 1258291 ; 0xccc: 0x00133333
kfddde[7].srRloc.super.hiStart:       0 ; 0xcd0: 0x00000000
kfddde[7].srRloc.super.loStart:       0 ; 0xcd4: 0x00000000
kfddde[7].srRloc.super.length:        0 ; 0xcd8: 0x00000000
kfddde[7].srRloc.incarn:              0 ; 0xcdc: 0x00000000
kfddde[7].dskrprtm:                   0 ; 0xce0: 0x00000000
kfddde[7].start0:                     0 ; 0xce4: 0x00000000
kfddde[7].size0:                1258291 ; 0xce8: 0x00133333
kfddde[7].used0:                1258209 ; 0xcec: 0x001332e1
kfddde[7].slot:                       0 ; 0xcf0: 0x00000000

结合上述信息构造类似磁盘头文件

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3123334821 ; 0x00c: 0xba2a4ea5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:     ORCLDISKVOL2 ; 0x000: length=12
kfdhdb.driver.reserved[0]:    827084630 ; 0x008: 0x314c4f56
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        2 ; 0x024: 0x0002
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfdhdb.crestmp.lo:           2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfdhdb.mntstmp.hi:             33049840 ; 0x0b0: HOUR=0x10 DAYS=0x7 MNTH=0x3 YEAR=0x7e1
kfdhdb.mntstmp.lo:           1588567040 ; 0x0b4: USEC=0x0 MSEC=0x3e7 SECS=0x2a MINS=0x17
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                 1258291 ; 0x40c: 0x00133333
kfdhdb.pmcnt:                        19 ; 0x0c8: 0x00000013
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

然后通过kfed merge分别对所有磁盘头进行重构,然后通过dul去识别拷贝文件

+DATA/XFF/spfileXFF.ora.265.869858859
+DATA/XFF/CONTROLFILE/Current.256.796701475
+DATA/XFF/ONLINELOG/group_1.257.796701475
+DATA/XFF/ONLINELOG/group_2.258.796701485
+DATA/XFF/ONLINELOG/group_3.266.796704261
+DATA/XFF/ONLINELOG/group_4.267.796704277
+DATA/XFF/ONLINELOG/group_5.1235.824131117
+DATA/XFF/ONLINELOG/group_6.1115.824200515
+DATA/XFF/ONLINELOG/group_7.1113.824200587
+DATA/XFF/ONLINELOG/group_8.1112.824200627
+DATA/XFF/ONLINELOG/group_9.1066.824201189
+DATA/XFF/ONLINELOG/group_10.1063.824201207
+DATA/XFF/ONLINELOG/group_11.1062.824201287
+DATA/XFF/ONLINELOG/group_12.1061.824201301


File 259 datafile size 2147491840, block size 8192
File 260 datafile size 32186048512, block size 8192
File 261 datafile size 6897541120, block size 8192
File 263 datafile size 27409784832, block size 8192
File 264 datafile size 34359730176, block size 8192
File 280 datafile size 31457288192, block size 8192
File 281 datafile size 31457288192, block size 8192
File 330 datafile size 5242888192, block size 8192
File 334 datafile size 20971528192, block size 8192
File 382 datafile size 20971528192, block size 8192
File 383 datafile size 20971528192, block size 8192
File 384 datafile size 31457288192, block size 8192
File 385 datafile size 31457288192, block size 8192
File 386 datafile size 31457288192, block size 8192
File 387 datafile size 4294975488, block size 8192
File 388 datafile size 4294975488, block size 8192
File 389 datafile size 4294975488, block size 8192
File 390 datafile size 4294975488, block size 8192
File 391 datafile size 4294975488, block size 8192
File 392 datafile size 4294975488, block size 8192
File 394 datafile size 31457288192, block size 8192
File 491 datafile size 20971528192, block size 8192
File 494 datafile size 20971528192, block size 8192
File 578 datafile size 31457288192, block size 8192
File 597 datafile size 20971528192, block size 8192
File 613 datafile size 4294975488, block size 8192
File 638 datafile size 31457288192, block size 8192
File 668 datafile size 16988184576, block size 8192
File 688 datafile size 20971528192, block size 8192
File 740 datafile size 31457288192, block size 8192
File 787 datafile size 31457288192, block size 8192
File 798 datafile size 31457288192, block size 8192
File 806 datafile size 31457288192, block size 8192
File 810 datafile size 31457288192, block size 8192
File 845 datafile size 31457288192, block size 8192
File 886 datafile size 31457288192, block size 8192
File 887 datafile size 31457288192, block size 8192
File 889 datafile size 31457288192, block size 8192
File 892 datafile size 31457288192, block size 8192
File 903 datafile size 31457288192, block size 8192
File 932 datafile size 31457288192, block size 8192
File 933 datafile size 3145736192, block size 8192
File 951 datafile size 20971528192, block size 8192
File 953 datafile size 31457288192, block size 8192
File 955 datafile size 31457288192, block size 8192
File 963 datafile size 31457288192, block size 8192
File 1000 datafile size 31457288192, block size 8192
File 1001 datafile size 12035563520, block size 8192
File 1031 datafile size 31457288192, block size 8192
File 1033 datafile size 31457288192, block size 8192
File 1035 datafile size 31457288192, block size 8192
File 1037 datafile size 31457288192, block size 8192
File 1045 datafile size 31457288192, block size 8192
File 1073 datafile size 4294975488, block size 8192
File 1074 datafile size 4294975488, block size 8192
File 1075 datafile size 4294975488, block size 8192
File 1076 datafile size 8589942784, block size 8192
File 1077 datafile size 31457288192, block size 8192
File 1078 datafile size 8589942784, block size 8192
File 1079 datafile size 8589942784, block size 8192
File 1080 datafile size 4294975488, block size 8192
File 1081 datafile size 8589942784, block size 8192
File 1082 datafile size 8589942784, block size 8192
File 1083 datafile size 8589942784, block size 8192
File 1084 datafile size 8589942784, block size 8192
File 1085 datafile size 32365355008, block size 8192
File 1086 datafile size 9071239168, block size 8192
File 1116 datafile size 8589942784, block size 8192
File 1133 datafile size 8589942784, block size 8192
File 1219 datafile size 31457288192, block size 8192
File 1245 datafile size 31457288192, block size 8192
File 1249 datafile size 31457288192, block size 8192
File 1251 datafile size 31457288192, block size 8192
File 1322 datafile size 4294975488, block size 8192
File 1442 datafile size 31457288192, block size 8192
File 1468 datafile size 1048584192, block size 8192
File 1508 datafile size 31457288192, block size 8192
File 1554 datafile size 4294975488, block size 8192
File 1570 datafile size 31457288192, block size 8192
File 2004 datafile size 31457288192, block size 8192
File 2005 datafile size 31457288192, block size 8192
File 2344 datafile size 31457288192, block size 8192
File 2345 datafile size 31457288192, block size 8192
File 2348 datafile size 31457288192, block size 8192
File 2617 datafile size 10737426432, block size 8192
File 2618 datafile size 21474844672, block size 8192
File 2766 datafile size 33554440192, block size 8192
File 2782 datafile size 31457288192, block size 8192
File 2784 datafile size 31457288192, block size 8192
File 2893 datafile size 31457288192, block size 8192
File 2924 datafile size 31457288192, block size 8192
File 2925 datafile size 31457288192, block size 8192
File 2926 datafile size 31457288192, block size 8192
File 2983 datafile size 31457288192, block size 8192
File 2984 datafile size 31457288192, block size 8192
File 3634 datafile size 31457288192, block size 8192
File 3909 datafile size 31457288192, block size 8192
File 3917 datafile size 31457288192, block size 8192
File 3920 datafile size 31457288192, block size 8192
File 3922 datafile size 31457288192, block size 8192

剩下的事情就比较简单了,通过把spfile,controlfile,datafile文件拷贝出来,本地启动数据库,恢复成功
asm_open_db


open_alert

如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 Oracle ASM, 非常规恢复 | 标签为 , , , , | 评论关闭

分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

有客户反馈他们重启系统之后,发现asmlib创建的asmdisk丢失了,然后又使用oracleasm deletedisk和createdisk重新创建的asm disk,最后发现asm diskgroup无法mount。让客户通过dd 备份5m数据,然后使用kfed分析
kefd分析结果

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img

kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                  3760689243 ; 0x00c: 0xe027905b
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=10
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=255
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=256|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                     256 ; 0x004: T=0 NUMB=0x100
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3925268785 ; 0x00c: 0xe9f6d931
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:32994098 ; 0x000: HOUR=0x12 DAYS=0x19 MNTH=0x
c YEAR=0x7dd
kfdpHdrPairBv1.first.super.time.lo:1614030848 ; 0x004: USEC=0x0 MSEC=0x10a SECS=
0x3 MINS=0x18
kfdpHdrPairBv1.first.super.last:      2 ; 0x008: 0x00000002
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt:   1 ; 0x010: 0x01
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000
……

因为kfed默认每个block为4k,这里提示256是ok的,255是损坏的,从而推测出来,很可能oracleasm createdisk损坏了1M的数据。由于默认au是1m,而且数据库版本是11.2.0.3,而且第256个blkn开始没有损坏,因此初步判断可以考虑使用备份asm disk header来恢复磁盘头
检查还原磁盘头的asm disk

[grid@xifenfei1 disks]$ kfed read DATA1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2776451033 ; 0x00c: 0xa57d47d9
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKDATA1 ; 0x000: length=13
kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]:           49 ; 0x00c: 0x00000031
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32994099 ; 0x0a8: HOUR=0x13 DAYS=0x19 MNTH=0xc YEAR=0x7dd
kfdhdb.crestmp.lo:           2797442048 ; 0x0ac: USEC=0x0 MSEC=0x365 SECS=0x2b MINS=0x29
kfdhdb.mntstmp.hi:             33022061 ; 0x0b0: HOUR=0xd DAYS=0x3 MNTH=0x8 YEAR=0x7df
kfdhdb.mntstmp.lo:            816879616 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0xb MINS=0xc
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
…………

证明磁盘头确实被比较完美的修复了,现在的任务是尝试mount磁盘组
mount磁盘组

[grid@xifenfei1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 6 20:54:53 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

asm diskgroup已经正常mount,使用asmcmd命令检查文件是否正常
分析磁盘组数据是否正常

[grid@xifenfei1 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   1622060   636493                0          636493              0             N  DATA/
ASMCMD> cd data
ASMCMD> ls
ORCL/
ASMCMD> cd orcl
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileorcl.ora
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI20130801.314.835191517
XIFENFEI20140101.321.835191571
XIFENFEI20140201.322.835191573
XIFENFEI20140301.323.835191573
…………
SYSAUX.270.835182535
SYSAUX.838.874669369
SYSTEM.271.835182533
SYSTEM.823.873555791
SYSTEM.945.883146947
…………

这里看到磁盘组里面的数据文件都正常,使用同样的方法,继续mount其他磁盘组。
尝试启动数据库

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'

查看数据库alert日志

ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Aug 06 21:04:06 2015
Thread 2 advanced to log sequence 21146 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Thread 2 advanced to log sequence 21147 (before internal thread enable)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-19816: 警告: 文件可能存在于数据库未知的 db_recovery_file_dest 中。
ORA-17502: ksfdcre: 4 未能创建文件 +ARCH
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [1] [47962] [1344818371 != 630731762]
ORA-15130: diskgroup "ARCH" is being dismounted
ORA-15066: offlining disk "ARCH_0000" in group "ARCH" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ARCH: Error 19504 Creating archive log file to '+ARCH'
NOTE: Deferred communication with ASM instance
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-15130: diskgroup "ARCH" is being dismounted
NOTE: deferred map free for map id 754
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'
ORA-16038 signalled during: ALTER DATABASE OPEN...
Thu Aug 06 21:04:10 2015
SUCCESS: diskgroup ARCH was dismounted
SUCCESS: diskgroup ARCH was dismounted
Thu Aug 06 21:04:10 2015
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Aug 06 21:04:10 2015
System state dump requested by (instance=1, osid=27353 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_27318.trc
CKPT (ospid: 27353): terminating the instance due to error 221
Instance terminated by CKPT, pid = 27353

查看asm alert日志

Thu Aug 06 21:04:07 2015
WARNING: cache read  a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group ARCH was dumped to /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
WARNING: cache read (retry) a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ERROR: cache failed to read group=2(ARCH) dsk=0 blk=1 from disk(s): 0(ARCH_0000)
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: cache initiating offline of disk 0 group ARCH
NOTE: process _user27462_+asm1 (27462) initiating offline of disk 0.3942486752 (ARCH_0000) with mask 0x7e in group 2
WARNING: Disk 0 (ARCH_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 2, dsk = 0/0xeafd92e0, mask = 0x6a, op = clear
Thu Aug 06 21:04:07 2015
GMON updating disk modes for group 2 at 17 for pid 35, osid 27462
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Aug 06 21:04:07 2015
NOTE: cache dismounting (not clean) group 2/0x723D6245 (ARCH)
NOTE: messaging CKPT to quiesce pins Unix process pid: 27089, image: oracle@xifenfei1 (B000)
WARNING: Offline of disk 0 (ARCH_0000) in group 2 and mode 0x7f failed on ASM inst 1
Thu Aug 06 21:04:07 2015
NOTE: halting all I/Os to diskgroup 2 (ARCH)
System State dumped to trace file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
NOTE: AMDU dump of disk group ARCH created at /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace
Thu Aug 06 21:04:09 2015
NOTE: LGWR doing non-clean dismount of group 2 (ARCH)
NOTE: LGWR sync ABA=126.806 last written ABA 126.806

这里可以看出来,报错的block为arch磁盘组的第一个磁盘的第一个au的第二个block,而我们在开始的时候,已经分析了asm disk的第一个au完全损坏,并且我们使用了备份磁盘头进行来还原,勉强可以让磁盘组mount起来,但是由于数据库在启动的时候,需要对redo进行归档,而归档的过程需要写到arch磁盘组里面,这个时候需要访问到au=0 blk=1,而这个块本身是坏的,因此这个时候该块盘的disk就被offline掉了,而这个磁盘组是外部冗余的,因此磁盘组dismount了,所以数据库无法启动.

分析第一个au里面到底有哪些东西

SQL> select DISK_NUMBER,path from v$asm_disk;

DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/raw/raw1
          2 /dev/raw/raw3
          1 /dev/raw/raw2

[oracle@xifenfei raw]$ kfed read raw1 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw1 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw2 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw3 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过一个测试机器的一个磁盘组进行分析,我们可以基本上确定asm 第一个au除了asm disk header的KFBTYP_DISKHEAD之外,其他主要是KFBTYP_FREESPC(Free Space Table)和KFBTYP_ALLOCTBL(allocator table),主要就是记录asm中au的分配情况,也就是进一步说明,如果我不对asm里面的数据使用更多的au分配或者回收au,在缺少第一个au的1-255个block的信息情况下,asm的磁盘组也不会dismount。根据这个思路,让数据库归档到本地,然后继续测试

继续open数据库

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Aug 07 02:43:13 2015
ARC1 started with pid=34, OS id=22778 
Fri Aug 07 02:43:13 2015
ARC2 started with pid=35, OS id=22780 
Fri Aug 07 02:43:13 2015
ARC3 started with pid=36, OS id=22782 
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Fri Aug 07 02:43:24 2015
Thread 1 opened at log sequence 18604
  Current log# 10 seq# 18604 mem# 0: /tmp/xifenfei/otherfile/group_10.273.835182533
  Current log# 10 seq# 18604 mem# 1: /tmp/xifenfei/otherfile/group_10.263.835182533
Successful open of redo thread 1
Fri Aug 07 02:43:24 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Aug 07 02:43:25 2015
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Fri Aug 07 02:43:26 2015
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:21328 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Aug 07 02:43:26 2015
Redo thread 2 internally disabled at seq 21147 (CKPT)
[21341] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:96999124 end:97000624 diff:1500 (15 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Aug 07 02:43:31 2015
GTX0 started with pid=37, OS id=22803 
Starting background process RCBG
Fri Aug 07 02:43:31 2015
RCBG started with pid=38, OS id=22805 
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Aug 07 02:43:34 2015
Archived Log entry 73876 added for thread 2 sequence 21145 ID 0x513c613f dest 1: <---果然有归档操作发生
Starting background process QMNC
Fri Aug 07 02:43:34 2015
QMNC started with pid=39, OS id=22812 
Fri Aug 07 02:43:35 2015
Archived Log entry 73877 added for thread 2 sequence 21146 ID 0x513c613f dest 1:
Fri Aug 07 02:43:35 2015
ARC0: Archiving disabled thread 2 sequence 21147
Archived Log entry 73878 added for thread 2 sequence 21147 ID 0x513c613f dest 1:
Fri Aug 07 02:43:37 2015
Completed: alter database open

现在到了这一步,基本上可以确定,数据库是零丢失恢复。由于asm 第一个au丢失数据严重,想要彻底修复比较难,考虑把数据库启动到mount/read only状态然后使用rman备份数据,然后进行重建asm 磁盘组,再迁移回来。至此完美恢复asmlib的磁盘被oracleasm重写的故障恢复,实现数据0丢失.当然在整个恢复过程没有于此的简单,涉及到在votedisk损坏的情况下,如何mount磁盘组,vote diskgroup的损坏修复问题,磁盘组在10g/11.1和11.2还原磁盘头备份的问题等问题.
虽然本次的恢复案例中,由于asmlib的asm disk不可见就轻易使用oracleasm createdisk命令对磁盘进行了重建,犯了一个很大错误,但是在重建之后,发现磁盘组依旧异常,未继续操作(比如重建磁盘组等),为最后的数据完全恢复创造了必要条件,使得客户的没有任何数据损失。如果再对除磁盘组继续复写操作,可能会导致数据永久性丢失。这个教训告诉我们:遇到自己不能把握的事情,及时终止,不要让错误越走越远

发表在 Oracle | 标签为 , , , , , | 4 条评论