标签归档:ORA-15042

fdisk分区导致asm disk破坏数据库恢复

尝试mount data磁盘组

SQL> alter diskgroup DATADG mount 
NOTE: cache registered group DATADG number=1 incarn=0xbc43fafd
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xbc43fafd
NOTE: Assigning number (1,0) to disk (/dev/raw/raw2)
Thu Jun 02 10:14:33 2022
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 27 for pid 27, osid 3853
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 28 for pid 27, osid 3853
NOTE: cache dismounting (clean) group 1/0xBC43FAFD (DATADG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3853, image: oracle@node1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xBC43FAFD (DATADG) 
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xbc43fafd
NOTE: cache deleting context for group DATADG 1/0xbc43fafd
GMON dismounting group 1 at 29 for pid 27, osid 3853
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1" 
ERROR: alter diskgroup DATADG mount
Thu Jun 02 10:14:33 2022
ASM Health Checker found 1 new failures

报错信息比较明显 datadg的disk number 为1的磁盘丢失了。通过fdisk确认磁盘情况

Disk /dev/sdb: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006c2be

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sda: 53.7 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00061443

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           2        2049     2097152   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2            2050       10241     8388608   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3           10242       12289     2097152   83  Linux
Partition 3 does not end on cylinder boundary.
/dev/sda4           12290       51200    39844864    5  Extended
Partition 4 does not end on cylinder boundary.
/dev/sda5           12291       14338     2097152   83  Linux
/dev/sda6           14340       50178    36699136   83  Linux
/dev/sda7           50180       51200     1045504   83  Linux

Disk /dev/sdc: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1b3fba6b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1045     8393931   83  Linux
/dev/sdc2            1046       26108   201318547+  83  Linux

Disk /dev/sdd: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4c63ecad

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       65270   524281243+  83  Linux

Disk /dev/sde: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/sdf: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

根据客户反馈,异常的应该是一个500G的磁盘,而其中sdb为分区,通过kfed命令分析,确认sdc1为ocr磁盘,sdc2为datadg的一块磁盘,另外一块磁盘应该在sdd,sde,sdf三者之中,通过kfed分析sde,sdf均不可能是asm disk(一块是文件系统,一块是彻底没有使用的空盘),如果datadg的磁盘没有丢失,那应该就是sdd这块磁盘,通过dd 磁盘100M空间,然后通过kfed进行分析确认

E:\TEMP\xff>kfed read sdd.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006648400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0066485B0 00000000 00000000 4C63ECAD 01000000  [..........cL....]
0066485C0 FE830001 003FFFFF CB370000 00003E7F  [......?...7..>..]
0066485D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0066485F0 00000000 00000000 00000000 AA550000  [..............U.]
006648600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

E:\TEMP\xff>kfed read sdd1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006768400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0067685B0 00000000 00000000 70D364B4 FE000000  [.........d.p....]
0067685C0 FE83FFFF D13FFFFF BB7603EB 00003A93  [......?...v..:..]
0067685D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0067685F0 00000000 00000000 00000000 AA550000  [..............U.]
006768600 02038201 00000008 80000001 826037C1  [.............7`.]
006768EA0 00000079 00800105 0000007A 00800105  [y.......z.......]
006768EB0 0000007C 00800105 0000007D 00800105  [|.......}.......]
0067693C0 0000015C 00800105 0000015D 00800105  [\.......].......]
0067693D0 0000015F 00800105 00000160 00800105  [_.......`.......]
0067693E0 00000161 00800105 00000163 00800105  [a.......c.......]
0067693F0 00000164 00800105 00000166 00800105  [d.......f.......]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

E:\TEMP\xff>kfed read sdd.dd blkn=1|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                  2197087544 ; 0x00c: 0x82f4e538
kfbh.fcn.base:                   616391 ; 0x010: 0x000967c7
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                         254 ; 0x006: 0x00fe
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000

通过上述信息分析,基本上可以确认sdd磁盘以前是asm disk,但是被fdisk进行了分区,基于这种情况,通过对磁盘组进行修复

E:\TEMP\xff>kfed read sdd.ok
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                   424926402 ; 0x00c: 0x1953dcc2
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        1 ; 0x024: 0x0001
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0001 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0001 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33074858 ; 0x0a8: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.crestmp.lo:           2375520256 ; 0x0ac: USEC=0x0 MSEC=0x1e4 SECS=0x19 MINS=0x23
kfdhdb.mntstmp.hi:             33074858 ; 0x0b0: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.mntstmp.lo:           2375522304 ; 0x0b4: USEC=0x0 MSEC=0x1e6 SECS=0x19 MINS=0x23
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt:                         6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             33072461 ; 0x0e4: HOUR=0xd DAYS=0xa MNTH=0x9 YEAR=0x7e2
kfdhdb.grpstmp.lo:           3452534784 ; 0x0e8: USEC=0x0 MSEC=0x260 SECS=0x1c MINS=0x33

磁盘组mount成功,数据库open成功,实现数据0丢失
20220611171941
20220611172005


使用rman对数据库进行备份,并且重建磁盘组实现数据0丢失

发表在 Oracle备份恢复 | 标签为 , , , , , , | 留下评论

再一起asm disk被格式化成ext3文件系统故障恢复

国庆节前夕接到朋友求救电话asm disk被格式化成ext3格式了,具体操作如下
20191006205129


20191006205203

并且把这个分区直接挂载到/目录
20191006205239

由于/被挂载新格式化的控盘,导致asm磁盘组访问其他盘报错

Sun Sep 29 18:15:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: cache failed reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 5 (DATA_0005) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-15080: synchronous I/O operation to a disk failed
WARNING: cache succeeded reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 7 (DATA_0007) kfkist=0x20 status=0x01 osderr=0x0 file=kfc.c line=11637
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: PST-initiated drop of 1 disk(s) in group 1(.2380027701))

重启系统之后,重试mount 磁盘组

GMON dismounting group 1 at 18 for pid 31, osid 44279
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x1 marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0010 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0011 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0012 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0014 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "1" 
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:2587:2} */

由于sdb(asm disk 8)被格式化,导致data磁盘组无法正常mount.这个客户运气比较好data 磁盘组是normal模式,但是由于mount到/,导致disk 4被强制drop,因此无法mount成功,但是通过一系列处理数据实现完美恢复,0数据丢失
20191006211707


如果磁盘组是外部冗余,请参考:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统

发表在 非常规恢复 | 标签为 , , , | 评论关闭

Oracle Exadata坏盘导致磁盘组无法mount恢复

接到朋友求救有客户oracle exadata一体机 的 asm磁盘组无法mount,希望我们提供恢复支持服务
经过分析和了解,大致问题是:磁盘空间已经超容量使用(部分数据不能完成ASM镜像),最近又损坏一块盘,导致asm 磁盘组无法mount。我们分析后,通过重构exadata celldisk数据,将asm 磁盘组 mount成功后,实现五套数据库全部open成功(由于底层磁盘部分数据损坏,导致部分数据访问报错,需要在oracle层面进行处理)。

本次问题的具体分析和处理如下:
存放数据库文件的磁盘组不能mount

Wed Dec 12 21:29:04 2018
SQL> alter diskgroup DATA_XFF mount force 
NOTE: cache registered group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache began mount (first) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
Wed Dec 12 21:29:10 2018
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 101 for pid 27, osid 62541
NOTE: Assigning number (1,0) to disk ()
GMON querying group 1 at 102 for pid 27, osid 62541
NOTE: process _user62541_+asm2 (62541) initiating offline of disk 0.3915937355 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968764b, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 103 for pid 27, osid 62541
ERROR: Disk 0 cannot be offlined, since all the disks [0, 25] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline of disk 0 () in group 1 and mode 0x7f failed on ASM inst 2
NOTE: cache dismounting (not clean) group 1/0x5FE882CB (DATA_XFF) 
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x5FE882CB (DATA_XFF) 
NOTE: cache ending mount (fail) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache deleting context for group DATA_XFF 1/0x5fe882cb
GMON dismounting group 1 at 104 for pid 27, osid 62541
ERROR: diskgroup DATA_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "0" in group "DATA_XFF" may result in a data loss
ORA-15042: ASM disk "0" is missing from group number "1" 
ERROR: alter diskgroup DATA_XFF mount force

检查底层损坏情况

CellCLI>   list physicaldisk
         20:0            KN3VZL          normal
         20:1            KNAWLL          normal
         20:2            KN4E4L          warning - predictive failure, poor performance
         20:3            KNAN5L          normal
         20:4            KMJKYL          normal
         20:5            KN5DGL          normal
         20:6            KMDLWL          normal
         20:7            KMDKPL          normal
         20:8            KMDA7L          normal
         20:9            KN1YJL          normal
         20:10           KMH1YL          normal
         20:11           KMVHAL          normal

CellCLI>   list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       proactive failure
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active

在db节点无法发现异常磁盘的asm disk

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

根据客户的反馈该磁盘组几乎全部被使用,asmcmd lsdg看到Usable_file_MB已经出现负值.证明该磁盘组本身的normal没有完全存储两份数据,在这样的情况下,继续坏盘会导致部分数据只有一份,因此也就出现了这里的磁盘组无法正常mount成功.

通过底层修复celldisk之后

CellCLI>  list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       active
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  12:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

data磁盘组直接mount成功

Fri Dec 14 14:04:59 2018
SQL> alter diskgroup DATA_XFF mount 
NOTE: cache registered group DATA_XFF number=1 incarn=0x78a886e7
NOTE: cache began mount (not first) of group DATA_XFF number=1 incarn=0x78a886e7
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01)
Fri Dec 14 14:04:59 2018
GMON querying group 1 at 78 for pid 28, osid 76016
NOTE: Assigning number (1,24) to disk ()
NOTE: Assigning number (1,25) to disk ()
NOTE: Assigning number (1,26) to disk ()
NOTE: Assigning number (1,27) to disk ()
NOTE: Assigning number (1,28) to disk ()
NOTE: Assigning number (1,29) to disk ()
NOTE: Assigning number (1,30) to disk ()
NOTE: Assigning number (1,31) to disk ()
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,33) to disk ()
NOTE: Assigning number (1,35) to disk ()
GMON querying group 1 at 79 for pid 28, osid 76016
NOTE: cache opening disk 0 of grp 1: DATA_XFF_CD_02_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01
NOTE: cache opening disk 1 of grp 1: DATA_XFF_CD_05_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01
NOTE: cache opening disk 2 of grp 1: DATA_XFF_CD_03_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01
NOTE: F1X0 found on disk 2 au 5 fcn 0.15948262
NOTE: cache opening disk 3 of grp 1: DATA_XFF_CD_06_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01
NOTE: cache opening disk 4 of grp 1: DATA_XFF_CD_09_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01
NOTE: cache opening disk 5 of grp 1: DATA_XFF_CD_04_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01
NOTE: cache opening disk 6 of grp 1: DATA_XFF_CD_07_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01
NOTE: cache opening disk 7 of grp 1: DATA_XFF_CD_11_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01
NOTE: cache opening disk 8 of grp 1: DATA_XFF_CD_01_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01
NOTE: cache opening disk 9 of grp 1: DATA_XFF_CD_00_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01
NOTE: cache opening disk 10 of grp 1: DATA_XFF_CD_10_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01
NOTE: cache opening disk 11 of grp 1: DATA_XFF_CD_08_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01
NOTE: cache opening disk 12 of grp 1: DATA_XFF_CD_00_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02
NOTE: cache opening disk 13 of grp 1: DATA_XFF_CD_01_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02
NOTE: cache opening disk 14 of grp 1: DATA_XFF_CD_02_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02
NOTE: cache opening disk 15 of grp 1: DATA_XFF_CD_03_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02
NOTE: cache opening disk 16 of grp 1: DATA_XFF_CD_04_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02
NOTE: cache opening disk 17 of grp 1: DATA_XFF_CD_05_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02
NOTE: cache opening disk 18 of grp 1: DATA_XFF_CD_06_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02
NOTE: cache opening disk 19 of grp 1: DATA_XFF_CD_07_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02
NOTE: cache opening disk 20 of grp 1: DATA_XFF_CD_08_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02
NOTE: cache opening disk 21 of grp 1: DATA_XFF_CD_09_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02
NOTE: F1X0 found on disk 21 au 2 fcn 0.15948262
NOTE: cache opening disk 22 of grp 1: DATA_XFF_CD_10_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02
NOTE: cache opening disk 23 of grp 1: DATA_XFF_CD_11_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02
NOTE: cache opening disk 36 of grp 1: DATA_XFF_CD_11_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03
NOTE: cache opening disk 37 of grp 1: DATA_XFF_CD_04_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03
NOTE: cache opening disk 38 of grp 1: DATA_XFF_CD_00_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03
NOTE: cache opening disk 39 of grp 1: DATA_XFF_CD_03_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03
NOTE: cache opening disk 40 of grp 1: DATA_XFF_CD_05_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03
NOTE: cache opening disk 41 of grp 1: DATA_XFF_CD_08_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03
NOTE: cache opening disk 42 of grp 1: DATA_XFF_CD_01_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03
NOTE: cache opening disk 43 of grp 1: DATA_XFF_CD_09_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03
NOTE: cache opening disk 44 of grp 1: DATA_XFF_CD_06_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03
NOTE: F1X0 found on disk 44 au 2 fcn 0.15948262
NOTE: cache opening disk 45 of grp 1: DATA_XFF_CD_07_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03
NOTE: cache opening disk 46 of grp 1: DATA_XFF_CD_02_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03
NOTE: cache mounting (not first) normal redundancy group 1/0x78A886E7 (DATA_XFF)
Fri Dec 14 14:04:59 2018
kjbdomatt send to inst 2
Fri Dec 14 14:04:59 2018
NOTE: attached to recovery domain 1
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Fri Dec 14 14:04:59 2018
NOTE: LGWR attempting to mount thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR found thread 2 closed at ABA 98.4672
NOTE: LGWR mounted thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR opening thread 2 at fcn 0.18931129 ABA 99.4673
NOTE: cache mounting group 1/0x78A886E7 (DATA_XFF) succeeded
NOTE: cache ending mount (success) of group DATA_XFF number=1 incarn=0x78a886e7
GMON querying group 1 at 80 for pid 19, osid 9805
Fri Dec 14 14:04:59 2018
NOTE: Instance updated compatible.asm to 11.2.0.3.0 for grp 1
SUCCESS: diskgroup DATA_XFF was mounted
SUCCESS: alter diskgroup DATA_XFF mount

恢复后的asm磁盘状态

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  4194304  15160320  4776184          5197824         -210820             12             N  DATA_XFF/
MOUNTED  NORMAL  N         512   4096  4194304    864896   863400           298240          282580              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3787840  2157232          1298688          429272              0             N  RECO_XFF/

后续数据库open成功,有部分坏块通过技术手段进行二次处理,至此数据库恢复完成,成功抢救了客户Oracle Exadata中的绝大部分数据.如果有类似xd故障恢复,无法自行解决,需要恢复支持请联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 非常规恢复 | 标签为 , , , , , , , , | 评论关闭