标签归档:ORA-15042

再一起asm disk被格式化成ext3文件系统故障恢复

国庆节前夕接到朋友求救电话asm disk被格式化成ext3格式了,具体操作如下
20191006205129


20191006205203

并且把这个分区直接挂载到/目录
20191006205239

由于/被挂载新格式化的控盘,导致asm磁盘组访问其他盘报错

Sun Sep 29 18:15:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: cache failed reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 5 (DATA_0005) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-15080: synchronous I/O operation to a disk failed
WARNING: cache succeeded reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 7 (DATA_0007) kfkist=0x20 status=0x01 osderr=0x0 file=kfc.c line=11637
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: PST-initiated drop of 1 disk(s) in group 1(.2380027701))

重启系统之后,重试mount 磁盘组

GMON dismounting group 1 at 18 for pid 31, osid 44279
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x1 marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0010 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0011 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0012 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0014 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "1" 
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:2587:2} */

由于sdb(asm disk 8)被格式化,导致data磁盘组无法正常mount.这个客户运气比较好data 磁盘组是normal模式,但是由于mount到/,导致disk 4被强制drop,因此无法mount成功,但是通过一系列处理数据实现完美恢复,0数据丢失
20191006211707


如果磁盘组是外部冗余,请参考:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统

发表在 非常规恢复 | 标签为 , , , | 评论关闭

Oracle Exadata坏盘导致磁盘组无法mount恢复

接到朋友求救有客户oracle exadata一体机 的 asm磁盘组无法mount,希望我们提供恢复支持服务
经过分析和了解,大致问题是:磁盘空间已经超容量使用(部分数据不能完成ASM镜像),最近又损坏一块盘,导致asm 磁盘组无法mount。我们分析后,通过重构exadata celldisk数据,将asm 磁盘组 mount成功后,实现五套数据库全部open成功(由于底层磁盘部分数据损坏,导致部分数据访问报错,需要在oracle层面进行处理)。

本次问题的具体分析和处理如下:
存放数据库文件的磁盘组不能mount

Wed Dec 12 21:29:04 2018
SQL> alter diskgroup DATA_XFF mount force 
NOTE: cache registered group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache began mount (first) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
Wed Dec 12 21:29:10 2018
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 101 for pid 27, osid 62541
NOTE: Assigning number (1,0) to disk ()
GMON querying group 1 at 102 for pid 27, osid 62541
NOTE: process _user62541_+asm2 (62541) initiating offline of disk 0.3915937355 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968764b, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 103 for pid 27, osid 62541
ERROR: Disk 0 cannot be offlined, since all the disks [0, 25] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline of disk 0 () in group 1 and mode 0x7f failed on ASM inst 2
NOTE: cache dismounting (not clean) group 1/0x5FE882CB (DATA_XFF) 
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x5FE882CB (DATA_XFF) 
NOTE: cache ending mount (fail) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache deleting context for group DATA_XFF 1/0x5fe882cb
GMON dismounting group 1 at 104 for pid 27, osid 62541
ERROR: diskgroup DATA_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "0" in group "DATA_XFF" may result in a data loss
ORA-15042: ASM disk "0" is missing from group number "1" 
ERROR: alter diskgroup DATA_XFF mount force

检查底层损坏情况

CellCLI>   list physicaldisk
         20:0            KN3VZL          normal
         20:1            KNAWLL          normal
         20:2            KN4E4L          warning - predictive failure, poor performance
         20:3            KNAN5L          normal
         20:4            KMJKYL          normal
         20:5            KN5DGL          normal
         20:6            KMDLWL          normal
         20:7            KMDKPL          normal
         20:8            KMDA7L          normal
         20:9            KN1YJL          normal
         20:10           KMH1YL          normal
         20:11           KMVHAL          normal

CellCLI>   list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       proactive failure
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active

在db节点无法发现异常磁盘的asm disk

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

根据客户的反馈该磁盘组几乎全部被使用,asmcmd lsdg看到Usable_file_MB已经出现负值.证明该磁盘组本身的normal没有完全存储两份数据,在这样的情况下,继续坏盘会导致部分数据只有一份,因此也就出现了这里的磁盘组无法正常mount成功.

通过底层修复celldisk之后

CellCLI>  list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       active
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  12:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

data磁盘组直接mount成功

Fri Dec 14 14:04:59 2018
SQL> alter diskgroup DATA_XFF mount 
NOTE: cache registered group DATA_XFF number=1 incarn=0x78a886e7
NOTE: cache began mount (not first) of group DATA_XFF number=1 incarn=0x78a886e7
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01)
Fri Dec 14 14:04:59 2018
GMON querying group 1 at 78 for pid 28, osid 76016
NOTE: Assigning number (1,24) to disk ()
NOTE: Assigning number (1,25) to disk ()
NOTE: Assigning number (1,26) to disk ()
NOTE: Assigning number (1,27) to disk ()
NOTE: Assigning number (1,28) to disk ()
NOTE: Assigning number (1,29) to disk ()
NOTE: Assigning number (1,30) to disk ()
NOTE: Assigning number (1,31) to disk ()
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,33) to disk ()
NOTE: Assigning number (1,35) to disk ()
GMON querying group 1 at 79 for pid 28, osid 76016
NOTE: cache opening disk 0 of grp 1: DATA_XFF_CD_02_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01
NOTE: cache opening disk 1 of grp 1: DATA_XFF_CD_05_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01
NOTE: cache opening disk 2 of grp 1: DATA_XFF_CD_03_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01
NOTE: F1X0 found on disk 2 au 5 fcn 0.15948262
NOTE: cache opening disk 3 of grp 1: DATA_XFF_CD_06_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01
NOTE: cache opening disk 4 of grp 1: DATA_XFF_CD_09_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01
NOTE: cache opening disk 5 of grp 1: DATA_XFF_CD_04_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01
NOTE: cache opening disk 6 of grp 1: DATA_XFF_CD_07_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01
NOTE: cache opening disk 7 of grp 1: DATA_XFF_CD_11_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01
NOTE: cache opening disk 8 of grp 1: DATA_XFF_CD_01_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01
NOTE: cache opening disk 9 of grp 1: DATA_XFF_CD_00_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01
NOTE: cache opening disk 10 of grp 1: DATA_XFF_CD_10_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01
NOTE: cache opening disk 11 of grp 1: DATA_XFF_CD_08_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01
NOTE: cache opening disk 12 of grp 1: DATA_XFF_CD_00_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02
NOTE: cache opening disk 13 of grp 1: DATA_XFF_CD_01_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02
NOTE: cache opening disk 14 of grp 1: DATA_XFF_CD_02_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02
NOTE: cache opening disk 15 of grp 1: DATA_XFF_CD_03_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02
NOTE: cache opening disk 16 of grp 1: DATA_XFF_CD_04_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02
NOTE: cache opening disk 17 of grp 1: DATA_XFF_CD_05_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02
NOTE: cache opening disk 18 of grp 1: DATA_XFF_CD_06_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02
NOTE: cache opening disk 19 of grp 1: DATA_XFF_CD_07_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02
NOTE: cache opening disk 20 of grp 1: DATA_XFF_CD_08_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02
NOTE: cache opening disk 21 of grp 1: DATA_XFF_CD_09_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02
NOTE: F1X0 found on disk 21 au 2 fcn 0.15948262
NOTE: cache opening disk 22 of grp 1: DATA_XFF_CD_10_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02
NOTE: cache opening disk 23 of grp 1: DATA_XFF_CD_11_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02
NOTE: cache opening disk 36 of grp 1: DATA_XFF_CD_11_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03
NOTE: cache opening disk 37 of grp 1: DATA_XFF_CD_04_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03
NOTE: cache opening disk 38 of grp 1: DATA_XFF_CD_00_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03
NOTE: cache opening disk 39 of grp 1: DATA_XFF_CD_03_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03
NOTE: cache opening disk 40 of grp 1: DATA_XFF_CD_05_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03
NOTE: cache opening disk 41 of grp 1: DATA_XFF_CD_08_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03
NOTE: cache opening disk 42 of grp 1: DATA_XFF_CD_01_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03
NOTE: cache opening disk 43 of grp 1: DATA_XFF_CD_09_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03
NOTE: cache opening disk 44 of grp 1: DATA_XFF_CD_06_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03
NOTE: F1X0 found on disk 44 au 2 fcn 0.15948262
NOTE: cache opening disk 45 of grp 1: DATA_XFF_CD_07_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03
NOTE: cache opening disk 46 of grp 1: DATA_XFF_CD_02_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03
NOTE: cache mounting (not first) normal redundancy group 1/0x78A886E7 (DATA_XFF)
Fri Dec 14 14:04:59 2018
kjbdomatt send to inst 2
Fri Dec 14 14:04:59 2018
NOTE: attached to recovery domain 1
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Fri Dec 14 14:04:59 2018
NOTE: LGWR attempting to mount thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR found thread 2 closed at ABA 98.4672
NOTE: LGWR mounted thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR opening thread 2 at fcn 0.18931129 ABA 99.4673
NOTE: cache mounting group 1/0x78A886E7 (DATA_XFF) succeeded
NOTE: cache ending mount (success) of group DATA_XFF number=1 incarn=0x78a886e7
GMON querying group 1 at 80 for pid 19, osid 9805
Fri Dec 14 14:04:59 2018
NOTE: Instance updated compatible.asm to 11.2.0.3.0 for grp 1
SUCCESS: diskgroup DATA_XFF was mounted
SUCCESS: alter diskgroup DATA_XFF mount

恢复后的asm磁盘状态

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  4194304  15160320  4776184          5197824         -210820             12             N  DATA_XFF/
MOUNTED  NORMAL  N         512   4096  4194304    864896   863400           298240          282580              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3787840  2157232          1298688          429272              0             N  RECO_XFF/

后续数据库open成功,有部分坏块通过技术手段进行二次处理,至此数据库恢复完成,成功抢救了客户Oracle Exadata中的绝大部分数据.如果有类似xd故障恢复,无法自行解决,需要恢复支持请联系我们
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 非常规恢复 | 标签为 , , , , , , , , | 评论关闭

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]

在oracle asm的使用过程中由于操作系统层面的错误操作导致asm disk 被破坏,这里列举了几种破坏之后的kfed报错现象(KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type])
asm mount 磁盘组报错(ORA-15040 ORA-15042)

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"

asm alert日志报错(ORA-15335 ORA-15066 ORA-15196等)

ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]

kfed查看磁盘头报错
文件文件头(不光是disk header的4k,可能是连续的几个au,甚至更多)可能彻底损坏,一般kfed 读取都会看到KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]之类错误

[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
 Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“EFI PART”是分区的元数据,一般是被分区导致asm disk损坏.

[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“LVM2 001” 是逻辑卷的名字,该asm disk很可能被做为lvm管理而被破坏

[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian:                         65 ; 0x000: 0x41
kfbh.hard:                           73 ; 0x001: 0x49
kfbh.type:                           88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         32 ; 0x003: 0x20
kfbh.block.blk:              1111709260 ; 0x004: blk=1111709260
kfbh.block.obj:              1634861056 ; 0x008: file=131072
kfbh.check:                         119 ; 0x00c: 0x00000077
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077  [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000  [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430  [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030  [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161  [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037  [...Tue May  7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000  [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131  [.Tue May  7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000  [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900  [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000  [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000  [................]
        Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934  [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000  [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000  [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的“AIX LVCB..raw” 是AIX OS volume 的元数据库,也就是说,asm disk 被作为了aix os层面破坏

[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001  [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的/u01很可能表明该asm disk被文件系统覆盖

对于asm disk的各种破坏情况,如果是normal/high冗余,那么asm dg没有问题,可以考虑通过删除异常盘,然后重新加入;如果是外部冗余遭遇到asm disk 被破坏,一般asm disk 会dismount,而且无法正常mount,如果有备份的磁盘头,可以尝试还原磁盘头,mount 磁盘组,然后只读方式迁移数据;如果没有备份磁盘头或者还原之后也无法mount,可能需要通过一些额外的方式处理比如通过工具在asm dismount状态下恢复数据文件,甚至通过对asm block/oracle block碎片重组的方式恢复数据.参考相关文章:
oracle asm系列文章汇总
pvid=yes导致asm无法mount
asm disk header 彻底损坏恢复
分区无法识别导致asm diskgroup无法mount
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
asm disk误设置pvid导致asm diskgroup无法mount恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
ORA-15042: ASM disk “N” is missing from group number “M” 故障恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , , , , , | 评论关闭