Tag Archives: ORA-15040

Recovering an ASM disk that was added to an LVM VG

Received a recovery request from a customer: one of the disks in an Oracle ASM diskgroup had been added to an LVM volume group (VG); the diskgroup could no longer be mounted and the database would not start. After logging in to the site remotely, the analysis proceeded as follows.
OS-level analysis
Shell history of the operations
[Figure: add_asm_disk_to_vg, the shell history showing the disk being turned into a PV and added to the VG]
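In essence, the history implies commands along these lines. This is a hedged reconstruction, not the customer's literal command sequence; the device name, VG name and size are inferred from the LVM output further down, and ext4 is assumed for the file system:

# Presumed sequence of operations found in the shell history (reconstruction)
pvcreate /dev/sdg                          # turn the ASM disk into an LVM PV, overwriting the ASM header
vgextend VolGroup /dev/sdg                 # add the new PV to the existing VG
lvextend -L +199G /dev/VolGroup/lv_home    # grow lv_home by 199G, allocating extents on /dev/sdg
resize2fs /dev/VolGroup/lv_home            # grow the file system, writing data across the new extents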


So a disk was turned into a PV, added to the VG, and 199G of it was then allocated to lv_home. The LVM information at the OS level bears this out:

--check the PV information
[root@xff1 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup
  PV Size               277.98 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              71161
  Free PE               0
  Allocated PE          71161
  PV UUID               F6QO3f-065n-mwTW-Xbq2-Xx2y-c8HD-Tkr7V7
   
  --- Physical volume ---
  PV Name               /dev/sdg    <----the newly added disk
  VG Name               VolGroup
  PV Size               200.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              51199
  Free PE               255
  Allocated PE          50944
  PV UUID               i69vUG-nCIK-dtxL-FvpD-2WZd-bvLv-n7lwrb

[root@xff1 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_root
  LV Name                lv_root
  VG Name                VolGroup
  LV UUID                JUNnkN-m4zq-D0gh-h42b-cUM1-Wh1q-ZMtQE4
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:47 +0800
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_home
  LV Name                lv_home
  VG Name                VolGroup
  LV UUID                eZTkLt-cNGX-371i-m8Bd-VdD9-q6Hz-wYDRIJ
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:54 +0800
  LV Status              available
  # open                 1
  LV Size                422.97 GiB      <-----the LV is now 422.97 GiB, presumably the result of the 199G extension
  Current LE             108281
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_swap
  LV Name                lv_swap
  VG Name                VolGroup
  LV UUID                54P9ok-VpwO-zM68-hvwY-9GBf-89yb-8xQAMn
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:09:23 +0800
  LV Status              available
  # open                 1
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1


[root@xff1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  3.9G   43G   9% /
tmpfs                  63G  509M   63G   1% /dev/shm
/dev/sda1             477M   44M  408M  10% /boot
/dev/mapper/VolGroup-lv_home
                      417G  226G  170G  58% /home  <----199G was added yet only 170G remains free, so at least ~30G must have been written after the extension

Given all of this, it is safe to conclude that sdg was added to VolGroup, allocated to lv_home, and that data was then written onto it (/home has only 170G free even though lv_home had just been extended by 199G).
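To pin down which extents of lv_home actually sit on the new PV, the LVM mapping can be listed. A minimal sketch using standard LVM options, not part of the original session:

pvdisplay -m /dev/sdg                 # show which logical volume owns the physical extents on /dev/sdg
lvdisplay -m /dev/VolGroup/lv_home    # list lv_home segments; the extension segment should map to /dev/sdg

The two-segment layout reported by lvdisplay above (Segments 2) presumably corresponds to the original extent range on /dev/sda2 plus the 199G extension on /dev/sdg.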
ASM-level analysis
The ASM diskgroup cannot be mounted and complains about a missing disk

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */ 
NOTE: cache registered group DATA number=1 incarn=0xa1dbff16
NOTE: cache began mount (first) of group DATA number=1 incarn=0xa1dbff16
NOTE: Assigning number (1,2) to disk (/dev/asmdisk3)
NOTE: Assigning number (1,1) to disk (/dev/asmdisk2)
Sat Apr 25 13:04:58 2020
ERROR: no read quorum in group: required 1, found 0 disks
NOTE: cache dismounting (clean) group 1/0xA1DBFF16 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 81552, image: oracle@rac2db1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xA1DBFF16 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xa1dbff16
NOTE: cache deleting context for group DATA 1/0xa1dbff16
GMON dismounting group 1 at 19 for pid 30, osid 81552
NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15040: diskgroup is incomplete
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */

Analyzing the disk with kfed
[Figure: kfed_lvm, kfed output showing an LVM label where the ASM disk header used to be]


The error is clear enough: the ASM disk header has been replaced by LVM metadata (because the ASM disk was added to the VG). Based on the earlier analysis, well over 30G is likely to have been written to this disk, so kfed was used on an arbitrary AU to confirm that it too had been overwritten, which supports the initial assessment:

root@xff1:/home/oracle11g$kfed read /dev/asmdisk1 aun=10000
kfbh.endian:                         51 ; 0x000: 0x33
kfbh.hard:                           55 ; 0x001: 0x37
kfbh.type:                           32 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         42 ; 0x003: 0x2a
kfbh.block.blk:              1329801248 ; 0x004: blk=1329801248
kfbh.block.obj:              1128615502 ; 0x008: file=347726
kfbh.check:                  1094999892 ; 0x00c: 0x41445f54
kfbh.fcn.base:                675103060 ; 0x010: 0x283d4154
kfbh.fcn.wrap:               1448232275 ; 0x014: 0x56524553
kfbh.spare1:                 1598374729 ; 0x018: 0x5f454349
kfbh.spare2:                 1162690894 ; 0x01c: 0x454d414e
7F7843EAD400 2A203733 4F432820 43454E4E 41445F54  [37 * (CONNECT_DA]
7F7843EAD410 283D4154 56524553 5F454349 454D414E  [TA=(SERVICE_NAME]
7F7843EAD420 6361723D 29626432 44494328 5250283D  [=rac2db)(CID=(PR]
7F7843EAD430 4152474F 3A443D4D 4341505C DFCF3153  [OGRAM=D:\PACS1..]
7F7843EAD440 B3BEB7BB 6369445C 65536D6F 72657672  [....\DicomServer]
7F7843EAD450 445C524D 6D6F6369 76726553 524D7265  [MR\DicomServerMR]
7F7843EAD460 6578652E 4F482829 573D5453 362D4E49  [.exe)(HOST=WIN-6]
7F7843EAD470 51414C38 54553645 28294A30 52455355  [8LAQE6UT0J)(USER]
7F7843EAD480 6D64413D 73696E69 74617274 2929726F  [=Administrator))]
7F7843EAD490 202A2029 44444128 53534552 5250283D  [) * (ADDRESS=(PR]
7F7843EAD4A0 434F544F 743D4C4F 28297063 54534F48  [OTOCOL=tcp)(HOST]
7F7843EAD4B0 2E30313D 2E303831 30332E31 4F502829  [=10.180.1.30)(PO]
7F7843EAD4C0 343D5452 37333539 2A202929 74736520  [RT=49537)) * est]
7F7843EAD4D0 696C6261 2A206873 63617220 20626432  [ablish * rac2db ]
7F7843EAD4E0 3231202A 0A343135 2D534E54 31353231  [* 12514.TNS-1251]
7F7843EAD4F0 54203A34 6C3A534E 65747369 2072656E  [4: TNS:listener ]
7F7843EAD500 73656F64 746F6E20 72756320 746E6572  [does not current]
7F7843EAD510 6B20796C 20776F6E 7320666F 69767265  [ly know of servi]
7F7843EAD520 72206563 65757165 64657473 206E6920  [ce requested in ]
7F7843EAD530 6E6E6F63 20746365 63736564 74706972  [connect descript]
………………
7F7843EAE300 6F636944 7265536D 4D726576 69445C52  [DicomServerMR\Di]
7F7843EAE310 536D6F63 65767265 2E524D72 29657865  [comServerMR.exe)]
7F7843EAE320 534F4828 49573D54 4F302D4E 314B304A  [(HOST=WIN-0OJ0K1]
7F7843EAE330 4955304E 55282954 3D524553 696D6441  [N0UIT)(USER=Admi]
7F7843EAE340 7473696E 6F746172 29292972 28202A20  [nistrator))) * (]
7F7843EAE350 52444441 3D535345 4F525028 4F434F54  [ADDRESS=(PROTOCO]
7F7843EAE360 63743D4C 48282970 3D54534F 312E3031  [L=tcp)(HOST=10.1]
7F7843EAE370 312E3038 2930332E 524F5028 35353D54  [80.1.30)(PORT=55]
7F7843EAE380 29383632 202A2029 61747365 73696C62  [268)) * establis]
7F7843EAE390 202A2068 32636172 2A206264 35323120  [h * rac2db * 125]
7F7843EAE3A0 540A3431 312D534E 34313532 4E54203A  [14.TNS-12514: TN]
7F7843EAE3B0 696C3A53 6E657473 64207265 2073656F  [S:listener does ]
7F7843EAE3C0 20746F6E 72727563 6C746E65 6E6B2079  [not currently kn]
7F7843EAE3D0 6F20776F 65732066 63697672 65722065  [ow of service re]
7F7843EAE3E0 73657571 20646574 63206E69 656E6E6F  [quested in conne]
7F7843EAE3F0 64207463 72637365 6F747069 34320A72  [ct descriptor.24]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][32]

The kfed dump shows that AU 10000 now holds listener.log text written after the database went down (the database is installed under /home), which further proves the overwrite. The following confirms that sdg is in fact asmdisk1:

[root@xff1 dev]# ls -l sdg
brw-rw---- 1 root disk 8,  96 Apr 25 00:05 sdg
[root@xff1 dev]# ls -l asmdisk1
brw-rw---- 1 grid asmadmin 8,  96 Apr 25 00:05 asmdisk1
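Both device nodes report the same major,minor pair (8, 96), which is what ties /dev/asmdisk1 back to /dev/sdg. A quick way to compare them side by side, shown here as an assumed extra check with standard coreutils:

stat -c '%n major:minor=%t:%T' /dev/sdg /dev/asmdisk1   # both should print 8:60 (hexadecimal for 8,96)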

To sum up: the DATA diskgroup consists of three 200G disks; the first disk was accidentally added to the VG and more than 30G of data was written onto it. The diskgroup therefore cannot be patched with kfed at the ASM level and simply mounted; the only option is Oracle ASM block-level reconstruction (recovery from a completely destroyed ASM disk header) to salvage whatever has not been overwritten.
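Before committing to block-level reconstruction, it helps to gauge how far into the disk the overwrite reaches by sampling AUs at increasing offsets and eyeballing the raw contents. A minimal sketch, assuming the 1MB default AU size that the kfed call above also implies; the offsets chosen are arbitrary:

# Dump the first bytes of a few sampled AUs; readable listener.log text means that region was overwritten
AU_BYTES=1048576
for au in 0 1000 5000 10000 20000 50000; do
  echo "== aun=$au =="
  dd if=/dev/asmdisk1 bs=$AU_BYTES skip=$au count=1 2>/dev/null | od -c | head -5
done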

This customer was fairly lucky: an unsuccessful backup covering a few days in December 2019 (with no archived logs) still contained all of the data files, and the database could be forced open from it. By combining the latest data files recovered from the fragments with the December 2019 backup, the vast majority of the business data was recovered and the customer's loss was kept to a minimum. Disk operations on Oracle RAC database servers must be performed with great care.
If you run into a similar situation where an Oracle ASM disk has been damaged (formatted, partially overwritten with dd, turned into an LV, and so on) and need recovery support, contact us for a professional assessment; we will salvage the data as completely and as quickly as possible to minimize the loss.
Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com
Some previously handled cases of ASM disks being formatted:
Another case of recovering ASM disks formatted with a file system
A clean recovery of an ASM disk formatted as NTFS
Oracle ASM disk format recovery: formatted as an ext4 file system
Oracle ASM disk format recovery: formatted as an NTFS file system

Posted in Unconventional Recovery | Comments closed

Recovering an Oracle Exadata diskgroup that would not mount after a disk failure

A friend asked for help: an Oracle Exadata machine at one of his customers had an ASM diskgroup that would not mount, and they wanted us to provide recovery support.
Analysis showed roughly the following: the diskgroup had been filled beyond safe capacity (part of the data could no longer be fully mirrored by ASM), and another disk had just failed, so the diskgroup could not be mounted. After analysing the situation we reconstructed the Exadata celldisk data, mounted the diskgroup successfully, and opened all five databases (some data sitting on the damaged disk regions raised errors on access and had to be dealt with at the Oracle level afterwards).

The detailed analysis and handling of this case follows.
The diskgroup holding the database files could not be mounted

Wed Dec 12 21:29:04 2018
SQL> alter diskgroup DATA_XFF mount force 
NOTE: cache registered group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache began mount (first) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
Wed Dec 12 21:29:10 2018
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 101 for pid 27, osid 62541
NOTE: Assigning number (1,0) to disk ()
GMON querying group 1 at 102 for pid 27, osid 62541
NOTE: process _user62541_+asm2 (62541) initiating offline of disk 0.3915937355 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968764b, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 103 for pid 27, osid 62541
ERROR: Disk 0 cannot be offlined, since all the disks [0, 25] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline of disk 0 () in group 1 and mode 0x7f failed on ASM inst 2
NOTE: cache dismounting (not clean) group 1/0x5FE882CB (DATA_XFF) 
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x5FE882CB (DATA_XFF) 
NOTE: cache ending mount (fail) of group DATA_XFF number=1 incarn=0x5fe882cb
NOTE: cache deleting context for group DATA_XFF 1/0x5fe882cb
GMON dismounting group 1 at 104 for pid 27, osid 62541
ERROR: diskgroup DATA_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "0" in group "DATA_XFF" may result in a data loss
ORA-15042: ASM disk "0" is missing from group number "1" 
ERROR: alter diskgroup DATA_XFF mount force

Checking the damage at the storage cell level

CellCLI>   list physicaldisk
         20:0            KN3VZL          normal
         20:1            KNAWLL          normal
         20:2            KN4E4L          warning - predictive failure, poor performance
         20:3            KNAN5L          normal
         20:4            KMJKYL          normal
         20:5            KN5DGL          normal
         20:6            KMDLWL          normal
         20:7            KMDKPL          normal
         20:8            KMDA7L          normal
         20:9            KN1YJL          normal
         20:10           KMH1YL          normal
         20:11           KMVHAL          normal

CellCLI>   list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       proactive failure
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active
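For more detail on the failing disk, and on whether ASM could tolerate it being taken offline, the cell can be queried further. A hedged sketch using standard CellCLI attributes, not taken from the original session:

cellcli -e "list physicaldisk detail"                                                     # full attributes, including the failing 20:2
cellcli -e "list griddisk attributes name, status, asmmodestatus, asmdeactivationoutcome" # whether ASM can survive deactivating each grid disk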

On the DB node, kfod cannot see the ASM disk that sits on the failed physical disk

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

According to the customer the diskgroup was almost completely full, and asmcmd lsdg already showed a negative Usable_file_MB. In other words, the normal-redundancy diskgroup no longer held two complete copies of all data, so a further disk failure left some data with only one copy, and that is why the diskgroup could no longer be mounted.
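The same headroom figures can be read from V$ASM_DISKGROUP on the ASM instance. A minimal sketch of the check; the view and columns are standard, the query text itself is mine:

sqlplus -S / as sysasm <<'EOF'
-- a negative USABLE_FILE_MB means there is no longer enough free space to restore full redundancy after a disk failure
select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
  from v$asm_diskgroup;
EOF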

After repairing the celldisk at the cell level:

CellCLI>  list griddisk
         DATA_XFF_CD_00_XFFCEL01       active
         DATA_XFF_CD_01_XFFCEL01       active
         DATA_XFF_CD_02_XFFCEL01       active
         DATA_XFF_CD_03_XFFCEL01       active
         DATA_XFF_CD_04_XFFCEL01       active
         DATA_XFF_CD_05_XFFCEL01       active
         DATA_XFF_CD_06_XFFCEL01       active
         DATA_XFF_CD_07_XFFCEL01       active
         DATA_XFF_CD_08_XFFCEL01       active
         DATA_XFF_CD_09_XFFCEL01       active
         DATA_XFF_CD_10_XFFCEL01       active
         DATA_XFF_CD_11_XFFCEL01       active

[grid@ycdwdb01 grid]$ kfod disk=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
============================================================
   1:     433152 Mb o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01 <unknown> <unknown>
   2:     433152 Mb o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01 <unknown> <unknown>
   3:     433152 Mb o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01 <unknown> <unknown>
   4:     433152 Mb o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01 <unknown> <unknown>
   5:     433152 Mb o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01 <unknown> <unknown>
   6:     433152 Mb o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01 <unknown> <unknown>
   7:     433152 Mb o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01 <unknown> <unknown>
   8:     433152 Mb o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01 <unknown> <unknown>
   9:     433152 Mb o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01 <unknown> <unknown>
  10:     433152 Mb o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01 <unknown> <unknown>
  11:     433152 Mb o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01 <unknown> <unknown>
  12:     433152 Mb o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01 <unknown> <unknown>

The DATA_XFF diskgroup then mounted successfully

Fri Dec 14 14:04:59 2018
SQL> alter diskgroup DATA_XFF mount 
NOTE: cache registered group DATA_XFF number=1 incarn=0x78a886e7
NOTE: cache began mount (not first) of group DATA_XFF number=1 incarn=0x78a886e7
NOTE: Assigning number (1,36) to disk (o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.5/DATA_XFF_CD_10_XFFCEL03)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03)
NOTE: Assigning number (1,39) to disk (o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03)
NOTE: Assigning number (1,43) to disk (o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03)
NOTE: Assigning number (1,22) to disk (o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02)
NOTE: Assigning number (1,18) to disk (o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02)
NOTE: Assigning number (1,19) to disk (o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02)
NOTE: Assigning number (1,15) to disk (o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02)
NOTE: Assigning number (1,20) to disk (o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02)
NOTE: Assigning number (1,17) to disk (o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02)
NOTE: Assigning number (1,16) to disk (o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02)
NOTE: Assigning number (1,23) to disk (o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02)
NOTE: Assigning number (1,12) to disk (o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02)
NOTE: Assigning number (1,13) to disk (o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02)
NOTE: Assigning number (1,14) to disk (o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02)
NOTE: Assigning number (1,1) to disk (o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01)
NOTE: Assigning number (1,3) to disk (o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01)
NOTE: Assigning number (1,4) to disk (o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01)
NOTE: Assigning number (1,5) to disk (o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01)
NOTE: Assigning number (1,6) to disk (o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01)
NOTE: Assigning number (1,7) to disk (o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01)
NOTE: Assigning number (1,8) to disk (o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01)
NOTE: Assigning number (1,9) to disk (o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01)
NOTE: Assigning number (1,10) to disk (o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01)
NOTE: Assigning number (1,11) to disk (o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01)
Fri Dec 14 14:04:59 2018
GMON querying group 1 at 78 for pid 28, osid 76016
NOTE: Assigning number (1,24) to disk ()
NOTE: Assigning number (1,25) to disk ()
NOTE: Assigning number (1,26) to disk ()
NOTE: Assigning number (1,27) to disk ()
NOTE: Assigning number (1,28) to disk ()
NOTE: Assigning number (1,29) to disk ()
NOTE: Assigning number (1,30) to disk ()
NOTE: Assigning number (1,31) to disk ()
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,33) to disk ()
NOTE: Assigning number (1,35) to disk ()
GMON querying group 1 at 79 for pid 28, osid 76016
NOTE: cache opening disk 0 of grp 1: DATA_XFF_CD_02_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_02_XFFCEL01
NOTE: cache opening disk 1 of grp 1: DATA_XFF_CD_05_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_05_XFFCEL01
NOTE: cache opening disk 2 of grp 1: DATA_XFF_CD_03_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_03_XFFCEL01
NOTE: F1X0 found on disk 2 au 5 fcn 0.15948262
NOTE: cache opening disk 3 of grp 1: DATA_XFF_CD_06_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_06_XFFCEL01
NOTE: cache opening disk 4 of grp 1: DATA_XFF_CD_09_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_09_XFFCEL01
NOTE: cache opening disk 5 of grp 1: DATA_XFF_CD_04_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_04_XFFCEL01
NOTE: cache opening disk 6 of grp 1: DATA_XFF_CD_07_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_07_XFFCEL01
NOTE: cache opening disk 7 of grp 1: DATA_XFF_CD_11_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_11_XFFCEL01
NOTE: cache opening disk 8 of grp 1: DATA_XFF_CD_01_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_01_XFFCEL01
NOTE: cache opening disk 9 of grp 1: DATA_XFF_CD_00_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_00_XFFCEL01
NOTE: cache opening disk 10 of grp 1: DATA_XFF_CD_10_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_10_XFFCEL01
NOTE: cache opening disk 11 of grp 1: DATA_XFF_CD_08_XFFCEL01 path:o/192.168.10.3/DATA_XFF_CD_08_XFFCEL01
NOTE: cache opening disk 12 of grp 1: DATA_XFF_CD_00_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_00_XFFCEL02
NOTE: cache opening disk 13 of grp 1: DATA_XFF_CD_01_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_01_XFFCEL02
NOTE: cache opening disk 14 of grp 1: DATA_XFF_CD_02_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_02_XFFCEL02
NOTE: cache opening disk 15 of grp 1: DATA_XFF_CD_03_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_03_XFFCEL02
NOTE: cache opening disk 16 of grp 1: DATA_XFF_CD_04_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_04_XFFCEL02
NOTE: cache opening disk 17 of grp 1: DATA_XFF_CD_05_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_05_XFFCEL02
NOTE: cache opening disk 18 of grp 1: DATA_XFF_CD_06_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_06_XFFCEL02
NOTE: cache opening disk 19 of grp 1: DATA_XFF_CD_07_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_07_XFFCEL02
NOTE: cache opening disk 20 of grp 1: DATA_XFF_CD_08_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_08_XFFCEL02
NOTE: cache opening disk 21 of grp 1: DATA_XFF_CD_09_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_09_XFFCEL02
NOTE: F1X0 found on disk 21 au 2 fcn 0.15948262
NOTE: cache opening disk 22 of grp 1: DATA_XFF_CD_10_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_10_XFFCEL02
NOTE: cache opening disk 23 of grp 1: DATA_XFF_CD_11_XFFCEL02 path:o/192.168.10.4/DATA_XFF_CD_11_XFFCEL02
NOTE: cache opening disk 36 of grp 1: DATA_XFF_CD_11_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_11_XFFCEL03
NOTE: cache opening disk 37 of grp 1: DATA_XFF_CD_04_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_04_XFFCEL03
NOTE: cache opening disk 38 of grp 1: DATA_XFF_CD_00_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_00_XFFCEL03
NOTE: cache opening disk 39 of grp 1: DATA_XFF_CD_03_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_03_XFFCEL03
NOTE: cache opening disk 40 of grp 1: DATA_XFF_CD_05_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_05_XFFCEL03
NOTE: cache opening disk 41 of grp 1: DATA_XFF_CD_08_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_08_XFFCEL03
NOTE: cache opening disk 42 of grp 1: DATA_XFF_CD_01_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_01_XFFCEL03
NOTE: cache opening disk 43 of grp 1: DATA_XFF_CD_09_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_09_XFFCEL03
NOTE: cache opening disk 44 of grp 1: DATA_XFF_CD_06_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_06_XFFCEL03
NOTE: F1X0 found on disk 44 au 2 fcn 0.15948262
NOTE: cache opening disk 45 of grp 1: DATA_XFF_CD_07_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_07_XFFCEL03
NOTE: cache opening disk 46 of grp 1: DATA_XFF_CD_02_XFFCEL03 path:o/192.168.10.5/DATA_XFF_CD_02_XFFCEL03
NOTE: cache mounting (not first) normal redundancy group 1/0x78A886E7 (DATA_XFF)
Fri Dec 14 14:04:59 2018
kjbdomatt send to inst 2
Fri Dec 14 14:04:59 2018
NOTE: attached to recovery domain 1
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Fri Dec 14 14:04:59 2018
NOTE: LGWR attempting to mount thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR found thread 2 closed at ABA 98.4672
NOTE: LGWR mounted thread 2 for diskgroup 1 (DATA_XFF)
NOTE: LGWR opening thread 2 at fcn 0.18931129 ABA 99.4673
NOTE: cache mounting group 1/0x78A886E7 (DATA_XFF) succeeded
NOTE: cache ending mount (success) of group DATA_XFF number=1 incarn=0x78a886e7
GMON querying group 1 at 80 for pid 19, osid 9805
Fri Dec 14 14:04:59 2018
NOTE: Instance updated compatible.asm to 11.2.0.3.0 for grp 1
SUCCESS: diskgroup DATA_XFF was mounted
SUCCESS: alter diskgroup DATA_XFF mount

ASM diskgroup status after the recovery

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  4194304  15160320  4776184          5197824         -210820             12             N  DATA_XFF/
MOUNTED  NORMAL  N         512   4096  4194304    864896   863400           298240          282580              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3787840  2157232          1298688          429272              0             N  RECO_XFF/
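Note that DATA_XFF still reports 12 offline disks. A typical follow-up, sketched here with standard commands rather than quoted from the original case, is to identify them from the ASM instance and, once the underlying cell disks are confirmed healthy, bring them back online so a rebalance restores the redundancy:

sqlplus -S / as sysasm <<'EOF'
select group_number, disk_number, name, path, mode_status, state
  from v$asm_disk
 where mode_status = 'OFFLINE';
-- only once the underlying cell disks are confirmed healthy:
-- alter diskgroup DATA_XFF online all;
EOF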

The databases were subsequently opened; a number of corrupt blocks were handled in a second pass, completing the recovery and salvaging the vast majority of the data on this Oracle Exadata. If you hit a similar Exadata failure that you cannot resolve yourself and need recovery support, please contact us.
Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com

Posted in Unconventional Recovery | Comments closed

Recovering from ORA-15042: ASM disk "N" is missing from group number "M"

A friend asked for recovery support. An ASM diskgroup built on 19 LUNs had one failing LUN, so they added a new LUN and issued a drop of the old one in a single operation. The operation hung halfway through (the bad LUN was damaged at the storage level, so the rebalance could not complete). A storage engineer then kept working on the faulty LUN and, fortunately, managed to repair it; but in the excitement he deleted the newly added LUN from the array, even though part of the data had already been rebalanced onto it. At that point the diskgroup went down completely and could not be mounted, and we were asked for recovery support. For various reasons the LUN could not be recovered at the storage level, so the recovery had to be done at the database level.

Mon Sep 21 19:52:35 2015
SQL> alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012 
NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at 25 for pid 27, osid 12124486
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
GMON querying group 1 at 26 for pid 18, osid 10092734
NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
GMON querying group 1 at 27 for pid 18, osid 10092734
SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id=10944804 
NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */ 
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at 33 for pid 35, osid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at 34 for pid 35, osid 4194488
NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF) 
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF) 
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
GMON dismounting group 1 at 35 for pid 35, osid 4194488
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "20" is missing from group number "1" 
ERROR: ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */

The cause is clear: the storage engineer deleted the LUN outright, so diskgroup DG_XFF lost ASM disk 20 and can no longer be mounted. Because the rebalance had already been running for quite some time, the lost disk held a large amount of data, including metadata. Even if the PST were patched so that the diskgroup mounted (which is not guaranteed to work), a lot of data would be missing and it might still be impossible to extract the contents directly. If a disk had merely been added and, for whatever reason, no rebalance had taken place, patching the PST to get the diskgroup mounted would be enough. In this case, however, the only viable approach is to scan the disks at a low level and regenerate the data files (the metadata of some files lived on the lost LUN, so copying files or unloading data based on the surviving metadata would lose a great deal), and then unload the data from those files to complete the recovery. The disks to be processed are:

grp# dsk# bsize ausize disksize diskname        groupname       path
---- ---- ----- ------ -------- --------------- --------------- -------------
   1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
   1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
   1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
   1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
   1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
   1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
   1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
   1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
   1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
   1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
   1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
   1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
   1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
   1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
   1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
   1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
   1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
   1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
   1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
   1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58
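For reference, an inventory like the table above can be regenerated from the surviving disk headers with kfed. A minimal sketch; the kfdhdb fields are standard kfed output, while the loop itself is only an assumption about how such a list might be gathered:

for d in /dev/rhdisk10 /dev/rhdisk11 /dev/rhdisk16; do   # ...and the remaining member disks
  echo "== $d =="
  kfed read $d | egrep 'kfdhdb.dsknum|kfdhdb.dskname|kfdhdb.grpname|kfdhdb.blksize|kfdhdb.ausize|kfdhdb.dsksize'
done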

Luck was on our side this time: the lost diskgroup was a pure business diskgroup containing only 19 tablespaces and 10 partitioned tables, so with the data dictionary intact we recovered the data of the 10 partitioned tables (6,443 partitions in total). The overall result:
[Figure: RECOVER, summary of the recovered data volume]


Measured by total data volume, the recovery ratio is 6003.26953/6027.26935*100% = 99.6018%. Given that a LUN holding a large part of an in-progress rebalance was lost, recovering this much data is a very good outcome.
If you run into this kind of situation and cannot resolve it, contact us for professional Oracle database recovery support.
Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com

Posted in Oracle ASM | Comments closed