分类目录归档:Oracle ASM

asm disk被分区,格式化为ext4恢复

有客户因为没有认识到linux中的磁盘被asm使用,对其进行分区并且做成了ext4的文件系统,从history中获取客户操作命令

  600  fdisk -l
  601  fdisk /dev/sdb
  602  mkfs ext4 /dev/sdb1 
  603  fdisk -l
  604  mkfs -t ext4 /dev/sdb1 
  605  cd /
  606  mkdir u01
  607  mount /dev/sdb1 /u01
  608  df -h

确认磁盘情况,确认sdb直接被asm磁盘使用(asmdisk1)

[grid@racdb3 trace]$ ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Sep 30 14:34 /dev/asmdisk1
[grid@racdb3 trace]$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jul 27  2021 /dev/sda
brw-rw---- 1 root disk 8,  1 Jul 27  2021 /dev/sda1
brw-rw---- 1 root disk 8,  2 Jul 27  2021 /dev/sda2
brw-rw---- 1 root disk 8, 16 Sep 30 11:23 /dev/sdb
brw-rw---- 1 root disk 8, 17 Sep 30 11:23 /dev/sdb1
brw-rw---- 1 root disk 8, 32 Jul 27  2021 /dev/sdc

asm日志报错

Fri Sep 30 11:31:41 2022
NOTE: SMON starting instance recovery for group DATA domain 1 (mounted)
NOTE: SMON skipping disk 0 - no header
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _smon_+asm3 (2989) initiating offline of disk 0.3915953109 (DATA_0000) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968b3d5, mask = 0x6a, op = clear
Fri Sep 30 11:31:41 2022
GMON updating disk modes for group 1 at 4 for pid 17, osid 2989
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Fri Sep 30 11:31:41 2022
NOTE: cache dismounting (not clean) group 1/0x34F84324 (DATA) 
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Fri Sep 30 11:31:41 2022
NOTE: halting all I/Os to diskgroup 1 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 1 due to diskgroup dismount
NOTE: SMON skipping lock domain (1) validation because diskgroup being dismounted

数据库日志报错

Fri Sep 30 11:31:44 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_lmon_26356.trc:
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
Fri Sep 30 11:31:45 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
CKPT (ospid: 26388): terminating the instance due to error 221

通过kfed 查看asm disk被破坏情况

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1 
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F4FAAD45400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
7F4FAAD455B0 00000000 00000000 45C222C8 01000000  [.........".E....]
7F4FAAD455C0 FE830001 003FFFFF E9D60000 0000FFFF  [......?.........]
7F4FAAD455D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
7F4FAAD455F0 00000000 00000000 00000000 AA550000  [..............U.]
7F4FAAD45600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=2
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F64E77A0400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F64E77A1200 000081F9 000181F9 000281F9 000381F9  [................]
7F64E77A1210 000481F9 000C81F9 000D81F9 001881F9  [................]
7F64E77A1220 002881F9 003E81F9 007981F9 00AB81F9  [..(...>...y.....]
7F64E77A1230 013881F9 016C81F9 044581F9 04B081F9  [..8...l...E.....]
7F64E77A1240 061A81F9 0CD081F9 1E8481F9 00000000  [................]
7F64E77A1250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F8D101FF400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F8D10200200 000082F9 000182F9 000282F9 000382F9  [................]
7F8D10200210 000482F9 000C82F9 000D82F9 001882F9  [................]
7F8D10200220 002882F9 003E82F9 007982F9 00AB82F9  [..(...>...y.....]
7F8D10200230 013882F9 016C82F9 044582F9 04B082F9  [..8...l...E.....]
7F8D10200240 061A82F9 0CD082F9 1E8482F9 00000000  [................]
7F8D10200250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=4
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F142949C400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F142949D200 000083F9 000183F9 000283F9 000383F9  [................]
7F142949D210 000483F9 000C83F9 000D83F9 001883F9  [................]
7F142949D220 002883F9 003E83F9 007983F9 00AB83F9  [..(...>...y.....]
7F142949D230 013883F9 016C83F9 044583F9 04B083F9  [..8...l...E.....]
7F142949D240 061A83F9 0CD083F9 1E8483F9 00000000  [................]
7F142949D250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=5
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F0615CF6400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

磁盘前几个au被破坏严重.而且相关的备份block都已经损坏,基于这种情况,直接参考:
asm磁盘dd破坏恢复
asm disk header 彻底损坏恢复
asm disk 磁盘部分被清空恢复
通过底层恢复出来相关数据文件,并检测正常
20221002105544


进一步通过au分配列表获恢复redo,ctl等文件

H:\TEMP\asm-ext4\other>dir 
 驱动器 H 中的卷是 SSD-SX
 卷的序列号是 84EB-F434

 H:\TEMP\asm-ext4\other 的目录

2022-09-30  21:52        25,165,824 256.dd
2022-09-30  21:52        25,165,824 257.dd
2022-09-30  23:52        52,429,312 258.dd.1
2022-09-30  23:54        52,429,312 259.dd.1
2022-09-30  23:55        52,429,312 260.dd.1
2022-09-30  23:55        52,429,312 261.dd.1
2022-09-30  23:56        52,429,312 270.dd.1
2022-09-30  23:57        52,429,312 271.dd.1
2022-09-30  23:57        52,429,312 272.dd.1
2022-09-30  23:57        52,429,312 273.dd.1
2022-09-30  23:58        52,429,312 274.dd.1
2022-10-01  00:01        52,429,312 275.dd.1
2022-10-01  00:00        52,429,312 276.dd.1
2022-10-01  00:00        52,429,312 277.dd.1
2022-10-01  00:00        52,429,312 278.dd.1
2022-09-30  23:59        52,429,312 279.dd.1
2022-09-30  23:59        52,429,312 280.dd.1
2022-09-30  23:59        52,429,312 281.dd.1

在另外的新机器上尝试恢复库

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sat Oct 1 10:18:58 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
ORA-00227: corrupt block detected in control file: (block 8, # blocks 1)
ORA-00202: control file: '/oradata/256.dd'

控制文件损坏,重建ctl

SQL> CREATE CONTROLFILE REUSE DATABASE "xifenfei" NORESETLOGS  NOARCHIVELOG  
  2      MAXLOGFILES 50  
  3      MAXLOGMEMBERS 5  
  4      MAXDATAFILES 100  
  5      MAXINSTANCES 8  
  6      MAXLOGHISTORY 226  
  7  LOGFILE  
  8  group 7  '/oradata/270.dd.1' size 50M,
  9  group 8  '/oradata/272.dd.1' size 50M,
 10  group 5  '/oradata/274.dd.1' size 50M,
 11  group 6  '/oradata/276.dd.1' size 50M,
 12  group 3  '/oradata/278.dd.1' size 50M,
 13  group 4  '/oradata/280.dd.1' size 50M,
 14  group 1  '/oradata/258.dd.1' size 50M,
 15  group 2  '/oradata/260.dd.1' size 50M
 16  DATAFILE  
 17  '/oradata/1',
 18  '/oradata/2',
 19  '/oradata/3',
 20  '/oradata/4',
 21  '/oradata/5',
 22  '/oradata/6',
 23  '/oradata/7',
 24  '/oradata/8',
 25  '/oradata/9',
 26  '/oradata/10',
 27  '/oradata/11'
 28  CHARACTER SET ZHS16GBK  
 29  ;  

Control file created.

尝试open库,报ORA-600 kqfidps_update_stats:2,ORA-600 4194等错误

SQL> recover database;
Media recovery complete.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kqfidps_update_stats:2],
[0x7FFCCBEB3EC0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4193], [19319], [l.ok

解决该异常,open数据库成功

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
Database mounted.
SQL> alter database open;

Database altered.

导出数据库,遭遇个别表如下ORA-08103和ORA-01555两种错误,这种是由于个别block在做成文件系统的时候被损坏,底层恢复的时候block被置空导致,对其异常表进行单独处理即可

. . 正在导出表                           ALBUM
EXP-00056: 遇到 ORACLE 错误 8103
ORA-08103: 对象不再存在

. . 正在导出表                  M_PUSH_CONTENT
EXP-00056: 遇到 ORACLE 错误 1555
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
ORA-22924: 快照太旧

通过上述操作,实现客户数据的恢复,最大限度挽回客户损坏,再次提醒对于asm disk进行了误操作,建议第一时间保护现场(不要有任何的写入操作,可以最大限度恢复数据)

发表在 Oracle ASM, 非常规恢复 | 标签为 , , , , , , , | 留下评论

ORA-15335 ORA-15130 ORA-15066 ORA-15196

客户反馈,数据库无法正常启动,通过分析asm的alert日志发现,data磁盘组mount成功之后,没有一会儿自动dismount掉

Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT  
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT 
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read  a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) 
incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 
disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of 
disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA) 
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc  (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted

这里看主要是由于asm 磁盘组需要做COD recovery导致无法正常稳定的mount,主要原因是遭遇到asm disk的逻辑坏块(存储物理上看是ok的,但是实际数据在asm中看是异常的)

数据库alert日志报错

Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc  (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732

对于这类故障处理相对比较容易,通过patch asm,让data磁盘组稳定mount,然后open库,迁移数据,实现数据0丢失,完美恢复

发表在 Oracle ASM | 标签为 , , , , , | 留下评论

ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

客户反馈三个磁盘组无法正常mount,报错类似ORA-15032 ORA-15017 ORA-15063

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

初步判断是asm disk异常导致(比如asm disk不能被扫描到,或者丢失,或者磁盘头损坏等),分析客户的asm disk的udev文件配置

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     

从udev的配置中可以看出来,客户以前是对3个磁盘进行分析,然后使用udev映射别名给asm使用的.通过对其中一个磁盘进行分析
20220621220634
20220621220728


通过上述winhex查看,可以确认该分区的磁盘头信息异常[该信息属于磁盘刚分区的时候信息,而不是asm disk的信息],和kfed看到的结果一致[磁盘头位置肯定损坏,其他位置目前未知]

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

分析其他位置的block情况,初步看基本上ok[运气还不错]

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过检索备份出来的部分磁盘文件,找出来ORCLDISK信息部分(asm disk header)
20220621221843


然后利用这个部分对损坏的磁盘头进行修复,并且dd回生产环境中,并尝试mount磁盘组,数据库open成功
20220621181430
20220621222356


至此这个数据库运气不错,没有过多损坏,算完美恢复,可以进行了逻辑导出和rman备份,全部正常.为了后续安全,建议对其进行迁移

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , | 评论关闭