分类目录归档:非常规恢复

rm -rf误删Oracle数据库恢复

有客户把虚拟化环境中装有oracle数据库的linux操作系统,由于操作失误在/下面执行了rm -rf *,导致所有文件被删除,系统无法启动.客户希望要求恢复出其中的Oracle数据库.由于是虚拟化环境,然后客户直接从虚拟化平台下载下来磁盘文件,通过工具加载和分析确认是一个xfs的文件系统
20240516190746


使用工具进一步扫描分析,找到部分数据文件
20240516190505

这里可以获取到两个信息:
1. 尝试恢复oracle的control01.ctl文件,然后通过该文件尝试分析其他数据文件位置,运气不错该文件恢复出来是好的,直接加载到新库查询v$datafile,分析出来所有数据文件信息
2. 这里有一个非常不幸的信息,oracle最核心的system01.dbf文件大小明显异常,进一步分析该文件信息,结论是该文件无法通过反删除方式进行恢复
20240515223607

先把可以os层面可以恢复的数据恢复出来,并且检查坏块情况
20240516191836

对于异常的system文件,有两个处理方法:
1. 通过阅览被删除的文件,发现客户有5月14日1点左右的rman备份,通过恢复软件中完整度提示,大概率应该没有什么问题,但是分析发现部分归档日志损坏无法完整恢复
2. 通过对磁盘做碎片,恢复出来该数据文件,参考以往文章:
dbca删除库和rm删库恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
通过这个方法运气不错,恢复出来该库的system01.dbf文件非常完整0丢失

[oracle@localhost oradata]$ dbv file=system01.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Thu May 15 23:26:57 2024

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/oradata/system01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 199680
Total Pages Processed (Data) : 113988
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 26869
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 40253
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 18570
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 658228557 (0.658228557)

完成上述恢复工作之后,目前确认只有sysaux01.dbf有8026个block损坏,但是该表空间不涉及业务数据,尝试在新的系统中直接修改路径并open库

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-38760: This database instance failed to turn on flashback database


SQL> alter database flashback off;

Database altered.

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

运气不错,数据库直接open成功,现在处理sysaux01.dbf中的损坏文件:
1. 确认该文件具体坏块开始位置:
20240516193650


2. 由于坏块在文件中比较靠后,分析实际存储数据最后的位置

SQL> select max(block_id+blocks) from dba_extents where file_id=3;

MAX(BLOCK_ID+BLOCKS)
--------------------
             3493120

最后存储数据的位置小于坏块的位置,证明坏块部分是没有存储数据的,直接resize掉坏块部分

SQL> alter database datafile '/u01/oradata/sysaux01.dbf' resize 27290m;

Database altered.

然后dbv该数据文件,确认没有任何问题

[oracle@localhost trace]$ dbv file=/u01/oradata/sysaux01.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Wed May 15 22:43:00 2024

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/oradata/sysaux01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 3493120
Total Pages Processed (Data) : 1516833
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1868832
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 56577
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 32107
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 18771
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 658223915 (0.658223915)

使用rman检测全库,也确定没有任何问题

[oracle@localhost trace]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 15 22:43:58 2024
Version 19.15.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: XIFENFEI (DBID=2912535091)

RMAN> 

RMAN> 

RMAN> backup validate check logical database skip inaccessible;

Starting backup at 15-MAY-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=43 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=278 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
………………
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
32   OK     0              6273         6400            370625094 
  File Name: /u01/oradata/xff_com.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0               
  Index      0              0               
  Other      0              127             

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
33   OK     0              163752       832000          627920639 
  File Name: /u01/oradata/XFF_DATA_202312231.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              374296          
  Index      0              285002          
  Other      0              8950            

Finished backup at 15-MAY-24

[oracle@localhost trace]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed May 15 22:47:44 2024
Version 19.15.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL> select * from v$database_block_corruption ;

no rows selected

SQL> 

至此对于这次rm -rf /*的故障实现了Oracle数据库完美恢复,数据0丢失.

发表在 非常规恢复 | 标签为 , , | 评论关闭

asm disk被加入到另外一个磁盘组故障恢复

有朋友在aix环境对其中一个rac的asm磁盘组进行扩容
add_disk


之后另外一套rac的磁盘组直接dismount

Wed Aug 23 12:44:02 2023
NOTE: SMON starting instance recovery for group DATA domain 2 (mounted)
NOTE: F1X0 found on disk 0 au 2 fcn 0.128808679
NOTE: SMON skipping disk 7 - no header
NOTE: cache initiating offline of disk 7 group DATA
NOTE: process _smon_+asm1 (1770932) initiating offline of disk 7.3422955792 (DATA_0007) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 7/0xcc062910, mask = 0x6a, op = clear
Wed Aug 23 12:44:02 2023
GMON updating disk modes for group 2 at 7 for pid 17, osid 1770932
ERROR: Disk 7 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Wed Aug 23 12:44:02 2023
NOTE: cache dismounting (not clean) group 2/0x7FE6D808 (DATA) 
WARNING: Offline for disk DATA_0007 in mode 0x7f failed.
Wed Aug 23 12:44:02 2023
NOTE: halting all I/Os to diskgroup 2 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 2 due to diskgroup dismount
NOTE: SMON skipping lock domain (2) validation because diskgroup being dismounted
Abort recovery for domain 2
Wed Aug 23 12:44:02 2023
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7fe6d808 (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_2360526.trc:
ORA-15130: diskgroup "DATA" is being dismounted

再次尝试mount该磁盘组,报ORA-15042和ORA-15038错误

SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=2 incarn=0x79e6d861
NOTE: cache began mount (first) of group DATA number=2 incarn=0x79e6d861
NOTE: Assigning number (2,0) to disk (/dev/rhdisk31)
NOTE: Assigning number (2,3) to disk (/dev/rhdisk33)
NOTE: Assigning number (2,4) to disk (/dev/rhdisk34)
NOTE: Assigning number (2,5) to disk (/dev/rhdisk35)
NOTE: Assigning number (2,6) to disk (/dev/rhdisk36)
NOTE: Assigning number (2,9) to disk (/dev/rhdisk39)
NOTE: Assigning number (2,1) to disk (/dev/rhdisk8)
NOTE: Assigning number (2,2) to disk (/dev/rhdisk9)
Wed Aug 23 12:58:46 2023
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 11 for pid 27, osid 3736034
NOTE: Assigning number (2,7) to disk ()
NOTE: Assigning number (2,8) to disk ()
GMON querying group 2 at 12 for pid 27, osid 3736034
NOTE: cache dismounting (clean) group 2/0x79E6D861 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3736034, image: oracle@hbbz01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x79E6D861 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x79e6d861
NOTE: cache deleting context for group DATA 2/0x79e6d861
GMON dismounting group 2 at 13 for pid 27, osid 3736034
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "2" 
ORA-15042: ASM disk "7" is missing from group number "2" 
ORA-15038: disk '/dev/rhdisk37' mismatch on 'Time Stamp' with target disk group [2129689239] [2062898314]
ERROR: alter diskgroup data mount

怀疑把报错这个磁盘组的rhdisk37加入到另外一套rac的asm中了(也就是说两套asm使用了同一块磁盘),aix操作系统层面分析确认

---对asm扩容的机器上
# lscfg -vpl hdisk15
  hdisk15          U78C5.001.DQD076A-P2-C4-T1-W200C00A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block

---磁盘组dismount的机器上
# lscfg -vpl hdisk37      
  hdisk37          U5802.001.9K87776-P1-C1-T1-W200500A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block

通过lscfg 命令确认两套rac使用了同一块盘导致一个磁盘组异常,在新加的机器上查询确认新盘被破坏情况(新加入的磁盘由于reblance操作,已经被写入了380G左右数据[也就意味着这个磁盘在老磁盘组中最少会丢失380G数据]
20230905140603


对于这种情况,dismount磁盘组是外部冗余不可能直接mount起来,只能通过以前处理的类似方法:
asm disk header 彻底损坏恢复
asm磁盘加入vg恢复
asm磁盘dd破坏恢复
asm disk 磁盘部分被清空恢复
再一例asm disk被误加入vg并且扩容lv恢复
fdisk分区导致asm disk破坏数据库恢复
再一起asm disk被格式化成ext3文件系统故障恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复
通过底层处理恢复出来没有覆盖的数据块中数据
20230827200941

再使用dul恢复出来其中数据,完成这次故障的核心数据恢复

发表在 非常规恢复 | 标签为 , , , | 评论关闭

win系统删除oracle数据文件恢复

有客户联系我们,说win平台下的数据库,在由于空间紧张,在关闭数据库的情况下删除的两个数据文件,导致数据库无法正常访问很多业务表,需要对其进行恢复,查看alert日志发现大概操作,删除文件之后,启动数据库失败

Completed: alter database mount exclusive
alter database open
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_4060.trc:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: 'D:\DATASPACE\XXXXX.DBF'
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_4060.trc:
ORA-01157: cannot identify/lock data file 38 - see DBWR trace file
ORA-01110: data file 38: 'D:\DATASPACE\XXXXX24.DBF'
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Tue Jun 27 14:50:28 2023
Checker run found 2 new persistent data failures

人工创建被删除文件,启动库报ORA-27047,OSD-04006等错误

Tue Jun 27 16:45:10 2023
ALTER DATABASE OPEN
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_5456.trc:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: 'D:\DATASPACE\XXXXX.DBF'
ORA-27047: unable to read the header block of file
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。

offline相关数据文件,启动库成功,但是job开始报错

Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_5456.trc:
ORA-01157: cannot identify/lock data file 38 - see DBWR trace file
ORA-01110: data file 38: 'D:\DATASPACE\XXXXX24.DBF'
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_648.trc:
ORA-01157: 无法标识/锁定数据文件 6 - 请参阅 DBWR 跟踪文件
ORA-01110: 数据文件 6: 'D:\DATASPACE\XXXXX.DBF'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Tue Jun 27 16:48:43 2023
alter database datafile 'D:\DATASPACE\XXXXX.DBF' offline drop
Completed: alter database datafile 'D:\DATASPACE\XXXXX.DBF' offline drop
Tue Jun 27 16:49:08 2023
alter database open
Tue Jun 27 16:49:08 2023
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_5456.trc:
ORA-01157: cannot identify/lock data file 38 - see DBWR trace file
ORA-01110: data file 38: 'D:\DATASPACE\XXXXX24.DBF'
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3976.trc:
ORA-01157: 无法标识/锁定数据文件 38 - 请参阅 DBWR 跟踪文件
ORA-01110: 数据文件 38: 'D:\DATASPACE\XXXXX24.DBF'
ORA-1157 signalled during: alter database open...
Tue Jun 27 16:49:08 2023
Checker run found 1 new persistent data failures
Tue Jun 27 16:49:28 2023
alter database datafile 'D:\DATASPACE\XXXXX24.DBF' offline drop
Completed: alter database datafile 'D:\DATASPACE\XXXXX24.DBF' offline drop
alter database open
Tue Jun 27 16:49:37 2023
Thread 1 opened at log sequence 145929
  Current log# 3 seq# 145929 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jun 27 16:49:37 2023
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Tue Jun 27 16:49:39 2023
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Tue Jun 27 16:49:40 2023
QMNC started with pid=21, OS id=6096 
Completed: alter database open
Tue Jun 27 16:49:43 2023
db_recovery_file_dest_size of 4096 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 27 16:49:44 2023
Starting background process CJQ0
Tue Jun 27 16:49:44 2023
CJQ0 started with pid=142, OS id=6036 
Tue Jun 27 16:49:48 2023
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j007_5184.trc:
ORA-12012: 自动执行作业 64 出错
ORA-00376: 此时无法读取文件 6
ORA-01110: 数据文件 6: 'D:\DATASPACE\XXXXX.DBF'
ORA-06512: 在 "XIFENFEI.XXXXXXXX", line 2897
ORA-06512: 在 line 1
Tue Jun 27 16:51:52 2023
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_2548.trc:
ORA-12012: 自动执行作业 64 出错
ORA-00376: 此时无法读取文件 6
ORA-01110: 数据文件 6: 'D:\DATASPACE\XXXXX.DBF'
ORA-06512: 在 "XIFENFEI.XXXXXXXX", line 2897
ORA-06512: 在 line 1
Tue Jun 27 16:54:44 2023
Starting background process SMCO
Tue Jun 27 16:54:44 2023
SMCO started with pid=42, OS id=908 
Tue Jun 27 16:55:52 2023

接手现场之后,关闭数据库,使用操作系统层面反删除工具进行扫描恢复,发现其中一个文件(另外一个文件os层面无法恢复)
20230707132040


通过工具检测恢复出来的数据文件,损坏的几个block是文件头部不涉及业务数据,运气不错
20230707135054

另外一个数据文件,从os层面无法恢复,对于这种情况,只能基于底层的block层面进行恢复(恢复没有覆盖的block)
20230707150912
参考类似恢复案例:
win文件系统损坏oracle恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
分享运气超级好的一次drop tablespace 数据恢复
恢复出来的两个数据文件,结合该编辑的其他数据文件通过dul工具恢复其中数据,最大程度抢救客户数据,减少损失.

发表在 Oracle备份恢复, 非常规恢复 | 标签为 , , | 评论关闭