_allow_resetlogs_corruption 的搜索结果

_ALLOW_RESETLOGS_CORRUPTION

我相信_ALLOW_RESETLOGS_CORRUPTION 这个参数一定很多人都熟悉,是redo异常恢复的杀手锏之一,以下文章是来自官方的解释

DB_Parameter _ALLOW_RESETLOGS_CORRUPTION 
========================================
 
This documentation has been prepared avoiding the mention of the complex 
structures from the code and to simply give an insight to the 'damage it could 
cause'.  The usage of this parameter leads to an in-consistent Database with no 
other alternative but to rebuild the complete Database.  This parameter could 
be used when we realize that there are no stardard options available and are 
convinced that the customer understands the implications of using the Oracle's 
secret parameter.  The factors to be considered are ;-- 
 
1. Customer does not have a good backup. 
2. A lot of time and money has been invested after the last good backup and     
   there is no possibility for reproduction of the lost data. 
3. The customer has to be ready to export the full database and import it     
   back after creating a new one. 
4. There is no 100% guarantee that by using this parameter the database would 
   come up. 
5. Oracle does not support the database after using this parameter for        
   recovery.    
6. ALL OPTIONS including the ones mentioned in the action part of the error   
   message have been tried. 
 
 

 
By setting _ALLOW_RESETLOGS_CORRUPTION=TRUE, certain consistency checks are 
SKIPPED during database open stage.  This basically means it does not check 
the datafile headers as to what the status was before the shutdown and how it 
was shutdown.  The following cases mention few of the checks that were skipped. 
 
Case-I 
------ 
Verification that the datafile present has not been restored from a BACKUP 
taken before the database was opened successfully by using RESETLOGS.   

ORA-01190: control file or data file %s is from before the last RESETLOGS
    Cause: Attempting to use a data file when the log reset information in  
           the file does not match the control file.  Either the data file or  
           the control file is a backup that was made before the most recent 
           ALTER DATABASE OPEN RESETLOGS. 
   Action: Restore file from a more recent backup. 

 
Case-II 
------- 
Verification that the status bit of the datafile is not in a FUZZY state. 
The datafile could be in this state due to the database going down when the  
 - Datafile was on-line and open 
 - Datafile was not closed cleanly (maybe due to OS). 

ORA-01194: file %s needs more recovery to be consistent 
    Cause: An incomplete recover session was started, but an insufficient 
           number of logs were applied to make the file consistent.  The  
           reported file was not closed cleanly when it was last opened by 
           the database.  It must be recovered to a time when it was not  
           being updated.  The most likely cause of this error is forgetting 
           to restore the file from a backup before doing incomplete  
           recovery. 
   Action: Either apply more logs until the file is consistent or restore the 
           file from an older backup and repeat recovery. 
 

Case-III 
-------- 
Verification that the COMPLETE recover strategies have been applied for 
recovering the datafile and not any of the INCOMPLETE recovery options.  
Basically because the complete recovery is one in which we even apply the 
ON-LINE redo log files and open the DB without reseting the logs. 

ORA-01113: file '%s' needs media recovery starting at log sequence # %s 
    Cause: An attempt was made to open a database file that is in need of  
           media recovery. 
   Action: First apply media recovery to the file. 

 
Case-IV 
------- 
Verification that the datafile has been recovered through an END BACKUP if the 
control file indicates that it was in backup mode. 
This is useful when the DB has crashed while in hot backup mode and we lost 
all log files in DB version's less than V7.2. 

ORA-01195: on-line backup of file %s needs more recovery to be consistent" 
    Cause: An incomplete recovery session was started, but an insufficient  
           number of logs were applied to make the file consistent.  The 
           reported file is an on-line backup which must be recovered to the 
           time the backup ended. 
   Action: Either apply more logs until the file is consistent or resotre 
          the database files from an older backup and repeat recovery. 
 
In version 7.2, we could simply issue the ALTER DATABASE DATAFILE xxxx END 
BACKUP statement and proceed with the recovery.  But again to issue this 
statement, we need to have the ON-LINE redo logs or else we still are forced to
use this parameter. 
 

Case-V 
------ 
Verification that the data file status is not still in (0x10) MEDIA recovery 
FUZZY. 
When recovery is started, a flag is set in the datafile header status flag to 
indicate that the file is presently in media recovery.  This is reset when 
recovery is completed and at times when it has not been reset we are forced to 
use this paramter. 

ORA-01196: file %s is inconsistent due to a failed media recovery session 
    Cause: The file was being recovered but the recovery did not terminate 
           normally.  This left the file in an inconsistent state.  No more  
           recovery was successfully completed on this file. 
   Action: Either apply more logs until the file is consistent or restore the 
           backup again and repeat recovery. 
 
 
Case-VI 
------- 
Verification that the datafile has been restored form a proper backup to 
correspond with the log files.  This situation could happen when we have 
decided that the data file is invalid since its SCN is ahead of the last 
applied logs SCN but it has not failed on one of the ABOVE CHECKS. 

ORA-01152: file '%s' was not restored from a sufficientluy old backup" 
    Cause: A manual recovery session was started, but an insufficient number 
           of logs were applied to make the database consistent.  This file is 
           still in the future of the last log applied.  Note that this  
           mistake can not always be caught. 
   Action: Either apply more logs until the database is consistent or 
           restore the database file from an older backup and repeat  
           recovery.

使用_ALLOW_RESETLOGS_CORRUPTION 参数需谨慎,因为该参数可能导致数据库逻辑不一致,甚至可能把本来很简单的一个恢复弄的非常复杂甚至不可恢复的后果,建议在oracle support支持下使用.另外使用该参数resetlogs库之后,强烈建议通过逻辑方式重建库

发表在 Oracle备份恢复 | 标签为 , , , , , , | 评论关闭

使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

以前写过一篇乱用_allow_resetlogs_corruption参数导致悲剧的文章,昨天晚上又遇到一个朋友不谨慎使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
环境描述
系统环境:solaris
数据库版本:10.2.0.5.7
数据存储方式:ASM
数据量:15T以上
补充事宜:数据库SCN距离headroom只有54天

报ORA-00020错误,实例crash
数据库因为超过了系统的进程数,出现dbwn进程写数据文件异常

Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01148: 无法刷新数据文件 22 的文件大小
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
ORA-00020: 超出最大进程数 ()
Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
Sun Aug 25 16:00:41 CST 2013
DBW0: terminating instance due to error 1242
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 16:00:51 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by DBW0, pid = 7490

ORA-00600[kcbtema_10]
实例恢复出现ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []

Sun Aug 25 19:19:23 CST 2013
ALTER DATABASE OPEN
Sun Aug 25 19:19:38 CST 2013
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Sun Aug 25 19:19:40 CST 2013
Started redo scan
Sun Aug 25 19:20:07 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:19 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:19 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:21 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Aborting crash recovery due to slave death, attempting serial crash recovery
Sun Aug 25 19:20:23 CST 2013
Beginning crash recovery of 1 threads
Sun Aug 25 19:20:23 CST 2013
Started redo scan
Sun Aug 25 19:20:47 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:54 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:54 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:54 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:56 CST 2013
Aborting crash recovery due to error 600
Sun Aug 25 19:20:56 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

使用隐含参数

ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;

报ORA-00704/ORA-01555
因为在前面的恢复中进行了不完全恢复,因此这里加入隐含参数,然后尝试resetlogs,然后报如下错误

Sun Aug 25 20:11:54 CST 2013
alter database open resetlogs
Sun Aug 25 20:12:10 CST 2013
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189
Resetting resetlogs activation ID 1312390734 (0x4e397e4e)
Sun Aug 25 20:16:25 CST 2013
Setting recovery target incarnation to 2
Sun Aug 25 20:16:42 CST 2013
************************************************************
Warning: The SCN headroom for this database is only 54 days!
************************************************************
Sun Aug 25 20:16:43 CST 2013
Assigning activation ID 1352200163 (0x5098efe3)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Successful open of redo thread 1
Sun Aug 25 20:16:43 CST 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 25 20:16:52 CST 2013
SMON: enabling cache recovery
Sun Aug 25 20:16:52 CST 2013
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b):
Sun Aug 25 20:16:52 CST 2013
select ctime, mtime, stime from obj$ where obj# = :1
Sun Aug 25 20:16:52 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 143 (名称为 "_SYSSMU143$") 过小
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 20:17:02 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 2859
ORA-1092 signalled during: alter database open resetlogs...

数据库当前SCN

SQL > select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
    13429649947222

SQL > select distinct CHECKPOINT_CHANGE# from v$datafile_header;

CHECKPOINT_CHANGE#
------------------
    13429649947222

解决方法
因为该数据库版本为10.2.0.5.7,已经包含了scn patch,因此不能使用event或者隐含参数来修改scn,而且该库容量15T以上(asm),因此也无法使用bbed修改数据文件头,最后决定使用ordebug来解决该问题
使用oradebug DUMPvar SGA kcsgscn_
使用oradebug poke

sqlplus / as sysdba
startup mount

oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
oradebug poke 

recover database;
alter database open;

事后总结
查询MOS,发现ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)

--故障原因
The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10].
Due to this bug, during recovery, the class designation of a data block has changed.
--处理方法 
SQL>startup mount
SQL>recover database;
SQL>alter database open;

因为MOS上给的解决思路在该数据库中已经无法尝试,不能确定该方法一定可行,但是对于本次的恢复过程中,没有任何直接recover database操作(只有一次不完全恢复)确实让人有无限的遗憾和可惜。对于本次应该先查询MOS,尝试该种方法,慎重使用_allow_resetlogs_corruption参数

发表在 非常规恢复 | 标签为 , , , , , | 评论关闭

乱用_allow_resetlogs_corruption参数导致悲剧

一个朋友11.2.0.1的数据库因为断电,出现不能正常open问题,自己尝试恢复,折腾了几天,最后让我帮忙的时候错误如下

SQL> startup
ORACLE 例程已经启动。

Total System Global Area  778387456 bytes
Fixed Size                  1374808 bytes
Variable Size             545260968 bytes
Database Buffers          226492416 bytes
Redo Buffers                5259264 bytes
数据库装载完毕。
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 225)
ORA-01110: data file 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'
进程 ID: 5964
会话 ID: 1144 序列号: 5

从启动的日志提示看初步判断就是悲剧了,因为根据经验值在11gr2版本中,该错误就是undo$(分析trace文件进步一确定是undo$),该block出现异常,数据库在启动的时候要扫描该表,把相关的回滚段给online起来,现在他异常了,数据库肯定无法正常启动
dbv检查数据库文件

F:\>dbv file='F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'

DBVERIFY: Release 11.2.0.1.0 - Production on 星期三 5月 22 11:06:00 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF
页 225 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x004000e1 (file 1, block 225)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x004000e1
 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xb98e0601
 check value in block header: 0xb307
 computed block checksum: 0xe8ae



DBVERIFY - 验证完成

检查的页总数: 134400
处理的页总数 (数据): 98226
失败的页总数 (数据): 0
处理的页总数 (索引): 14189
失败的页总数 (索引): 0
处理的页总数 (其他): 4178
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 17806
标记为损坏的总页数: 1
流入的页总数: 1
加密的总页数        : 0
最高块 SCN            : 14045769 (0.14045769)

看到这里,可以确定坏块的存在,根据上面的提示,我们发现tailchk值不正确,应该是5120+06+01,而不该是b98e0601,通过bbed查看

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x004000e1
   ub4 bas_kcbh                             @8        0x00d65120
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x5ba9
   ub2 spare3_kcbh                          @18       0x0000

BBED> p tailchk
ub4 tailchk                                 @8188     0xb98e0601

进一步证明是tailchk异常导致,分析alert日志,数据库异常断电,然后启动的时候发现如下错误

Recovery of Online Redo Log: Thread 1 Group 2 Seq 431 Reading mem 0
  Mem# 0: F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\REDO02.LOG
RECOVERY OF THREAD 1 STUCK AT BLOCK 451 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc:
ORA-01172: 线程 1 的恢复停止在块 451 (在文件 3 中)
ORA-01151: 如果需要, 请使用介质恢复以恢复块和还原备份
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc:
ORA-01172: 线程 1 的恢复停止在块 451 (在文件 3 中)
ORA-01151: 如果需要, 请使用介质恢复以恢复块和还原备份
ORA-1172 signalled during: alter database open...
Tue May 21 14:27:29 2013
ALTER DATABASE RECOVER  datafile 3  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 431 Reading mem 0
  Mem# 0: F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\REDO02.LOG
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_772.trc  (incident=112164):
ORA-00600: 内部错误代码, 参数: [3020], [3], [451], [12583363], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 451, file offset is 3694592 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 3: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\UNDOTBS01.DBF'
ORA-10560: block type 'KTU UNDO BLOCK'
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 3  ...

因为file# 3, block# 451和redo信息不一致,出现ora-600[3020]错误,而file# 3为undo文件,朋友从而设置undo_management=’manual’并设置了_allow_resetlogs_corruption=true,然后进行不完全恢复,从而出现了如下错误提示

Tue May 21 14:41:23 2013
SMON: enabling cache recovery
Corrupt block relative dba: 0x004000e1 (file 1, block 225)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x004000e1
 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xb98e0601
 check value in block header: 0xb307
 computed block checksum: 0xe8ae
Reading datafile 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'
for corruption at rdba: 0x004000e1 (file 1, block 225)
Reread (file 1, block 225) found same corrupt data
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc  (incident=120165):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225)
ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225)
ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'
Errors in file f:\app\administrator\diag\rdbms\yfcloud\yfcloud\trace\yfcloud_ora_4892.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 225)
ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\YFCLOUD\SYSTEM01.DBF'
Error 604 happened during db open, shutting down database
USER (ospid: 4892): terminating the instance due to error 604

从而的原因基本上可以从操作过程中了解到:数据库是因为file# 3 block# 451和redo不一致导致问题,而恢复的操作人员冲动的使用了_allow_resetlogs_corruption参数,从而使得数据库出现了不一致性,也就是导致file# 1 block# 225坏块的根本原因,针对这样的情况,完全没有到使用_allow_resetlogs_corruption隐含参数地步

使用bbed修改tailchk

BBED> p tailchk
ub4 tailchk                                 @8188     0xb98e0601

BBED> verify
DBVERIFY - Verification starting
FILE = system01.dbf
BLOCK = 225

Block 225 is corrupt
***
Corrupt block relative dba: 0x004000e1 (file 0, block 225)
Fractured block found during verification
Data in bad block -
 type: 6 format: 2 rdba: 0x004000e1
 last change scn: 0x0000.00d65120 seq: 0x1 flg: 0x06
 consistency value in tail: 0xb98e0601
 check value in block header: 0x5ba9, computed block checksum: 0x0
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2

BBED> m /x 01062051
 File: system01.dbf (0)
 Block: 226              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 01062051

 <32 bytes per line>

BBED> p tailchk
ub4 tailchk                                 @8188     0x51200601

BBED> sum apply
Check value for File 0, Block 226:
current = 0xb307, required = 0xb307

BBED> verify
DBVERIFY - Verification starting
FILE = system01.dbf
BLOCK = 225


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

bbed修改block之后,数据库直接正常打开,完成数据库恢复任务,在这里很明显是因为错误的使用了_allow_resetlogs_corruption参数,屏蔽了redo前滚导致了相关的坏块,所以大家在数据库异常恢复的时候,需要知道各个参数的意义,而不要乱使用,很可能导致不可控结果

发表在 非常规恢复 | 一条评论