标签归档:_ALLOW_RESETLOGS_CORRUPTION

使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

以前写过一篇乱用_allow_resetlogs_corruption参数导致悲剧的文章,昨天晚上又遇到一个朋友不谨慎使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
环境描述
系统环境:solaris
数据库版本:10.2.0.5.7
数据存储方式:ASM
数据量:15T以上
补充事宜:数据库SCN距离headroom只有54天

报ORA-00020错误,实例crash
数据库因为超过了系统的进程数,出现dbwn进程写数据文件异常

Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01148: 无法刷新数据文件 22 的文件大小
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
ORA-00020: 超出最大进程数 ()
Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
Sun Aug 25 16:00:41 CST 2013
DBW0: terminating instance due to error 1242
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 16:00:51 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by DBW0, pid = 7490

ORA-00600[kcbtema_10]
实例恢复出现ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []

Sun Aug 25 19:19:23 CST 2013
ALTER DATABASE OPEN
Sun Aug 25 19:19:38 CST 2013
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Sun Aug 25 19:19:40 CST 2013
Started redo scan
Sun Aug 25 19:20:07 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:19 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:19 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:21 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Aborting crash recovery due to slave death, attempting serial crash recovery
Sun Aug 25 19:20:23 CST 2013
Beginning crash recovery of 1 threads
Sun Aug 25 19:20:23 CST 2013
Started redo scan
Sun Aug 25 19:20:47 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:54 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:54 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:54 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:56 CST 2013
Aborting crash recovery due to error 600
Sun Aug 25 19:20:56 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

使用隐含参数

ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;

报ORA-00704/ORA-01555
因为在前面的恢复中进行了不完全恢复,因此这里加入隐含参数,然后尝试resetlogs,然后报如下错误

Sun Aug 25 20:11:54 CST 2013
alter database open resetlogs
Sun Aug 25 20:12:10 CST 2013
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189
Resetting resetlogs activation ID 1312390734 (0x4e397e4e)
Sun Aug 25 20:16:25 CST 2013
Setting recovery target incarnation to 2
Sun Aug 25 20:16:42 CST 2013
************************************************************
Warning: The SCN headroom for this database is only 54 days!
************************************************************
Sun Aug 25 20:16:43 CST 2013
Assigning activation ID 1352200163 (0x5098efe3)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Successful open of redo thread 1
Sun Aug 25 20:16:43 CST 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 25 20:16:52 CST 2013
SMON: enabling cache recovery
Sun Aug 25 20:16:52 CST 2013
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b):
Sun Aug 25 20:16:52 CST 2013
select ctime, mtime, stime from obj$ where obj# = :1
Sun Aug 25 20:16:52 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 143 (名称为 "_SYSSMU143$") 过小
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 20:17:02 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 2859
ORA-1092 signalled during: alter database open resetlogs...

数据库当前SCN

SQL > select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
    13429649947222

SQL > select distinct CHECKPOINT_CHANGE# from v$datafile_header;

CHECKPOINT_CHANGE#
------------------
    13429649947222

解决方法
因为该数据库版本为10.2.0.5.7,已经包含了scn patch,因此不能使用event或者隐含参数来修改scn,而且该库容量15T以上(asm),因此也无法使用bbed修改数据文件头,最后决定使用ordebug来解决该问题
使用oradebug DUMPvar SGA kcsgscn_
使用oradebug poke

sqlplus / as sysdba
startup mount

oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
oradebug poke 

recover database;
alter database open;

事后总结
查询MOS,发现ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)

--故障原因
The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10].
Due to this bug, during recovery, the class designation of a data block has changed.
--处理方法 
SQL>startup mount
SQL>recover database;
SQL>alter database open;

因为MOS上给的解决思路在该数据库中已经无法尝试,不能确定该方法一定可行,但是对于本次的恢复过程中,没有任何直接recover database操作(只有一次不完全恢复)确实让人有无限的遗憾和可惜。对于本次应该先查询MOS,尝试该种方法,慎重使用_allow_resetlogs_corruption参数

发表在 非常规恢复 | 标签为 , , , , , | 评论关闭

_allow_resetlogs_corruption和adjust_scn解决ORA-01190

一、模拟offline文件然后resetlogs操作

1.设置datafile 5数据文件offline
2.rman备份数据库
3.关闭原数据库,删除数据文件/当前日志和部分归档日志
4.执行不完全恢复,resetlogs打开数据库(如下面操作)
[oracle@xifenfei ora11g]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 15 07:36:59 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover database until cancel;
ORA-00279: change 868870 generated at 03/15/2012 03:32:11 needed for thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/archivelog/ora11g/1_29_777766629.dbf
ORA-00280: change 868870 for thread 1 is in sequence #29


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


SQL> alter database open resetlogs;

Database altered.

SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

     FILE# ONLINE_STATUS  TO_CHAR(CHANGE#,'999999999
---------- -------------- --------------------------
         5 OFFLINE               868810

SQL> alter database datafile 5 online;
alter database datafile 5 online
*
ERROR at line 1:
ORA-01190: control file or data file 5 is from before the last RESETLOGS
ORA-01110: data file 5: '/u01/oracle/oradata/ora11g/xifenfei01.dbf'


SQL> select file#,to_char(checkpoint_change#,'999999999999'),
  2  to_char(last_change#,'999999999999') from v$datafile;

     FILE# TO_CHAR(CHECKPOINT_CHANGE# TO_CHAR(LAST_CHANGE#,'9999
---------- -------------------------- --------------------------
         1        868874
         2        868874
         3        868874
         4        868874
         5        868810                     868874

--可以看到offline的数据文件,没有因为resetlogs操作而改变
--CHECKPOINT_CHANGE#和RESETLOGS_CHANGE#信息
SQL> select file#,to_char(checkpoint_change#,'999999999999'),
  2  to_char(RESETLOGS_CHANGE#,'999999999999')
  3  from v$datafile_header;

     FILE# TO_CHAR(CHECKPOINT_CHANGE# TO_CHAR(RESETLOGS_CHANGE#,
---------- -------------------------- --------------------------
         1        868874                     868871
         2        868874                     868871
         3        868874                     868871
         4        868874                     868871
         5        868810                     787897

二、隐含参数设置

SQL> create pfile='/tmp/pfile' from spfile;

File created.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

在pfile中增加
_allow_resetlogs_corruption=true
_allow_error_simulation=TRUE(10g及其以上版本需要)

三、打开数据库,online离线文件

SQL> startup pfile='/tmp/pfile' mount;
ORACLE instance started.

Total System Global Area  368263168 bytes
Fixed Size                  1345016 bytes
Variable Size             293603848 bytes
Database Buffers           67108864 bytes
Redo Buffers                6205440 bytes
Database mounted.
--在mount状态下执行
SQL> alter session set events '10015 trace name adjust_scn level 2';

Session altered.


--[一定要]在mount状态下执行online操作
SQL> alter database datafile 5 online;

Database altered.
SQL> recover database until cancel;
ORA-00279: change 868810 generated at 03/13/2012 22:19:37 needed for thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/archivelog/ora11g/1_27_777766629.dbf
ORA-00280: change 868810 for thread 1 is in sequence #27


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01190: control file or data file 1 is from before the last RESETLOGS
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;

Database altered.

SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

no rows selected

姊妹篇:bbed解决ORA-01190

发表在 Oracle备份恢复 | 标签为 , , | 评论关闭

使用_allow_resetlogs_corruption打开无归档日志rman备份库

rman还原恢复操作

--还原数据库
RMAN> restore database;
--恢复数据库
RMAN> recover database;

Starting recover at 2012-03-08 21:20:45
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=65 device type=DISK

starting media recovery

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/08/2012 21:20:47
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 2936 and starting SCN of 25991695 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 2935 and starting SCN of 25991652 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 2934 and starting SCN of 25991649 found to restore
……………………
RMAN-06025: no backup of archived log for thread 1 with sequence 2902 and starting SCN of 25991156 found to restore
这里报日志缺少,实际上是备份的数据库文件后,没有备份归档日志,归档日志全部丢失

进行不完全恢复

SQL> recover database until cancel;
ORA-00279: change 25991194 generated at 03/08/2012 20:33:58 needed for thread 1
ORA-00289: suggestion : /opt/oracle/oradata/archivelog/chf/1_2902_752334071.dbf
ORA-00280: change 25991194 for thread 1 is in sequence #2902


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

查看相关SCN

SQL> select file#,to_char(checkpoint_change#,'999999999999') from v$datafile;

     FILE# TO_CHAR(CHECK
---------- -------------
         1      25992214
         2      25992214
         3      25992214
         4      25992214
         5      25992214
         6      25992214
         7      25992214
         8      25992214
         9      25992214
        10      25992214
        11      25992214

     FILE# TO_CHAR(CHECK
---------- -------------
        13      25992214
        14      25992214

13 rows selected.

SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

     FILE# ONLINE_ TO_CHAR(CHANG
---------- ------- -------------
         1 ONLINE       25991194
         2 ONLINE       25991194
         3 ONLINE       25991194
         4 ONLINE       25991194
         5 ONLINE       25991194
         6 ONLINE       25991194
         7 ONLINE       25991194
         8 ONLINE       25991194
         9 ONLINE       25991194
        10 ONLINE       25991194
        11 ONLINE       25991194

     FILE# ONLINE_ TO_CHAR(CHANG
---------- ------- -------------
        13 ONLINE       25991194
        14 ONLINE       25991194

13 rows selected.

SQL> select file#,to_char(checkpoint_change#,'999999999999') from v$datafile_header;

     FILE# TO_CHAR(CHECK
---------- -------------
         1      25991194
         2      25991194
         3      25991194
         4      25991194
         5      25991194
         6      25991194
         7      25991194
         8      25991194
         9      25991194
        10      25991194
        11      25991194

     FILE# TO_CHAR(CHECK
---------- -------------
        13      25991194
        14      25991194

13 rows selected.

--发现数据文件scn和控制文件不一致,重建控制文件,然后查询相关scn
SQL> select file#,to_char(checkpoint_change#,'999999999999') from v$datafile;

     FILE# TO_CHAR(CHECK
---------- -------------
         1      25991194
         2      25991194
         3      25991194
         4      25991194
         5      25991194
         6      25991194
         7      25991194
         8      25991194
         9      25991194
        10      25991194
        11      25991194

     FILE# TO_CHAR(CHECK
---------- -------------
        13      25991194
        14      25991194

13 rows selected.

SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

     FILE# ONLINE_ TO_CHAR(CHANG
---------- ------- -------------
         1 ONLINE       25991194
         2 ONLINE       25991194
         3 ONLINE       25991194
         4 ONLINE       25991194
         5 ONLINE       25991194
         6 ONLINE       25991194
         7 ONLINE       25991194
         8 ONLINE       25991194
         9 ONLINE       25991194
        10 ONLINE       25991194
        11 ONLINE       25991194

     FILE# ONLINE_ TO_CHAR(CHANG
---------- ------- -------------
        13 ONLINE       25991194
        14 ONLINE       25991194

13 rows selected.

SQL> select file#,to_char(checkpoint_change#,'999999999999') from v$datafile_header;

     FILE# TO_CHAR(CHECK
---------- -------------
         1      25991194
         2      25991194
         3      25991194
         4      25991194
         5      25991194
         6      25991194
         7      25991194
         8      25991194
         9      25991194
        10      25991194
        11      25991194

     FILE# TO_CHAR(CHECK
---------- -------------
        13      25991194
        14      25991194

13 rows selected.
--此时所有scn均一致

尝试打开数据库

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'


SQL> recover database using backup controlfile until cancel;
ORA-00279: change 25991194 generated at 03/08/2012 20:33:58 needed for thread 1
ORA-00289: suggestion : /opt/oracle/oradata/archivelog/chf/1_2902_752334071.dbf
ORA-00280: change 25991194 for thread 1 is in sequence #2902


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'


ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

使用隐含参数打开数据库

SQL> create pfile='/tmp/pfile' from spfile;

File created.

-------/tmp/pfile中加上----------
_allow_resetlogs_corruption= TRUE
---------------------------------

SQL> startup mount pfile='/tmp/pfile' force
ORACLE instance started.

Total System Global Area  622149632 bytes
Fixed Size                  2230912 bytes
Variable Size             419431808 bytes
Database Buffers          192937984 bytes
Redo Buffers                7548928 bytes
Database mounted.
SQL> alter database open resetlogs;

Database altered.

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

总结
这次的试验没有多少实际意义,但是可以说明几个问题:
1.所有的数据文件的scn都一致,甚至和控制文件的也一致,数据库不一定可以open成功
(怀疑是数据文件中的scn大于data header scn)
2.对于这样的问题,如果使用bbed修改所有数据文件header的scn不知道是否可以解决
3.如果rman只备份了数据文件而没有任何一个归档日志,数据库通过隐含参数还是可以open,抢救数据

发表在 rman备份/恢复 | 标签为 | 一条评论