ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

发表于 2012 年 5 月 15 日由惜分飞

标题：ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

open数据库alert日志报ORA-00600[kcratr1_lostwrt]错误

Mon May 14 14:57:28 2012
ALTER DATABASE OPEN
Mon May 14 14:57:29 2012
Beginning crash recovery of 1 threads
Mon May 14 14:57:29 2012
Started redo scan
Mon May 14 14:57:29 2012
Errors in file d:\oracle\admin\cqgasold\udump\cqgasold_ora_504.trc:
ORA-00600: 内部错误代码，参数: [kcratr1_lostwrt], [], [], [], [], [], [], []

ORA-600 signalled during: alter database open...

查询相关SCN
同一个查询中SCN相同,省略

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;

未选定行

SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;

     FILE# SCN              STOP_SCN                                            
---------- ---------------- ----------------                                    
         1       7842987188                                                     
         2       7842987188                                                     
         3       7842987188                                                     

SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;

     FILE# SCN               RESETLOGS SCN                                      
---------- ----------------- -----------------                                  
         1        7842991811                 1                                  
         2        7842991811                 1                                  
         3        7842991811                 1

这里看到奇怪现象datafile scn小于datafile_header scn,数据库异常断电一般来说也不会出现这样的情况,个人猜测是错误的恢复或者使用历史控制文件导致,对于这样的现状,我先尝试着使用using backup controlfile方式恢复,结果失败.估计控制文件有异常,本着先拉起库原则,重建控制文件.

进行完全恢复

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码，参数: [3020], [8388617], [1], [23403], [25], [112],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'

尝试跳过坏块继续恢复

SQL> recover database allow 1 corruption;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码，参数: [3020], [8388610], [1], [23403], [2264], [16],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 2)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTFB Bitmapped File Space Header'

使用dbv检查坏块数量

C:\>dbv file='d:\oracle\oradata\cqgasold\undo_1.dbf' blocksize=8192

DBVERIFY: Release 9.2.0.5.0 - Production on 星期二 5月 15 19:43:42 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - 验证正在开始 : FILE = d:\oracle\oradata\cqgasold\undo_1.dbf

DBV-00200: 块, dba 8388617, 已经标记为崩溃
汇入的页369 - 可能是介质损坏
***
Corrupt block relative dba: 0x00800171 (file 2, block 369)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x00800171
 last change scn: 0x0001.d37c57db seq: 0x1 flg: 0x04
 consistency value in tail: 0x4e930260
 check value in block header: 0x8202, computed block checksum: 0x4e5f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***

汇入的页417 - 可能是介质损坏
***
Corrupt block relative dba: 0x008001a1 (file 2, block 417)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x008001a1
 last change scn: 0x0001.d37c53d4 seq: 0x2 flg: 0x04
 consistency value in tail: 0x4b6b0201
 check value in block header: 0x6ae7, computed block checksum: 0x5abc
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
…………
--类此记录很多,我放弃了跳过坏块修复的方法

恢复过程中提示坏块数据库文件离线恢复

SQL> alter database datafile 'd:\oracle\oradata\cqgasold\undo_1.dbf' offline;

数据库已更改。

SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00376: 此时无法读取文件 2
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'

到了这一步,根据经验,数据库被open的可能性很多了,很有可能是open以后因为smon回滚导致数据库down

查看日志,屏蔽回滚段,完成恢复

Tue May 15 19:59:52 2012
alter database open
Tue May 15 19:59:52 2012
Beginning crash recovery of 1 threads
Tue May 15 19:59:52 2012
Started redo scan
Tue May 15 19:59:52 2012
Completed redo scan
 323 redo blocks read, 82 data blocks need recovery
Tue May 15 19:59:52 2012
Started recovery at
 Thread 1: logseq 23404, block 3, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 23404 Reading mem 0
  Mem# 0 errs 0: F:\ORACLE\ORADATA\LOGCQGASOLD4.ORA
Tue May 15 19:59:52 2012
Completed redo application
Tue May 15 19:59:52 2012
Ended recovery at
 Thread 1: logseq 23404, block 326, scn 1.3548264979
 82 data blocks read, 82 data blocks written, 323 redo blocks read
Crash recovery completed successfully
Tue May 15 19:59:53 2012
Thread 1 advanced to log sequence 23405
Thread 1 opened at log sequence 23405
  Current log# 2 seq# 23405 mem# 0: D:\ORACLE\ORADATA\CQGASOLD\REDO02.LOG
Successful open of redo thread 1
Tue May 15 19:59:53 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 15 19:59:54 2012
Database Characterset is ZHS16GBK
Tue May 15 19:59:55 2012
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-604 signalled during: alter database open...
Tue May 15 19:59:56 2012
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 8
SMON: mark undo segment 8 as needs recovery
SMON: about to recover undo segment 9
SMON: mark undo segment 9 as needs recovery
SMON: about to recover undo segment 10
SMON: mark undo segment 10 as needs recovery
Tue May 15 20:00:37 2012
Shutting down instance (abort)

看到这里,可以大概确定是因为undo文件离线,导致回滚段异常.
这个问题,基本上可以确定通过隐含参数屏蔽回滚段,然后open数据库,重建undo删除异常undo,数据库恢复完成。

此条目发表在 Oracle备份恢复分类目录，贴了 kcratr1_lostwrt, ORA-00600 kcratr1_lostwrt, ORA-600 3020, ORA-600[3020] 标签。将固定链接加入收藏夹。

《ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复》有 4 条评论

惜分飞说：

2013 年 4 月 23 日18:32

600[3020]恢复过程alert日志记录

Mon Apr 22 21:53:41 2013
alter database recover if needed
 datafile 1055 allow 1 corruption
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Mon Apr 22 21:53:41 2013
Recovery of Online Redo Log: Thread 2 Group 8 Seq 54321 Reading mem 0
  Mem# 0: +DATA/q9dbdg/onlinelog/redo08.log
Mon Apr 22 21:53:41 2013
Recovery of Online Redo Log: Thread 1 Group 2 Seq 53587 Reading mem 0
  Mem# 0: +DATA/q9dbdg/onlinelog/redo02.log
CORRUPTING BLOCK 456338 OF FILE 1055 AND CONTINUING RECOVERY
Mon Apr 22 21:53:42 2013
Errors in file /u01/oracle/app/admin/q9db/udump/q9db1_ora_12223.trc:
ORA-10567: Redo is inconsistent with data block (file# 1055, block# 456338)
ORA-10564: tablespace PART_DATA
ORA-01110: data file 1055: '+DATA03/q9dbrac/datafile/part_data583.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 121354
Mon Apr 22 21:53:42 2013
Recovery of Online Redo Log: Thread 1 Group 3 Seq 53588 Reading mem 0
  Mem# 0: +DATA/q9dbdg/onlinelog/redo03.log
Mon Apr 22 21:53:42 2013
Media Recovery Complete (q9db1)
Completed: alter database recover if needed
 datafile 1055 allow 1 corruption
Mon Apr 22 21:53:56 2013
alter database datafile 1055 online
Mon Apr 22 21:53:56 2013
Completed: alter database datafile 1055 online

惜分飞说：

2012 年 5 月 15 日22:34

Applies to:
Oracle Server - Enterprise Edition - Version: 9.0.1.4 to 9.2.0.8
This problem can occur on any platform.

Symptoms
False ORA-600[kcratr1_lostwrt] (on 9.2) or ORA-397 (on 9.0) can happen
during crash recovery or thread recovery in RAC system.
If problem happen during crash recovery at starting up database, Oracle
can't startup database. If problem happen during thread recovery, all RAC
node might crash.

You will see following log at terminal if problem happen at crash recovery.
SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 72831496 bytes
Fixed Size 455176 bytes
Variable Size 62914560 bytes
Database Buffers 8388608 bytes
Redo Buffers 1073152 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr1_lostwrt], [], [], [], [],[], [], []

If problem happen at thread recovery, you will see SMON error in alert.log
Tue Oct  2 16:41:26 2007
Instance recovery: looking for dead threads
Tue Oct  2 16:41:26 2007
Beginning instance recovery of 1 threads
Tue Oct  2 16:41:26 2007
Started redo scan
Tue Oct  2 16:41:26 2007
Errors in file /app/oracle/product/9.2.0/admin/ora920/bdump/ora920_smon_12345.trc:
ORA-00600: internal error code, arguments: [kcratr1_lostwrt], [], [], [], [], [], [], []

Cause
Oracle signals this error during first pass of two pass recovery if it finds
lost write. Two pass recovery is a new feature in Oracle 9i, and gives
performance improvement on crash and thread recovery.

In first pass of two pass recovery, Oracle read redo stream and construct
list of blocks which need to be recovered. For this feature, DBWR writes
BWR (Block Written Record) to redo stream to tell it has written a change
to disk, and that change does not need to be applied during crash recovery.
At the end of first pass, Oracle compares SCN (System Change Number)
in the last BWR and SCN on the disk to verify changes in the last BWR is
actually written to disk.

But Oracle may signal false lost-write error on temporary block.
This problem can happen if someone manages temporary object including
temporary table, sort block, etc around the time database aborted.

As an example, we may encounter problem when we shutdown abort database
after using dbms_stats.gather_schema_stats (which use temporary table and
sort operation to gather statistics)

Solution

First determine if this error is signaled on temporary block or not.
Trace file written by this ORA-600 or ORA-397 has following entry at
the beginning part of trace file before showing ORA- error message.

Last BWR afn: 201 rdba: 0x400123(blk 291) ver: 0x0000.12345678.00 flg: 0x0c
Disk version: 0x0000.00000000.01 flag: 0x0c

Check values at flg: and flag: and change it to binary, if either or both have
0x08 = 00001000 bit, you hit this false error problem.
In above example, both flg: and flag: = 0x0c = 00001100, so it hit problem.

If you hit this false error, add following as initialization parameter and then
shutdown and restart database.
_two_pass = FALSE
With this parameter, Oracle doesn't use two pass recovery feature and we can
complete crash recovery.
Once database startup successfully, remove _two_pass parameter, then shutdown
normal and startup database. You should configure this parameter only when you
need to resolve this issue.

惜分飞说：

2012 年 5 月 15 日22:30

VERSIONS:
  version 6.0 and above

DESCRIPTION:        

  This is called a 'STUCK RECOVERY'.

  There is an inconsistency between the information stored in the redo 
  and the information stored in a database block being recovered.

ARGUMENTS:          

For Oracle 9.2 and earlier:
  Arg [a] Block DBA
  Arg [b] Redo Thread
  Arg [c] Redo RBA Seq
  Arg [d] Redo RBA Block No
  Arg [e] Redo RBA Offset.

For Oracle 10.1
  Arg [a] Absolute file number of the datafile.
  Arg [b] Block number
  Arg [c] Block DBA

FUNCTIONALITY:
  kernel cache recovery parallel
 
IMPACT:             
  INSTANCE FAILURE during recovery.

惜分飞说：

2012 年 5 月 15 日22:23

ORA-600 [3020] “Stuck Recovery” [ID 30866.1]

Applies to:
Oracle Server - Enterprise Edition - Version 9.2.0.1 to 11.2.0.2 [Release 9.2 to 11.2]
Information in this document applies to any platform.

Symptoms
Recovery session fails:

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [13204236], [1], [1],
[33082], [236], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 621324)
ORA-10564: tablespace DATA
ORA-01110: data file 3: '<dir>/PROD_Data01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 25535

Cause
This is stuck recovery - redo cannot be applied to a block because the scn of the block, when
read from disk is NOT the scn that we expect it to be.

Solution

If the affected file belongs to SYSTEM or UNDO tablespace you should restore  the database 
from backup and do point in time recovery to just before the problem log is applied. 

If you do not have a backup available, than the options are VERY limited. 
Please open a Service Request with Oracle Support Services for assistance on this.

If the affected file belongs to SYSAUX, this is the ONLY file affected and it is offline 
then a better option than point in time recovery of the whole database would be to create a NEW database 
and then use Transportable Tablespace feature to plug in all other tablespaces.  Another point worth noting:  
if SYSAUX is ONLINE but contains corrupt blocks, then before considering restore and recovery, use the SQL 
in point 3 below to identify the affected object  and raise a call with Oracle Support Services to ask if  
it is possible to drop and recreate the object  (unpublished Note 333665.1).

For Data Guard environments, refer to Note:1265884.1

Otherwise:

1. Use Trial Recovery to determine the extent of the problem:

SQL> recover database test;
This will tell you how many blocks (n) would be left corrupted after recovery - check the alert
log for details of the blocks affected.  If there are a large number of corruptions reported 
you may decide to restore from backup and issue point in time recovery.  However, 
if only a few blocks are reported as corrupt you could proceed with recovery :

2. Skip the corrupted block(s)

SQL> recover database allow 1 corruption;

Do this <n> times,  <n> being the number of blocks reported as corrupt in step 1 above.

This will allow recovery to continue,  'skipping' the blocks that cannot be recovered and leaving 
them marked as 'corrupt' after which the database can be opened.

3. Take the corrupt blocks reported in the alert log and for each,  
identify the object that the block belongs to:

SQL> SELECT tablespace_name, segment_type, owner, segment_name
     FROM dba_extents
     WHERE file_id = <file#>
       and <block> between block_id AND block_id + blocks - 1;

Replacing <file#> and <block> with the file# and block ids reported in the alert log.

For each object identified - take steps to resolve the corruption:

- if it belongs to an index then simply drop and recreate the index

- if it belong to a user object then consider recreating the object or extracting 
what you can from the object; if necessary raise a Service Request with Oracle and 
request assistance with extracting data from a corrupt object. 

 
Additional related documents :

Note 1265884.1 Resolving ORA-752 or ORA-600 [3020] During Standby Recovery

Note 283269.1 Stuck recovery of database ORA-00600[3020]

17813235971

QQ 咨询

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

《ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复》有 4 条评论

文章分类

最近发表

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

《ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复》有 4 条评论

标签云

文章分类

最近发表