标签归档:ORA-600 2037

110T oracle故障恢复

有客户一套110T的数据库由于存储控制器故障导致数据库无法正常启动,启动报错如下:

Wed Feb  2 23:16:13 2022
Recovery of Online Redo Log: Thread 1 Group 7 Seq 1647469 Reading mem 0
  Mem# 0: /dev/vgredo6/rredo7b
  Mem# 1: /dev/vgredo4/rredo7a
Wed Feb  2 23:16:14 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:15 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:16 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:16 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:26 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:27 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:27 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:17:11 2022
PSP0: terminating instance due to error 472
Instance terminated by PSP0, pid = 26724

该错误原因是由于redo信息和数据文件block信息不匹配导致无法正常应用日志,从而出现异常,在后续的recover 中还出现以下错误

Fri Feb 18 16:09:59 2022
ALTER DATABASE RECOVER  datafile 609,610,611,612,613,614,615,602,603,604,605,606,607,608  
Fri Feb 18 16:09:59 2022
Media Recovery Start
 parallel recovery started with 16 processes
Fri Feb 18 16:10:00 2022
Recovery of Online Redo Log: Thread 1 Group 7 Seq 1647469 Reading mem 0
  Mem# 0: /dev/vgredo6/rredo7b
  Mem# 1: /dev/vgredo4/rredo7a
Fri Feb 18 16:12:17 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_22509.trc:
ORA-00600: internal error code, arguments: [6101], [0], [42], [96], [], [], [], []
Fri Feb 18 16:18:51 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_22509.trc:
ORA-10562: Error occurred while applying redo to data block (file# 602, block# 1693691)
ORA-10564: tablespace DBS_DCDL_PT
ORA-01110: data file 602: '/dev/vgora12/rdbs_dcdl_pt0155'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 33682645
ORA-00600: internal error code, arguments: [6101], [0], [42], [96], [], [], [], []
Fri Feb 18 16:18:55 2022
Media Recovery failed with error 12801
Fri Feb 18 18:23:59 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw1_22483.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:04 2022
DBW3: terminating instance due to error 472
Fri Feb 18 18:24:04 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw3_22487.trc:
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:04 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw3_22487.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV] [] []
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:09 2022
LGWR: terminating instance due to error 472
Fri Feb 18 18:24:09 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_22489.trc:
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error

从sqlplus中看到类似一些报错

SQL>  recover datafile 601;
ORA-03113: end-of-file on communication channel

SQL> recover datafile 1066;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P015
ORA-00600: internal error code, arguments: [2037], [207064103], [207064103],
[162], [6], [1], [1833009883], [1130705717]


SQL> recover datafile 1065;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P004
ORA-00600: internal error code, arguments: [kcbzpb_1], [142189139], [3], [0],
[], [], [], []


SQL> recover datafile 2042;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P014
ORA-00600: internal error code, arguments: [3020], [627], [3234156],
[2633062764], [], [], [], []
ORA-10567: Redo is inconsistent with data block

通过屏蔽一致性,强制open库成功

Sun Feb 20 21:20:06 2022
SMON: enabling tx recovery
Sun Feb 20 21:20:06 2022
Database Characterset is ZHS16GBK
Sun Feb 20 21:20:07 2022
ORACLE Instance xifenfei (pid = 38) - Error 376 encountered while recovering transaction (74, 17) on object 34131051.
Sun Feb 20 21:20:07 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-00376: file 1416 cannot be read at this time
ORA-01110: data file 1416: '/dev/vgora14/rdbs_icdl_pt116'
Sun Feb 20 21:20:08 2022
Stopping background process MMNL
Sun Feb 20 21:20:09 2022
ORACLE Instance xifenfei (pid = 38) - Error 376 encountered while recovering transaction (88, 36) on object 33514955.
Sun Feb 20 21:20:09 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-00376: file 1264 cannot be read at this time
ORA-01110: data file 1264: '/dev/vgora14/rdbs_icdl_pt102'
Sun Feb 20 21:20:09 2022
Stopping background process MMON
Starting background process MMON
Starting background process MMNL
MMON started with pid=46, OS id=1482
Sun Feb 20 21:20:10 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767844)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:10 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:11 2022
LOGSTDBY: Validating controlfile with logical metadata
Sun Feb 20 21:20:11 2022
LOGSTDBY: Validation complete
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Completed: alter database open

对于异常undo进行处理,数据库正常open
20220220225006
20220220224004
由于客户短期无法迁移数据,先对于一些坏块进行修复,暂时运行数据库后续有时间窗口进行迁移.

发表在 非常规恢复 | 标签为 , , , , , , , , , | 评论关闭

ora-600 2037 ORA-7445 kcbs_dump_adv_state解决

有客户系统断电,导致数据库无法启动,让我们帮忙解决,通过分析主要是ORA-600 2037和ORA-7445 _kcbs_dump_adv_state等错误,通过人工recover解决.
数据库报ORA-03113,无法启动成功

C:\Documents and Settings\Administrator>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:50:36 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 1258291200 bytes
Fixed Size                  1250548 bytes
Variable Size             218106636 bytes
Database Buffers         1031798784 bytes
Redo Buffers                7135232 bytes
数据库装载完毕。
ORA-03113: 通信通道的文件结束

分析alert日志

Fri May 12 09:50:43 2017
ALTER DATABASE OPEN
Fri May 12 09:50:43 2017
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Fri May 12 09:50:43 2017
Started redo scan
Fri May 12 09:50:43 2017
Completed redo scan
 1240 redo blocks read, 277 data blocks need recovery
Fri May 12 09:50:44 2017
Started redo application at
 Thread 1: logseq 5881, block 41179
Fri May 12 09:50:44 2017
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5881 Reading mem 0
  Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\REDO01.LOG
Fri May 12 09:50:44 2017
Completed redo application
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []

Fri May 12 09:50:44 2017
Hex dump of (file 3, block 14004) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during crash/instance recovery
Data in bad block:
 type: 255 format: 7 rdba: 0x06010601
 last change scn: 0xa206.a2060601 seq: 0xb4 flg: 0x36
 spare1: 0x1 spare2: 0x6 spare3: 0x673
 consistency value in tail: 0x1b0a0708
 check value in block header: 0x36b4
 computed block checksum: 0xe4f5
Fri May 12 09:50:44 2017
Hex dump of (file 9, block 65507) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p003_6056.trc
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during crash/instance recovery
Data in bad block:
 type: 3 format: 6 rdba: 0x06020601
 last change scn: 0xa206.a2060602 seq: 0xe3 flg: 0xff
 spare1: 0x1 spare2: 0x6 spare3: 0x6dc
 consistency value in tail: 0xc1028001
 check value in block header: 0xffe3
 computed block checksum: 0xff01
Fri May 12 09:50:44 2017
Reread of rdba: 0x00c036b4 (file 3, block 14004) found different data
Fri May 12 09:50:44 2017
Reread of rdba: 0x0240ffe3 (file 9, block 65507) found different data
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p005_6060.trc:
ORA-00600: internal error code,arguments:[2037],[17442602],[2718302723],[255],[9],[203],[657105414],[2147549568]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc:
ORA-07445:exception encountered:core dump[ACCESS_VIOLATION][_kclcomplete+79][PC:0x72B0C7][ADDR:0x220][UNABLE_TO_READ][]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-07445: exception encountered:core dump[ACCESS_VIOLATION][_kcbzdh+2496][PC:0x4A4928][ADDR:0xB][UNABLE_TO_READ][]
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p012_6128.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kcbs_dump_adv_state+723] 
                                 [PC:0x5975A3] [ADDR:0xCBC0CBB2] [UNABLE_TO_READ] []
ORA-00600:internal error code,arguments:[2037],[17430318],[2718303745],[128],[1],[203],[4147028486],[2147549568]

错误比较明显由于坏块导致应用日志恢复异常,主要错误集中在ORA-600 2037,ORA-7445 _kcbs_dump_adv_state,ORA-7445_kcbzdh,ORA-7445 _kclcomplete等

dbv检查数据文件

E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:57:39 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

页 13353 标记为损坏
Corrupt block relative dba: 0x00c03429 (file 3, block 13353)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x3429a206
 last change scn: 0x066f.066f3429 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x8c96
 consistency value in tail: 0x06018001
 check value in block header: 0x0
 block checksum disabled

页 14004 标记为损坏
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x36b4a206
 last change scn: 0x0673.067336b4 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xfb97
 consistency value in tail: 0x06010210
 check value in block header: 0x0
 block checksum disabled

页 15261 标记为损坏
Corrupt block relative dba: 0x00c03b9d (file 3, block 15261)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0x3b9da206
 last change scn: 0x0673.06733b9d seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x0
 consistency value in tail: 0x06018001
 check value in block header: 0x5549
 block checksum disabled



DBVERIFY - 验证完成

检查的页总数: 58880
处理的页总数 (数据): 19318
失败的页总数 (数据): 0
处理的页总数 (索引): 18610
失败的页总数 (索引): 0
处理的页总数 (其它): 13747
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 7202
标记为损坏的总页数: 3
流入的页总数: 0
最高块 SCN            : 178325323 (0.178325323)


E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:10:24 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

页 65507 标记为损坏
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0xffe3a206
 last change scn: 0x06dc.06dcffe3 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xb32
 consistency value in tail: 0x060102ff
 check value in block header: 0x0
 block checksum disabled



DBVERIFY - 验证完成

检查的页总数: 1310720
处理的页总数 (数据): 34102
失败的页总数 (数据): 0
处理的页总数 (索引): 30270
失败的页总数 (索引): 0
处理的页总数 (其它): 10850
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1235497
标记为损坏的总页数: 1
流入的页总数: 0
最高块 SCN            : 178325221 (0.178325221)

确实如alert日志报错,file 3和9 都出现坏块导致实例恢复无法进行。根据错误ORA-600 2037和ORA-7445 _kcbs_dump_adv_state,初步判断和During Startup (Open Database) Alert Log Shows ORA-600[2037] and ORA-7445[kcbs_dump_adv_state] (Doc ID 551993.1)文章描述相符(而且版本也相符)

尝试recover datafile部分file

E:\>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:16:00 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> recover datafile 1;
完成介质恢复。
SQL> recover datafile 2;
完成介质恢复。
SQL> recover datafile 3;
完成介质恢复。
SQL> recover datafile 4;
完成介质恢复。
SQL> recover datafile 9;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr1_lastbwr], [], [], [], [], [], [], []

ORA-00600 kcratr1_lastbwr错误比较明显,见ORA-00600:[Kcratr1_lastbwr] During Database Startup after a Crash (Doc ID 393984.1)

通过recover database处理

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

然后通过查询dba_extents 处理坏块对象

补充ORA-600 2037错误

Format: ORA-600 [2037] [a] [b] 1 [d] [e] [f] [g]


VERSIONS:
  versions 8.0 and above

DESCRIPTION:

  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.

  The block has failed this check and this exception is raised.

ARGUMENTS:
  Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
  Arg [b] The Block format  
  Arg {c} RDBA in the block itself
  Arg [d] The block type
  Arg [e] The sequence number
  Arg [f] Flags, if set  
  Arg [g] The return value from the block head/tail checker.
发表在 ORA-xxxxx | 标签为 , , , , | 评论关闭

记录一次由于坏块和不恰当恢复引起各种ORA-600案例

朋友让我帮忙处理一个不能open的库,打开alert日志一看,傻眼了,里面是各种ORA-600的错误应有尽有,被折腾的够惨
故障后重启,无法启动主要表现在block坏块,引起的各种ORA-600等错误

Mon Mar 02 16:09:27 2015
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Completed redo application of 0.27MB
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3104 OF FILE 3
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3613 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 272 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 2512 OF FILE 3
Hex dump of (file 2, block 92889) in trace file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc
Corrupt block relative dba: 0x00816ad9 (file 2, block 92889)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 0 rdba: 0x6ad90000
 last change scn: 0x0000.00c6a052 seq: 0x1 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x5d7e
 consistency value in tail: 0xa0520001
 check value in block header: 0x0
 block checksum disabled
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p007_4196.trc  (incident=3833):
ORA-00600: internal error code, arguments: [4502], [1], [], [], [], [], [], [], [], [], [], []
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p013_4208.trc  (incident=3881):
ORA-00600: internal error code, arguments: [2037], [4259067], [4244307968], [159], [243], [0], [2162032704], [100728832], [], [], [], []
Slave exiting with ORA-1172 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p009_4200.trc:
ORA-01172: recovery of thread 1 stuck at block 3613 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p001_4184.trc:
ORA-01172: recovery of thread 1 stuck at block 2512 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc  (incident=3697):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3697/oa_dbw2_4158_i3697.trc
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0xD2DDB7, kcbs_shrink_pool()+705] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_mman_4152.trc  (incident=3673):
ORA-07445: exception encountered: core dump [kcbs_shrink_pool()+705] [SIGSEGV] [ADDR:0x0] [PC:0xD2DDB7] [SI_KERNEL(general_protection)] []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3673/oa_mman_4152_i3673.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc:
Mon Mar 02 16:09:34 2015
Instance terminated by DBW2, pid = 4158

第二次重启后增加新错误ORA-00600[17182]

Mon Mar 02 16:39:50 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p002_4321.trc  (incident=4993):
ORA-00600: internal error code, arguments: [17182], [0x7F548C2BDBA8], [], [], [], [], [], [], [], [], [], []

进行了一些恢复处理后,日志中报错
主要体现在进行了不完全恢复,而且应该是对redo进行了重命名或者redo头损坏锁引起的一系列提示

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: alter  database open...
Tue Mar 03 11:17:59 2015
Sweep [inc][17178]: completed
Sweep [inc][17177]: completed
Sweep [inc2][17178]: completed
Tue Mar 03 11:18:00 2015
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Tue Mar 03 11:18:06 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-00266: name of archived log file needed
ORA-266 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
Slave exiting with ORA-1547 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...
Tue Mar 03 11:18:06 2015
Checker run found 4 new persistent data failures
Tue Mar 03 11:18:13 2015
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 12986989
Resetting resetlogs activation ID 3278679642 (0xc36cae5a)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/oa/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

再一步折腾,增加了_allow_resetlogs_corruption= TRUE之后数据库报ORA-600[2662]

Tue Mar 03 11:19:26 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc  (incident=18195):
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_18195/oa_ora_6864_i18195.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6864): terminating the instance due to error 704
Instance terminated by USER, pid = 6864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6864) as a result of ORA-1092
Tue Mar 03 11:19:29 2015
ORA-1092 : opitsk aborting process

进一步折腾,可以看出来undo已经被其offline,无法正常访问,导致系统报ORA-704和ORA-00376

Wed Mar 04 21:10:58 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17074): terminating the instance due to error 704
Instance terminated by USER, pid = 17074
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17074) as a result of ORA-1092
Wed Mar 04 21:11:00 2015
ORA-1092 : opitsk aborting process

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检测结果见附件(xifenfei_db_recover_20150304),这里可以知道undo 不知道怎么折腾的数据文件scn较大而且还offline,
通过一些列方法(bbed,隐含参数等)调整数据库scn,强制启动数据库,报如下错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,该错误怀疑是文件头scn不够大,块延迟清理导致,进一步增加scn尝试,最后依旧是ORA-00704/ORA-00604/ORA-01555错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,在scn上做手脚估计难以解决给问题,对其启动过程做10046和errorstack分析发现

PARSING IN CURSOR #3 len=202 dep=2 uid=0 oct=3 lid=0 tim=1425481940448439 hv=3819099649 ad='64ff91af8' sqlid='3nkd3g3ju5ph1'
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
END OF STMT
PARSE #3:c=1000,e=334,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1425481940448439
BINDS #3:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a6f0  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7f5b3253a6b8  bln=32  avl=06  flg=05
  value="PROPS$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a688  bln=24  avl=02  flg=05
  value=1
EXEC #3:c=0,e=640,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=2853959010,tim=1425481940449147
WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186
WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221
WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247

*** 2015-03-04 23:12:20.450
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
----- Current SQL Statement for this session (sql_id=g64r07v2jn8nq) -----
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'

这里可以发现是数据库在启动的过程中需要执行SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,而该语句递归调用了select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null 语句。既然这样通过一些方法避免数据库启动之时查询SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,果然数据库启动成功。

知识点补充
ORA-600 [4502] [a]

Arg [a] ITL entry with a lock count
Meaning: During ITL cleanout we clear all row locks but the ITL entry
	 still thinks there is an uncleared lock. Ie: ITL has a locked
	 row but there are no locked rows in the block 

大体意思是数据库发现undo 的itl已经被清除,但是block中的itl依然存在,从而出现ORA-600[4502],引起该问题除bug外主要原因是坏块

ORA-600 [2037] [a] [b] {c} [d] [e] [f] [g]

Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
Arg [b] The Block format  
Arg {c} RDBA in the block itself
Arg [d] The block type
Arg [e] The sequence number
Arg [f] Flags, if set  
Arg [g] The return value from the block head/tail checker.
DESCRIPTION:
  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.
  The block has failed this check and this exception is raised

大体意思是在恢复过程中,正在检查的块,以确保它在应用任何变化向量之前不损坏。如果检查失败排除该异常ORA-600[2037],引起该问题除bug外主要原因是坏块

ORA-600 [kcbzpbuf_1],[a],[b]

Arg [a] Corruption reason
Arg [b] Calculate checksum flag

Corruption reason:
#define KCBH_GOOD    0                                     /* block is valid */
#define KCBH_ZERO    1             /* block header was entirely zero on disk */
#define KCBH_BROKEN  2      /* corruption could be from a partial disk write */
#define KCBH_CHKVAL  3               /* The check value for the block failed */
#define KCBH_CORRUPT 4     /* this is the wrong block or is not a data block */
#define KCBH_ZERONG  5               /* all zero block and it is not allowed */

Calculate checksum flag:
The possible values are 1 (Generate Checksum - db_block_checksum is enabled - default value) 
                        0 (do not generate checksum - db_block_checksum=false)

kcbzpbuf_1是该错误的源码函数

ORA-600 [17182] [a] [b] {c} [d] [e]

DESCRIPTION:        
  Oracle has detected that the magic number in a memory chunk header has been overwritten.
  This is a heap (in memory) corruption and there is no underlying data corruption.
  The error may occur in the one of the process specific heaps 
  (the Call heap, PGA heap, or session heap) or in the shared heap (SGA).

ORACLE 发现在内存中重要的块头被重新,但是没有基础数据损坏,大部分和数据块或者内存损坏有关系.

ORA-600 [4552] [a] [b] {c} [d] [e]

DESCRIPTION:
  This assertion is raised because we are trying to unlock the rows in a 
  block, but receive an incorrect block type.
  The second argument is the block type received.

ORACLE尝试对某行进行解锁但是接收到了不正确的数据块类型,Arg [b]是接收到的数据块类型

ORA-600 [2662] [a] [b] {c} [d] [e]

DESCRIPTION:

  A data block SCN is ahead of the current SCN.

  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN 
  stored in a UGA variable.

  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.

ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE 
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

主要的含义就是oracle文件头scn比某个block dependent scn小从而出现该问题

发表在 Oracle备份恢复 | 标签为 , , , , , , , , | 评论关闭