-
3D Cloud
asm mount asm恢复 bbed bootstrap$ dmp损坏 dul In Memory kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01190 ORA-01555 ORA-01578 ORA-08103 ORA-600 2662 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kccpb_sanity_check_2 ORA-15042 ORACLE 12C oracle dul ORACLE PATCH oracle异常恢复 ORACLE恢复 Oracle 恢复 ORACLE数据库恢复 oracle 比特币 redo异常 undo异常 YOUR FILES ARE ENCRYPTED _ALLOW_RESETLOGS_CORRUPTION 勒索恢复 数据库恢复 比特币 比特币 oracle 比特币加密 比特币勒索
文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (87)
- 数据库 (1,312)
- DB2 (22)
- MySQL (56)
- Oracle (1,198)
- Data Guard (39)
- EXADATA (7)
- GoldenGate (19)
- ORA-xxxxx (145)
- ORACLE 12C (71)
- ORACLE 18C (6)
- ORACLE 19C (8)
- Oracle ASM (56)
- Oracle Bug (7)
- Oracle RAC (40)
- Oracle 安全 (6)
- Oracle 开发 (25)
- Oracle 监听 (26)
- Oracle备份恢复 (392)
- Oracle安装升级 (54)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (41)
- PostgreSQL (10)
- SQL Server (26)
- SQL Server恢复 (7)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (23)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (6)
-
最近发表
- incaseformat 病毒删除文件恢复
- xfs文件系统mysql删库恢复
- 对恢复案例:因对工作调整不满,链家一员工删除公司 9 TB数据:被判7年事件有感
- Oracle Recovery Tools 解决ORA-01190 ORA-01248等故障
- Oracle Recovery Tools 12月份更新
- -bash: /bin/rm: Argument list too long
- ORA-27303: failure occurred at: skgpwinit6
- ORA-600 kffmLoad_1 kffmVerify_4
- ORA-00600 kfrHtAdd01
- ORA-00600 [2662]和ORA-00600 [4194]恢复
- 记录oracle安装的两个小问题(INS-30060和弹出子窗口异常)
- dblink会话引起library cache lock
- asm磁盘类似_DROPPED_0001_DATA名称故障处理
- 12C数据库报ORA-600 kcbzib_kcrsds_1故障处理
- 再次遇到ORA-600 kokasgi1故障恢复
- sysaux表空间不足—WRH$_ACTIVE_SESSION_HISTORY
- sql server 删除数据库恢复
- .eight加密数据库恢复
- xiaolinghelper@firemail.cc加密数据库恢复
- 加密.CC4H扩展名数据库恢复支持
友情链接
标签归档:ORACLE恢复
How to resolve ORA-600 [4194] errors
在oracle恢复中ORA-600 4194是一个非常常见的错误,该错误的主要原因是由于redo记录和undo(rollback)记录不匹配.
ORA 600 4194错误原因以及含义
ERROR: Format: ORA-600 [4194] [a] [b] VERSIONS: versions 6.0 to 12.1 DESCRIPTION: A mismatch has been detected between Redo records and rollback (Undo) records. We are validating the Undo record number relating to the change being applied against the maximum undo record number recorded in the undo block. This error is reported when the validation fails. ARGUMENTS: Arg [a] Maximum Undo record number in Undo block Arg [b] Undo record number from Redo block
ORA 600 4194 错误处理思路
第一步
Confirm whether the database is up and running or not. If the database fails to start or crashes shortly after startup due to this error occurring, then try setting event 10513 at level 2 in the init.ora/spfile to disable transaction recovery and restart the instance, e.g.: event = "10513 trace name context forever, level 2" This may allow the database to successfully open and stay up so that the required diagnostics/actions can be performed.
第二步
In the trace file there should be an undo segment header dump, and so check to see if the undo segment header shows an active transaction after recovery, e.g.: TRN TBL <---- Represents the Transaction table for the particular undo segment index state cflags wrap# uel scn dba --------------------------------------------------------------------------------------------- 0x41 9 0x80 0x35ab6 0xffff 0x0695.38f6b959 0x1081e796 0x42 9 0x80 0x35bb1 0x000e 0x0695.38f6b028 0x1081e793 0x43 9 0x80 0x35b11 0x005d 0x0695.38f6b7ae 0x1081e795 0x44 9 0x80 0x359f0 0x0036 0x0695.38f69a91 0x1081e78e 0x45 10 0x80 0x35b1b 0x0000 0x0695.3a0aba4d 0x1081e796 0x46 9 0x80 0x35bb7 0x001c 0x0695.38f69bde 0x1081e78f =================================== State ---> This column specifies the status of the transaction 9 -----> represents a commited transaction 10 ---> Represents a active transaction Dba -----> Undo block containing the undo records Strictly speaking this is the block at the end of the undo chain. You can see from the transaction table that there is an active transaction for this particular rollback/undo segment after recovery. Therefore this rollback/undo segment and/or undo tablespace cannot be dropped without corrupting the database! Therefore recreating the UNDO tablespace is not an option.
第三步
From the trace file determine the affected undo segment, e.g.: Block image after block recovery: UNDO BLK: xid: 0x0015.02b.0001544b seq: 0x163e cnt: 0x12 irb: 0x12 icl: 0x0 flg: 0x0000 XID ==> Undo segment no + Slot no + Sequence no Therefore, in this case the Undo Segment is: USN# 0x15 (Hex) ==> 21 (Dec) ==> _SYSSMU21$ So if and ONLY IF the transaction table shows no active transaction can the rollback/undo segment be offlined and dropped.Note however, that before you can confirm if the entire UNDO tablespace can be dropped, you would need to check the transaction tables of ALL active rollback/undo segments in the same wasy as the above. The steps required to drop the rollback/undo segment are fully detailed in Note:179952.1, but are briefly listed here for completeness: If using Automatic Undo Management Offline the undo segment using the _OFFLINE_ROLLBACK_SEGMENTS parameter and bounce the database as follows: 1. Create and edit the init.ora file for the instance to set the following parameters: UNDO_MANAGEMENT=MANUAL _OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU21$) 2. Open the database in restricted mode to prevent user access, e.g.: connect / as sysdba startup restrict pfile = '<Full path to init.ora file>'; 3. Drop the rollback/undo segment, e.g.: drop rollback segment "_SYSSMU21"; 4. Shutdown the instance, and remove the init.ora parameters added in point 1 and restart the instance, e.g.: shutdown immediate startup If SMON was recovering the transaction then this may not work as we cannot open the database if it is determined to be in an inconsistent state. I have reviewed a number of SRs where this approach was successful, so it is important to try it first but understand that it may fail and you will have to resort to a point in time recovery or forcing open the DB and recreating it.
第四步
Now we need to dump the undo block to see which object was affected. We noted in Step 2 that this is the active transaction (from the trace file): TRN TBL index state cflags wrap# uel scn dba 0x45 10 0x80 0x35b1b 0x0000 0x0695.3a0aba4d 0x1081e796 Dba----------------> Undo block containing the undo records dba--->0x1081e796 is the block containing the active transaction . Use the WebIV tools to convert this RDBA to block number (block#) and file number (file#), e.g.: V SPLIT ==> DBA (Hex) = File#,Block# (Hex File#,Block#) = ===== === ===== ============ V8 10,10 ==> 276948886 (0x1081e796) = 66,124822 (0x42 0x1e796) So the file# is 66 and the block# is 124822, so dump the block by issuing: SQL> Alter system dump datafile 66 block 124822; This will generate a trace file in the user_dump_dest. The following is a sample of the information in the undo block: UNDO BLK: xid: 0x000c.045.00035b1b seq: 0x1e14 cnt: 0x17 irb: 0x17 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f8c 0x02 0x1f30 0x03 0x1ed4 0x04 0x1e78 0x05 0x1e1c 0x06 0x1dc0 0x07 0x1d64 0x08 0x1d08 0x09 0x1cac 0x0a 0x1c50 0x0b 0x1bf4 0x0c 0x1b98 0x0d 0x1b3c 0x0e 0x1ae0 0x0f 0x1a74 0x10 0x1a18 0x11 0x19bc 0x12 0x1960 0x13 0x1904 0x14 0x187c 0x15 0x181c 0x16 0x1798 0x17 0x173c * Rec #0x16 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- uba: 0x1081e796.1e14.14 ctl max scn: 0x0695.38f69853 prv tx scn: 0x0695.38f698a1 KDO undo record: KTB Redo op: 0x04 ver: 0x01 op: L itl: scn: 0x0019.009.00034237 uba: 0x36c0cce4.1d2f.19 flg: C--- lkc: 0 scn: 0x0695.38f6b96b KDO Op code: URP xtype: XA bdba: 0x35406893 hdba: 0x35406892 itli: 1 ispac: 0 maxfr: 4863 tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0 ncol: 1 nnew: 1 size: -1 col 0: [ 4] c3 0e 36 2e *----------------------------- * Rec #0x17 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) * Layer: 11 (Row) opc: 1 rci 0x16 Undo type: Regular undo Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- From the trace file above: UNDO BLK: xid: 0x000c.045.00035b1b seq: 0x1e14 cnt: 0x17 irb: 0x17 icl: 0x0 flg: 0x0000 The undo segment with the active transaction is segment is 0x000c (Hex) which is 12 (Dec) as the XID is: Undo segment no + Slot no + Sequence no This step is often skipped because it was performed earlier in step 3, however it is a good idea to do this again now to make sure that the XID from the UNDO block matches the UNDO SEGMENT HEADER, this way you have followed all the chain, from the UNDO SEGMENT to UNDO BLOCK, back and forth. If there is a conflict here please check and make sure that the customer dumped the correct undo block. Check for the value of irb which is an index which points you to the latest change done to the undo block. This is the point from which a rollback would begin if one was issued. From the trace file we see: 'irb: 0x17' so this points to record 0x17, so search for this particular string i.e 0x17 and it will take you to undo record 'REC #0x17', e.g.: * Rec #0x17 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) * Layer: 11 (Row) opc: 1 rci 0x16 Undo type: Regular undo Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- Note the slot number (slt) is 0x45, the object number (objn) is the OBJECT_ID from dba_objects and data object number (objd) is the DATA_OBJECT_ID from dba_objects. These numbers may be the same but not necessarily, and so if the database is open then identify this object, e.g.: select object_name, owner, object_type, data_object_id from dba_objects where object_id = <objn>; This is the object, which has an active transaction. Note in the above trace file extract that rci has a value of 0x16 which means that this record is at the end of an undo chain. This means that the chain continues in another UNDO BLOCK. Please refer to unpublished Note:281504.1 for information on Undo chains. So the next record that needs to be rolled back is present in REC #X016. If rci is 0x00 then it means that this is the first record present in the undo chain and so you can check to see if there is rdba info, e.g.: * Rec #0x16 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- uba: 0x1081e796.1e14.14 ctl max scn: 0x0695.38f69853 prv tx scn: 0x0695.38f698a1 KDO undo record: KTB Redo op: 0x04 ver: 0x01 op: L itl: scn: 0x0019.009.00034237 uba: 0x36c0cce4.1d2f.19 flg: C--- lkc: 0 scn: 0x0695.38f6b96b KDO Op code: URP xtype: XA bdba: 0x35406893 hdba: 0x35406892 itli: 1 ispac: 0 maxfr: 4863 tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0 ncol: 1 nnew: 1 size: -1 col 0: [ 4] c3 0e 36 2e *----------------------------- If the object is an Index, drop and recreate it. If it is a table, then again the table would need to be dropped and recreated (or truncated) so that its object number changes and hence the rollback/undo is no longer required. If this isn't possible, then you have two options: First take a backup of the database in its current state. This is critical in case anything goes wrong and you lose the opportunity to salvage the data! Option 1 - Restore the undo segment datafile and the datafile containing the object and perform a full recovery. This can only be done if you have all the archived redo as you will need to do full recovery on these files. OR Option 2 If option 1 is not possible, you can use the unsupported method, e.g.: Specify the undo segment in the _OFFLINE_ROLLBACK_SEGMENTS parameter and try to drop the rollback segment. If there is an active transaction then this is not likely to work and you will probably need to set the _CORRUPTED_ROLLBACK_SEGMENTS parameter as well
温馨提示:
1.隐含参数_OFFLINE_ROLLBACK_SEGMENTS/_CORRUPTED_ROLLBACK_SEGMENTS属于Oracle内部隐含参数,建议在Oracle support认可的情况下使用,因为使用之后可能导致数据库事务完整性彻底损坏
2.进行屏蔽事务之前,如果条件允许建议使用txchecker检查
2.使用上述方法恢复数据库之后,建议通过逻辑方式导出导入重建数据库
aix平台 ORA-01115 ORA-01110 ORA-27067 故障恢复
接到朋友恢复请求,aix 5.3,Oracle 10.2.0.1平台,数据库启动报ORA-01115 ORA-01110 ORA-27067错误,数据库无法正常打开,通过分析,是由于10201在aix上面的bug导致,通过技巧规避,完美解决给问题,数据0丢失
数据库报错alert日志
Mon Aug 10 13:25:22 2015 ALTER DATABASE MOUNT Mon Aug 10 13:25:29 2015 Setting recovery target incarnation to 1 Mon Aug 10 13:25:29 2015 Successful mount of redo thread 1, with mount id 432339141 Mon Aug 10 13:25:29 2015 Database mounted in Exclusive Mode Completed: ALTER DATABASE MOUNT Mon Aug 10 13:25:36 2015 alter database open Mon Aug 10 13:25:36 2015 Beginning crash recovery of 1 threads parallel recovery started with 15 processes Mon Aug 10 13:25:37 2015 Started redo scan Mon Aug 10 13:25:52 2015 Completed redo scan 7889582 redo blocks read, 75305 data blocks need recovery Mon Aug 10 13:25:53 2015 Errors in file /dc/admin/datacent/bdump/datacent_p002_144124.trc: ORA-01115: IO error reading block from file 2 (block # 40704) ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf' ORA-27067: size of I/O buffer is invalid Additional information: 2 Additional information: 1572864 Mon Aug 10 13:25:53 2015 Aborting crash recovery due to slave death, attempting serial crash recovery Mon Aug 10 13:25:53 2015 Beginning crash recovery of 1 threads Mon Aug 10 13:25:53 2015 Started redo scan Mon Aug 10 13:26:09 2015 Completed redo scan 7889582 redo blocks read, 75305 data blocks need recovery Mon Aug 10 13:26:12 2015 Aborting crash recovery due to error 1115 Mon Aug 10 13:26:12 2015 Errors in file /dc/admin/datacent/udump/datacent_ora_123384.trc: ORA-01115: IO error reading block from file 2 (block # 39077) ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf' ORA-27067: size of I/O buffer is invalid Additional information: 2 Additional information: 1310720 ORA-1115 signalled during: alter database open...
这里报的前面两个错误ORA-01115 ORA-01110我们都非常熟悉,类似数据库启动遇到坏块或者io错误之时可能就会报如此错误。但是ORA-27067确实不多见,从mos上看,很多是由于rman备份之时的bug可能导致该错误。
dbv检测undo坏块文件
DBVERIFY: Release 10.2.0.1.0 - Production on Mon Aug 10 23:18:15 2015 Copyright (c) 1982, 2003, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /dc/oradata/datacent/undotbs01.dbf DBVERIFY - Verification complete Total Pages Examined : 329600 Total Pages Processed (Data) : 0 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 327504 Total Pages Processed (Seg) : 17 Total Pages Failing (Seg) : 0 Total Pages Empty : 2096 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 1887888 (0.1887888)
这里可以看到,undo文件本身并没有逻辑和物理的坏块,证明因为数据库异常的原因,可能是由于ORA-27067: size of I/O buffer is invalid导致。根据官方文档ORA-01115 ORA-27067 DURING PARALLEL INSTANCE RECOVERY AFTER INSTANCE CRASH中的解释,我们基本上可以确定很可能是由于10.2.0.1在aix平台的jfs2系统中,由于大量事务操作,突然abort掉数据库(也可能断电),从而数据库在启动的时候进行实例恢复,而由于内部的bug,导致实例恢复无法成功。通过我们处理后的,数据库完美启动,数据0丢失
数据库启动日志
Mon Aug 10 16:34:14 2015 alter database open Mon Aug 10 16:34:14 2015 Beginning crash recovery of 1 threads parallel recovery started with 15 processes Mon Aug 10 16:34:14 2015 Started redo scan Mon Aug 10 16:34:27 2015 Completed redo scan 7889582 redo blocks read, 0 data blocks need recovery Mon Aug 10 16:34:27 2015 Started redo application at Thread 1: logseq 664704, block 1286922 Mon Aug 10 16:34:27 2015 Recovery of Online Redo Log: Thread 1 Group 4 Seq 664704 Reading mem 0 Mem# 0 errs 0: /dev/rredo04 Mon Aug 10 16:34:32 2015 Recovery of Online Redo Log: Thread 1 Group 5 Seq 664705 Reading mem 0 Mem# 0 errs 0: /dev/rredo05 Mon Aug 10 16:34:38 2015 Recovery of Online Redo Log: Thread 1 Group 6 Seq 664706 Reading mem 0 Mem# 0 errs 0: /dev/rredo06 Mon Aug 10 16:34:40 2015 Completed redo application Mon Aug 10 16:34:40 2015 Completed crash recovery at Thread 1: logseq 664706, block 1017805, scn 8554793334 0 data blocks read, 0 data blocks written, 7889582 redo blocks read Mon Aug 10 16:34:40 2015 Thread 1 advanced to log sequence 664707 Thread 1 opened at log sequence 664707 Current log# 1 seq# 664707 mem# 0: /dev/rredo01 Successful open of redo thread 1 Mon Aug 10 16:34:40 2015 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Mon Aug 10 16:34:40 2015 SMON: enabling cache recovery Mon Aug 10 16:34:40 2015 Successfully onlined Undo Tablespace 1. Mon Aug 10 16:34:40 2015 SMON: enabling tx recovery Mon Aug 10 16:34:41 2015 Database Characterset is ZHS32GB18030 replication_dependency_tracking turned off (no async multimaster replication found) WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected. Mon Aug 10 16:34:41 2015 SMON: Parallel transaction recovery tried Mon Aug 10 16:34:42 2015 db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. Mon Aug 10 16:34:42 2015 Completed: alter database open
分享一例由于主库逻辑坏块导致dataguard容灾失效
有朋友数据库配置了物理dataguard,但是由于触发了某种Oracle bug/或者其他原因导致block出现大量逻辑坏块,结果是主备库(主库启动后就crash,备库failover后也启动后就crash)都无法正常工作,请求给予技术支持。
数据库配置了物理dataguard,但是主库在进行了某些操作之后,主库直接crash.重启主库发现,数据库启动之后,稍后数据库继续CRASH
Wed Jul 08 16:32:22 2015 Thread 1 advanced to log sequence 401531 (LGWR switch) Current log# 6 seq# 401531 mem# 0: /opt/oracle/database/fast_recovery_area/xifenfei/onlinelog/o1_mf_6_b9p7qhrm_.log Archived Log entry 605194 added for thread 1 sequence 401530 ID 0xfbe74bc9 dest 1: ARC3: Standby redo logfile selected for thread 1 sequence 401530 for destination LOG_ARCHIVE_DEST_2 ****************************************************************** LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 ****************************************************************** Wed Jul 08 16:33:02 2015 Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p032_413611.trc (incident=201910): ORA-00600: internal error code, arguments: [17114], [0x7FB03BB6DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7FB03BB6DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/incident/incdir_201910/xifenfei_p032_413611_i201910.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p032_413611.trc (incident=201911): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x7FB03BB6DBE8], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17114], [0x7FB03BB6DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7FB03BB6DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/incident/incdir_201911/xifenfei_p032_413611_i201911.trc Wed Jul 08 16:34:23 2015 SMON: slave died unexpectedly, downgrading to serial recovery Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_413207.trc (incident=201547): ORA-00600: internal error code, arguments: [17182], [0x7F24D680D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/incident/incdir_201547/xifenfei_smon_413207_i201547.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Block recovery from logseq 401530, block 119 to scn 73226510040 Recovery of Online Redo Log: Thread 1 Group 5 Seq 401530 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/xifenfei/onlinelog/o1_mf_5_b9p7qhl0_.log Recovery of Online Redo Log: Thread 1 Group 6 Seq 401531 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/xifenfei/onlinelog/o1_mf_6_b9p7qhrm_.log Block recovery completed at rba 401531.1882.16, scn 17.212066009 ORACLE Instance xifenfei (pid = 16) - Error 600 encountered while recovering transaction (13, 14) on object 135520. Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_413207.trc: ORA-00600: internal error code, arguments: [17182], [0x7F24D680D7A0], [], [], [], [], [], [], [], [], [], [] Wed Jul 08 16:34:24 2015 Dumping diagnostic data in directory=[cdmp_20150708163424], requested by (instance=1, osid=413207 (SMON)), summary=[incident=201547]. Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_413207.trc (incident=201548): ORA-00600: internal error code, arguments: [KSMFPG2], [0x7F24D680D000], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7F24D680D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/incident/incdir_201548/xifenfei_smon_413207_i201548.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Fatal internal error happened while SMON was doing active transaction recovery. Errors in file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_413207.trc: ORA-00600: internal error code, arguments: [KSMFPG2], [0x7F24D680D000], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7F24D680D7A0], [], [], [], [], [], [], [], [], [], [] SMON (ospid: 413207): terminating the instance due to error 474 System state dump requested by (instance=1, osid=413207 (SMON)), summary=[abnormal instance termination]. System State dumped to trace file /opt/oracle/database/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_diag_413167_20150708163426.trc Dumping diagnostic data in directory=[cdmp_20150708163426], requested by (instance=1, osid=413207 (SMON)), summary=[abnormal instance termination]. Instance terminated by SMON, pid = 413207
由于主库不能正常open,备库直接failover方式激活
Wed Jul 08 17:56:41 2015 alter database recover managed standby database finish Terminal Recovery: request posted (xffdb) Wed Jul 08 17:56:45 2015 Begin: Standby Redo Logfile archival End: Standby Redo Logfile archival Terminal Recovery timestamp is '07/08/2015 17:56:45' Terminal Recovery: applying standby redo logs. Terminal Recovery: thread 1 seq# 401533 redo required Terminal Recovery: Recovery of Online Redo Log: Thread 1 Group 7 Seq 401533 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_7_bb7x31lk_.log Identified End-Of-Redo (failover) for thread 1 sequence 401533 at SCN 0xffff.ffffffff Incomplete Recovery applied until change 73226530800 time 07/08/2015 16:56:40 Terminal Recovery: successful completion Wed Jul 08 17:56:45 2015 ARCH: Archival stopped, error occurred. Will continue retrying Forcing ARSCN to IRSCN for TR 17:212086768 ORACLE Instance xffdb - Archival Error Attempt to set limbo arscn 17:212086768 irscn 17:212086768 ORA-16014: log 7 sequence# 401533 not archived, no available destinations ORA-00312: online log 7 thread 1: '/opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_7_bb7x31lk_.log' Resetting standby activation ID 4226239433 (0xfbe74bc9) Wed Jul 08 17:56:45 2015 MRP0: Media Recovery Complete (xffdb) MRP0: Background Media Recovery process shutdown (xffdb) Terminal Recovery: completion detected (xffdb) Completed: alter database recover managed standby database finish Wed Jul 08 17:56:58 2015 alter database commit to switchover to primary ALTER DATABASE SWITCHOVER TO PRIMARY (xffdb) Maximum wait for role transition is 15 minutes. Backup controlfile written to trace file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_ora_485094.trc Standby terminal recovery start SCN: 73226530482 RESETLOGS after incomplete recovery UNTIL CHANGE 73226530800 Online logfile pre-clearing operation disabled by switchover Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_1_bb7x30gw_.log: Thread 1 Group 1 was previously cleared Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_2_bb7x30js_.log: Thread 1 Group 2 was previously cleared Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_3_bb7x310q_.log: Thread 1 Group 3 was previously cleared Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_4_bb7x312r_.log: Thread 1 Group 4 was previously cleared Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_5_bb7x317f_.log: Thread 1 Group 5 was previously cleared Online log /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_6_bb7x31cz_.log: Thread 1 Group 6 was previously cleared Standby became primary SCN: 73226530481 Wed Jul 08 17:56:58 2015 Setting recovery target incarnation to 3 Switchover: Complete - Database mounted as primary Completed: alter database commit to switchover to primary
数据库激活成功后,重启激活之后数据库发现和主库出现类似情况
Wed Jul 08 17:57:25 2015 ALTER DATABASE MOUNT Successful mount of redo thread 1, with mount id 4243462021 Database mounted in Exclusive Mode Lost write protection disabled Completed: ALTER DATABASE MOUNT Wed Jul 08 17:57:30 2015 ALTER DATABASE OPEN Assigning activation ID 4243462021 (0xfcee1785) LGWR: STARTING ARCH PROCESSES Wed Jul 08 17:57:30 2015 ARC0 started with pid=23, OS id=485230 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thread 1 advanced to log sequence 2 (thread open) Wed Jul 08 17:57:31 2015 ARC1 started with pid=24, OS id=485236 Wed Jul 08 17:57:31 2015 ARC2 started with pid=25, OS id=485240 Wed Jul 08 17:57:31 2015 ARC3 started with pid=26, OS id=485244 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Thread 1 opened at log sequence 2 Current log# 2 seq# 2 mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_2_bb7x30js_.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery Wed Jul 08 17:57:31 2015 NSA2 started with pid=27, OS id=485248 [485226] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:3453451748 end:3453452018 diff:270 (2 seconds) Dictionary check beginning Dictionary check complete Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Wed Jul 08 17:57:32 2015 Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_p032_485380.trc (incident=64481): ORA-00600: internal error code, arguments: [17182], [0x7FE96B50DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64481/xffdb_p032_485380_i64481.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Starting background process QMNC Wed Jul 08 17:57:32 2015 QMNC started with pid=92, OS id=485512 LOGSTDBY: Validating controlfile with logical metadata LOGSTDBY: Validation complete Dumping diagnostic data in directory=[cdmp_20150708175733], requested by (instance=1, osid=485380 (P032)), summary=[incident=64481]. Thread 1 advanced to log sequence 3 (LGWR switch) Current log# 3 seq# 3 mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_3_bb7x310q_.log ARC3: STARTING ARCH PROCESSES Wed Jul 08 17:57:34 2015 ARC4 started with pid=93, OS id=485516 Wed Jul 08 17:57:35 2015 db_recovery_file_dest_size of 204800 MB is 0.41% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. ARC4: Archival started ARC3: STARTING ARCH PROCESSES COMPLETE krsk_srl_archive_int: Enabling archival of deferred physical standby SRLs Archived Log entry 273963 added for thread 1 sequence 2 ID 0xfcee1785 dest 1: Archived Log entry 273964 added for thread 1 sequence 401533 ID 0xfbe74bc9 dest 1: Completed: ALTER DATABASE OPEN Wed Jul 08 17:57:36 2015 Starting background process CJQ0 Wed Jul 08 17:57:36 2015 CJQ0 started with pid=95, OS id=485554 Shutting down archive processes ARCH shutting down ARC4: Archival stopped Wed Jul 08 17:57:41 2015 Thread 1 advanced to log sequence 4 (LGWR switch) Current log# 4 seq# 4 mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_4_bb7x312r_.log Wed Jul 08 17:57:41 2015 Archived Log entry 273965 added for thread 1 sequence 3 ID 0xfcee1785 dest 1: Wed Jul 08 17:58:30 2015 Sweep [inc][64481]: completed Sweep [inc2][64481]: completed Wed Jul 08 17:58:31 2015 Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_p032_485380.trc (incident=64482): ORA-00600: internal error code, arguments: [17114], [0x7FE96B50DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7FE96B50DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64482/xffdb_p032_485380_i64482.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Wed Jul 08 17:58:32 2015 Dumping diagnostic data in directory=[cdmp_20150708175832], requested by (instance=1, osid=485380 (P032)), summary=[incident=64482]. Wed Jul 08 17:58:36 2015 Thread 1 advanced to log sequence 5 (LGWR switch) Current log# 5 seq# 5 mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_5_bb7x317f_.log Wed Jul 08 17:59:02 2015 Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_p032_485380.trc (incident=64483): ORA-00600: internal error code, arguments: [17114], [0x7FE96B50DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17114], [0x7FE96B50DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7FE96B50DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64483/xffdb_p032_485380_i64483.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Wed Jul 08 17:59:04 2015 Dumping diagnostic data in directory=[cdmp_20150708175904], requested by (instance=1, osid=485380 (P032)), summary=[incident=64483]. Wed Jul 08 17:59:29 2015 Sweep [inc][64483]: completed Sweep [inc][64482]: completed Sweep [inc2][64483]: completed Sweep [inc2][64482]: completed Wed Jul 08 17:59:30 2015 Block recovery from logseq 2, block 104 to scn 73226531646 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_2_bb7x30js_.log Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_3_bb7x310q_.log Recovery of Online Redo Log: Thread 1 Group 4 Seq 4 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_4_bb7x312r_.log Recovery of Online Redo Log: Thread 1 Group 5 Seq 5 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_5_bb7x317f_.log Block recovery stopped at EOT rba 5.765.16 Block recovery completed at rba 5.765.16, scn 17.212087614 Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_p032_485380.trc (incident=64484): ORA-00600: internal error code, arguments: [KSMFPG2], [0x7FE96B50D000], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17114], [0x7FE96B50DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17114], [0x7FE96B50DBD0], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7FE96B50DBF8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64484/xffdb_p032_485380_i64484.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Dumping diagnostic data in directory=[cdmp_20150708175934], requested by (instance=1, osid=485380 (P032)), summary=[incident=64487]. Wed Jul 08 17:59:36 2015 SMON: slave died unexpectedly, downgrading to serial recovery Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_smon_485190.trc (incident=64129): ORA-00600: internal error code, arguments: [17182], [0x7F5AED10D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64129/xffdb_smon_485190_i64129.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Block recovery from logseq 2, block 104 to scn 73226531646 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_2_bb7x30js_.log Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_3_bb7x310q_.log Recovery of Online Redo Log: Thread 1 Group 4 Seq 4 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_4_bb7x312r_.log Recovery of Online Redo Log: Thread 1 Group 5 Seq 5 Reading mem 0 Mem# 0: /opt/oracle/database/fast_recovery_area/DBHLDB/onlinelog/o1_mf_5_bb7x317f_.log Block recovery completed at rba 5.765.16, scn 17.212087615 ORACLE Instance xffdb (pid = 16) - Error 600 encountered while recovering transaction (13, 14) on object 135520. Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_smon_485190.trc: ORA-00600: internal error code, arguments: [17182], [0x7F5AED10D7A0], [], [], [], [], [], [], [], [], [], [] Dumping diagnostic data in directory=[cdmp_20150708175937], requested by (instance=1, osid=485190 (SMON)), summary=[incident=64129]. Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_smon_485190.trc (incident=64130): ORA-00600: internal error code, arguments: [KSMFPG2], [0x7F5AED10D000], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7F5AED10D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/database/diag/rdbms/xffdb/xffdb/incident/incdir_64130/xffdb_smon_485190_i64130.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Fatal internal error happened while SMON was doing active transaction recovery. Errors in file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_smon_485190.trc: ORA-00600: internal error code, arguments: [KSMFPG2], [0x7F5AED10D000], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [17182], [0x7F5AED10D7A0], [], [], [], [], [], [], [], [], [], [] SMON (ospid: 485190): terminating the instance due to error 474 System state dump requested by (instance=1, osid=485190 (SMON)), summary=[abnormal instance termination]. System State dumped to trace file /opt/oracle/database/diag/rdbms/xffdb/xffdb/trace/xffdb_diag_485150_20150708175939.trc Dumping diagnostic data in directory=[cdmp_20150708175939], requested by (instance=1, osid=485190 (SMON)), summary=[abnormal instance termination]. Instance terminated by SMON, pid = 485190
通过进一步分析确定是67号文件有异常,使用dbv检查该文件发现
DBVERIFY: Release 11.2.0.4.0 - Production on Thu Jul 9 11:08:27 2015 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /opt/oracle/database/oradata/xffdb/dk_adv_ts_main_25.dbf Block Checking: DBA = 283035270, Block Type = KTB-managed data block data header at 0x7f87813ea064 kdbchk: the amount of space used is not equal to block size used=7383 fsc=0 avsp=873 dtl=8088 Page 2016902 failed with check code 6110 Block Checking: DBA = 283035271, Block Type = KTB-managed data block data header at 0x7f87813ec064 kdbchk: the amount of space used is not equal to block size used=8676 fsc=0 avsp=832 dtl=8088 Page 2016903 failed with check code 6110 Block Checking: DBA = 283035272, Block Type = KTB-managed data block data header at 0x7f87813ee064 kdbchk: avsp(1306) > tosp(894) Page 2016904 failed with check code 6128 Block Checking: DBA = 283035273, Block Type = KTB-managed data block data header at 0x7f87813f0064 kdbchk: the amount of space used is not equal to block size used=7506 fsc=0 avsp=815 dtl=8088 Page 2016905 failed with check code 6110 Block Checking: DBA = 283035274, Block Type = KTB-managed data block data header at 0x7f87813f2064 kdbchk: the amount of space used is not equal to block size used=7892 fsc=0 avsp=884 dtl=8088 Page 2016906 failed with check code 6110 ………… Block Checking: DBA = 283035368, Block Type = KTB-managed data block data header at 0x7f87814ae064 kdbchk: the amount of space used is not equal to block size used=7934 fsc=0 avsp=837 dtl=8088 Page 2017000 failed with check code 6110 Block Checking: DBA = 283035369, Block Type = KTB-managed data block data header at 0x7f87814b0064 kdbchk: the amount of space used is not equal to block size used=7683 fsc=0 avsp=883 dtl=8088 Page 2017001 failed with check code 6110 Block Checking: DBA = 283035370, Block Type = KTB-managed data block data header at 0x7f87814b2064 kdbchk: the amount of space used is not equal to block size used=8556 fsc=0 avsp=841 dtl=8088 Page 2017002 failed with check code 6110 Block Checking: DBA = 283035371, Block Type = KTB-managed data block data header at 0x7f87814b4064 kdbchk: the amount of space used is not equal to block size used=7460 fsc=0 avsp=822 dtl=8088 Page 2017003 failed with check code 6110 DBVERIFY - Verification complete Total Pages Examined : 4063232 Total Pages Processed (Data) : 2724435 Total Pages Failing (Data) : 102 Total Pages Processed (Index): 1064839 Total Pages Failing (Index): 0 Total Pages Processed (Other): 273957 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 1 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 212065913 (17.212065913)
把datafile 67给offline之后,数据库open正常,也不再crash。这里比较明显,导致主库和被动都出现异常,主库直接crash,然后重启后,一会儿就crash;备库激活后,重启后一会儿也就crash;两个库现象相同。最后通过dbv定位到是由于某个文件出现大量逻辑坏块,导致数据库open之后,进行回滚之时crash.对于此类故障,可以通过屏蔽事务回滚,并且通过alert日志和trace文件定位到异常对象,可以删除异常对象可以暂时解决该问题。而导致数据库出现类似问题。
通过这个案例,可以的出来,由于oracle某种bug或者其他原因,导致block 逻辑损坏,而且这个损坏会传输到备库,导致备库也异常,oracle的备份,容灾不能全部依赖物理dataguard容灾。因此在条件允许情况下,建议增加物理备份和逻辑容灾(类似OGG)