标签归档:ORA-600 4193

Oracle 19c故障恢复

有客户找到我们,他们的oracle 19c数据库由于异常断电,导致启动异常,经过一系列恢复之后,依旧无法解决问题,请求我们给予支持.通过我们的Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check),获取数据库当前信息如下:
数据库版本为19C并且安装了19.5.0.0.191015 (30125133)补丁
20200310220453
20200310220748


数据库使用pdb
20200310220610

数据库启动成功后,一会就crash掉

2020-03-10T01:44:41.018032+08:00
Pluggable database RACBAK opened read write
2020-03-10T01:44:41.018996+08:00
Pluggable database RAC opened read write
2020-03-10T01:44:51.244050+08:00
Completed: ALTER PLUGGABLE DATABASE ALL OPEN
Starting background process CJQ0
Completed: ALTER DATABASE OPEN
2020-03-10T01:44:51.317085+08:00
CJQ0 started with pid=224, OS id=32581 
2020-03-10T01:44:56.067043+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j001_32588.trc  (incident=1095281) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095281/XFF_j001_32588_i1095281.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.073112+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.079228+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j002_32590.trc  (incident=1095289) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [2633], [2638], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095289/XFF_j002_32590_i1095289.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.085068+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.115765+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j004_32594.trc  (incident=1095305) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [63532], [63537], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095305/XFF_j004_32594_i1095305.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:46:48.202213+08:00
RAC(4):Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
RAC(4):  Mem# 0: /opt/oracle/oradata/XFF/redo02.log
RAC(4):Block recovery completed at rba 0.0.0, scn 0x0000000d3675e48e
RAC(4):DDE: Problem Key 'ORA 600 [4193]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2020-03-10T01:46:48.384040+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc:
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc  (incident=1093505) (PDBNAME=CDB$ROOT):
ORA-501 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1093505/XFF_clmn_31741_i1093505.trc
2020-03-10T01:46:49.264624+08:00
USER (ospid: 31741): terminating the instance due to ORA error 501
2020-03-10T01:46:49.280664+08:00
System state dump requested by (instance=1, osid=31741 (CLMN)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc
2020-03-10T01:46:53.156926+08:00
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157103+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc:
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157211+08:00
Dumping diagnostic data in directory=[cdmp_20200310014649], requested by (instance=1, osid=31741 (CLMN)), 
summary=[abnormal instance termination].

通过报错信息判断,数据库open之后(特别是pdb 4 open之后),开始报ORA-600 4193错误.然后由于CLMN进程异常,最后数据库crash.对于这类故障,因为使用的pdb,而且是由于pdb的undo异常导致数据库启动之后crash,可以通过对于pdb进行特殊处理,从而实现数据库启动之后不再crash.

发表在 ORACLE 19C, Oracle备份恢复 | 标签为 , , , | 评论关闭

ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.

ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add 
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect

Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK

Details: 
Undo corruption may be caused after a shrink and the same undo block may be used 
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
>create pfile from spfile;

1. Shutdown the instance

2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'

3. >startup restrict pfile=<initsid.ora>

4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.

If all offline then continue to the next step

5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;

6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;

7. >shutdown immediate;

8 >startup nomount;  --> Using your Original spfile

9 modify the spfile with the new undo tablespace name

  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;

10. >shutdown immediate;

11. >startup;  --> Using spfile
 


The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out 
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭

ORA-600 4193 错误说明和解决

ORA-600 4193 解释说明

ERROR:              

  Format: ORA-600 [4193] [a] [b]

VERSIONS:           
  versions 6.0 to 12.1

DESCRIPTION:        

  A mismatch has been detected between Redo records and Rollback (Undo) 
  records.

  We are validating the Undo block sequence number in the undo block against 
  the Redo block sequence number relating to the change being applied.

  This error is reported when this validation fails.

ARGUMENTS:
  Arg [a] Undo record seq number
  Arg [b] Redo record seq number

FUNCTIONALITY:
  KERNEL TRANSACTION UNDO





ORA-600 [4193] [a] [b] [ ] [ ]  [ ]        
Versions: 7.2.2  - 9.2.0                              Source: ktuc.c
===========================================================================
Meaning: seq# mismatch while adding an undo record to an undo block. This 
         is done by the application of redo. 
---------------------------------------------------------------------------
Argument Description:

    a. (ktubhseq): undo record seq# - this is the seq# of the block that 
                                      this undo record WILL BE APPLIED TO. 
                                      This is from the Undo Block. It is 
                                      NOT the seq# of the undo block itself.
                                      
    b. (ktudbseq): redo RECORD seq# - this is the seq# number in the block 
                                      that this redo WILL BE APPLIED TO. 
                                      This is from the Redo Record. 

---------------------------------------------------------------------------
Diagnosis:

    This error is raised in kturdb which handles the adding of undo records 
    by the application of redo. 
    
    When we try to apply redo to an undo block (forward changes are made by 
    the application of redo to a block) we check that the seq# in the undo 
    record matches the seq# in the redo record. These seq# should be the 
    same because when we apply a redo record we must apply it to the 
    correct version of the block. We can only apply a redo record to a 
    block that contains the same seq# as in the redo record. 

    If the seq# do not match then this error is raised. This implies some 
    kind of block corruption in either the redo or the undo block. 

7.3.x - 8.1.7.x
ASSERT2(ubh->ktubhseq == db->ktudbseq, OERI(4193), KSESVSGN,
            ubh->ktubhseq, db->ktudbseq);
9.2.x
ksesic2(OERI(4193), ksenrg(ubh->ktubhseq), ksenrg(db->ktudbseq));

struct ktubh
{
  kxid  ktubhxid;      /* txid of tx currently using or last used this block */
  ub2   ktubhseq;                              /* undo block sequence number */
  ub1   ktubhcnt;    /* high water mark record index, number of undo entries */
  ub1   ktubhirb;  /* rollback record index, rec index to start the rollback */
  ub1   ktubhicl;  /* collecting record index, rec index to start retrieving col info */
  ub1   ktubhflg;                                                 /* dummy */
  ub2   ktubhidx[1];     /* byte offset of record in block, grows at runtime */
};

struct ktudb   Kernel Transaction Undo Data operation Block (redo)
{
  ub2    ktudbsiz;                                          /* size of entry */
  ub2    ktudbspc;                 /* verification: space left in undo block */
  ub2    ktudbflg;            /* flag to indicate the kind of redo operation */
  kxid   ktudbxid;                                          /* current tx id */
  ub2    ktudbseq;                                  /* block sequence number */
  ub1    ktudbrec;                       /* new record index for this change */
};

ORA 600 4193 处理方法同How to resolve ORA-600 [4194] errors

发表在 Oracle备份恢复 | 标签为 | 评论关闭