月归档:八月 2016

MON_MODS$表ORA-600 13013报错处理

有朋友反馈数据库启动运行一点时间之后,然后就自动crash,让我们帮忙找原因,通过分析是由于smon进程触发ORA-600 13013导致数据库异常
alert日志报错信息

Thu Aug  4 18:39:44 2016
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=33, OS id=22935
Thu Aug  4 18:39:44 2016
Completed: ALTER DATABASE OPEN
Thu Aug  4 18:39:44 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 18:48:41 2016
Thread 1 advanced to log sequence 86746
  Current log# 3 seq# 86746 mem# 0: /opt/ora10/oradata/ora10g/redo03.log
Thu Aug  4 18:58:13 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:58:56 2016
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Thu Aug  4 18:59:06 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:59:08 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_pmon_22413.trc:
ORA-00474: SMON process terminated with error
Thu Aug  4 18:59:08 2016
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 22413

通过trace文件大概可以发现是由于ORA-600 13013错误导致数据库crash,而且这里有类似”SMON was doing flushing of monitored table stats”错误提示,根据经验,很可能是smon把表的dml操作收集信息相关.

ORA-600 [13013] 含义

ORA-600 [13013] [a] [b] {c} [d] [e] [f] 


This format relates to Oracle Server 8.0.3 to 10.1 

Arg [a] Passcount 
Arg [b] Data Object number 
Arg {c} Tablespace Relative DBA of block containing the row to be updated 
Arg [d] Row Slot number 
Arg [e] Relative DBA of block being updated (should be same as 1) 
Arg [f] Code 

根据这个错误信息,以及How to resolve ORA-00600 [13013], [5001] [ID 816784.1]中的描述

ORA-600 13013 对应对象

SQL> select object_name from dba_objects where object_id=482

OBJECT_NAME
--------------------------------------------------------------------------------
MON_MODS$

该对象正是和监控dml变化相关的表,smon会对其进行相关操作,以前写过一篇:MON_MODS$和MON_MODS_ALL$统计DML操作次数的文章
对于MON_MODS$表ORA-600 13013处理

SQL> analyze table mon_mods$ validate structure cascade;

analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

 
SQL> select index_name from dba_indexes where table_name='MON_MODS$';

INDEX_NAME
------------------------------
I_MON_MODS$_OBJ

SQL> ALTER INDEX I_MON_MODS$_OBJ REBUILD;

Index altered.

SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

SQL> CREATE TABLE MON_MODS_BAK AS SELECT * FROM MON_MODS$;

Table created.

SQL> SELECT COUNT(*) FROM MON_MODS$;

  COUNT(*)
----------
      1247

SQL> C/MON_MODS$/MON_MODS_BAK;
  1* SELECT COUNT(*) FROM MON_MODS_BAK
SQL> /

  COUNT(*)
----------
      1247

SQL> TRUNCATE TABLE MON_MODS$;

Table truncated.

SQL> INSERT INTO MON_MODS$ SELECT * fROM MON_MODS_BAK;

1247 rows created.

SQL> COMMIT;

Commit complete.

SQL>  analyze table mon_mods$ validate structure cascade;

Table analyzed.

自此关于MON_MODS$表相关的ORA-600 13013异常处理完全,当然也可以通过重建I_MON_MODS$_OBJ索引来解决,但是不能通过rebuild index解决.数据库也就不会因此而crash了.

发表在 Oracle备份恢复 | 评论关闭

ORA-600 4042 故障恢复

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check) 检查结果
wrong_scn
通过上图可以知道file 2未能正常恢复(需要看日志分析原因),file 3以前就被offline,需要历史归档(非归档状态,所以这个先放着,后续再处理)

分析file 2 不成功原因

Wed Aug  3 15:21:11 2016
ALTER DATABASE RECOVER  datafile 2  
Wed Aug  3 15:21:11 2016
Media Recovery Start
 parallel recovery started with 2 processes
Wed Aug  3 15:21:11 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Wed Aug  3 15:21:11 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:13 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:18 2016
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

通过日志可以知道由于ORA-600 3020导致file 2不能正常的恢复.
处理file 2

SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.
Thu Aug  4 01:58:35 2016
ALTER DATABASE RECOVER  datafile 2 allow 1 corruption  
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Thu Aug  4 01:58:35 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:58:35 2016
Media Recovery Complete (oracle)
Completed: ALTER DATABASE RECOVER  datafile 2 allow 1 corruption  

尝试open数据库

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
Thu Aug  4 01:59:20 2016
alter database open 
Thu Aug  4 01:59:21 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Thu Aug  4 01:59:21 2016
Started redo scan
Thu Aug  4 01:59:21 2016
Completed redo scan
 1619 redo blocks read, 0 data blocks need recovery
Thu Aug  4 01:59:21 2016
Started redo application at
 Thread 1: logseq 1916, block 12724
Thu Aug  4 01:59:21 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:59:21 2016
Completed redo application
Thu Aug  4 01:59:21 2016
Completed crash recovery at
 Thread 1: logseq 1916, block 14343, scn 3303614971196
 0 data blocks read, 0 data blocks written, 1619 redo blocks read
Thu Aug  4 01:59:21 2016
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=5542
Thu Aug  4 01:59:21 2016
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=19, OS id=5544
Thu Aug  4 01:59:21 2016
Thread 1 advanced to log sequence 1917
Thread 1 opened at log sequence 1917
  Current log# 2 seq# 1917 mem# 0: /home/oracle/orabase/oradata/ORACLE/redo02.log
Successful open of redo thread 1
Thu Aug  4 01:59:21 2016
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug  4 01:59:21 2016
ARC1: STARTING ARCH PROCESSES
Thu Aug  4 01:59:21 2016
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Aug  4 01:59:21 2016
SMON: enabling cache recovery
Thu Aug  4 01:59:21 2016
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=20, OS id=5546
Thu Aug  4 01:59:21 2016
db_recovery_file_dest_size of 2048 MB is 1.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 01:59:22 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 5505
ORA-1092 signalled during: alter database open ...

由于ORA-600 4042错误导致数据库无法正常open.
分析ORA-600 4042

PARSING IN CURSOR #4 len=142 dep=1 uid=0 oct=3 lid=0 tim=1435788503594313 hv=361892850 ad='a7ab2db8'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #4:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594311
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aae75802218  bln=22  avl=02  flg=05
  value=3
EXEC #4:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594393
FETCH #4:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,tim=1435788503594412
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=8 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=3 us)'
WAIT #1: nam='db file sequential read' ela= 10 file#=2 block#=41 blocks=1 obj#=-1 tim=1435788503594468
Dump of buffer cache at level 4 for tsn=1, rdba=8388649
BH (0x95ff3c58) file#: 2 rdba: 0x00800029 (2/41) class: 21 ba: 0x95ef0000
  set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [a8b77880,a8b77880] lru: [95ff3dd0,a8e70338]
  ckptq: [NULL] fileq: [NULL] objq: [a43da110,a43da110]
  use: [a8e6e658,a8e6e658] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800029 (2/41)
  scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
  frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
Dump of memory from 0x0000000095EF0000 to 0x0000000095EF2000
095EF0000 0000A200 00800029 00000000 01010000  [....)...........]
095EF0010 00000000 00000000 00000000 00000000  [................]
        Repeat 509 times
095EF1FF0 00000000 00000000 00000000 00000001  [................]
Dump of memory from 0x0000000095EF0014 to 0x0000000095EF1FFC
095EF0010          00000000 00000000 00000000      [............]
095EF0020 00000000 00000000 00000000 00000000  [................]

这里可以发现,file 2 block 41的type为unknown,注意观察ORA-600 3020的错误,我们发现当时报的坏块也正好是该block.基本上可以确定由于前面的allow 1 corruption操作导致了后面的ORA-600 4042的错误.官方关于ORA-600[4042]解释
ORA-600-4042


通过修改undo$中的回滚段状态(参考:bbed修改undo$(回滚段)状态)
正常open数据库,修改file 3的scn并online数据文件

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1224736768 bytes
Fixed Size                  2020384 bytes
Variable Size             318770144 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
SQL>   SELECT thread#,              
  2           a.sequence#,
  3           a.group#,
  4           TO_CHAR (first_change#, '9999999999999999') "SCN",
  5           a.status,
  6           MEMBER
  7      FROM v$log a, v$logfile b
  8     WHERE a.group# = B.GROUP#
  9  ORDER BY a.sequence# DESC;

   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1919          1     3303615011212
CURRENT
/home/oracle/orabase/oradata/ORACLE/redo01.log

         1       1918          3     3303614991206
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo03.log

   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------

         1       1917          2     3303614971197
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo02.log


SQL> recover database using backup controlfile;
ORA-00279: change 3303615011452 generated at 08/04/2016 02:06:52 needed for
thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORACLE/archivelog/2016_08_04/o1_mf_1_1919_%u
_.arc
ORA-00280: change 3303615011452 for thread 1 is in sequence #1919


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/orabase/oradata/ORACLE/redo01.log
Log applied.
Media recovery complete.
SQL> alter database datafile 3 online;

Database altered.

SQL> alter database open resetlogs;

Database altered.

SQL> 

至此该数据库基本上恢复完成,强烈建议使用逻辑方式导出导入重建库.

发表在 Oracle备份恢复 | 标签为 , | 评论关闭

ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.

ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add 
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect

Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK

Details: 
Undo corruption may be caused after a shrink and the same undo block may be used 
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
>create pfile from spfile;

1. Shutdown the instance

2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'

3. >startup restrict pfile=<initsid.ora>

4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.

If all offline then continue to the next step

5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;

6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;

7. >shutdown immediate;

8 >startup nomount;  --> Using your Original spfile

9 modify the spfile with the new undo tablespace name

  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;

10. >shutdown immediate;

11. >startup;  --> Using spfile
 


The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out 
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭