记录一次ORA-00600[kdsgrp1]分析

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次ORA-00600[kdsgrp1]分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

找出报错对象

--方法1
*** SESSION ID:(795.16405) 2012-04-05 09:36:11.958
            row 080095ee.26 continuation at
            file# 32 block# 38382 slot 39 not found
**************************************************
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 39 ..... nrows: 19
**************************************************

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 32
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 32
Enter value for block_id: 38382
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 38382 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
AHV8
TBL_IVR_LOG
TABLE PARTITION    CSS_PARTITION                  IVR_LOG_2012_MONTH04

--方法2
*** 2012-04-05 09:36:11.965
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO TBL_CONTACT_INFO_FAILED_TMP 
select * from TBL_IVR_LOG

SO: 70000017f954f50, type: 4, owner: 70000017f65a840, flag: INIT/-/-/0x00
(session) sid: 795 trans: 70000017464a1e8, creator: 70000017f65a840, flag: (40110041) USR/- BSY/-/-/-/-/-
              DID: 0002-0067-000305BD, short-term DID: 0002-0067-000305BE
              txn branch: 0
              oct: 2, prv: 0, sql: 70000015180ee98, psql: 700000180d67550, user: 49/AHV8
service name: SYS$USERS
O/S info: user: oracle10, term: UNKNOWN, ospid: 12976218, machine: zwq_kfdb2
              program: oracle@zwq_kfdb2 (J002)
last wait for 'db file sequential read' blocking sess=0x0 seq=226 wait_time=17071 seconds since wait started=1
                file#=20, block#=95ee, blocks=1

--方法3
Block header dump:  0x080095ee
 Object id on Block? Y
 seg/obj: 0x11eeb  csc: 0x6f2.848e814  itc: 2  flg: E  typ: 1 - DATA
     brn: 1  bdba: 0x7c09c89 ver: 0x01 opc: 0
     inc: 0  exflg: 0


SQL> select to_number('11eeb','xxxxxxxx') from dual;

TO_NUMBER('11EEB','XXXXXXXX')
-----------------------------
                        73451

SQL> select owner,object_name,subobject_name,object_type from dba_objects where data_object_id='73451';

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
SUBOBJECT_NAME                 OBJECT_TYPE
------------------------------ -------------------
AHV8
TBL_IVR_LOG
IVR_LOG_2012_MONTH04           TABLE PARTITION

验证是否真的坏块

SQL> select name from v$datafile where file#=32;

NAME
------------------------------------------------------
/dev/rdb1_data27

[zwq_kfdb2:/home/oraeye]dbv file='/dev/rdb1_data27' blocksize=8192

DBVERIFY: Release 10.2.0.4.0 - Production on Fri Apr 13 15:33:10 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/rdb1_data27
  

DBVERIFY - Verification complete

Total Pages Examined         : 1048448
Total Pages Processed (Data) : 947357
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 4756
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 96335
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 297329920 (1778.297329920)

SQL> select count(*) from AHV8.TBL_IVR_LOG partition(IVR_LOG_2012_MONTH04);

  COUNT(*)
----------
   8798030

总结:很明显这次出现这个问题,因为内存中出现坏块导致,经过一段时间buffer cache中的坏块内容已经被老化,所以现在不能重现(甚至不用做任何操作)。如果内存中出现了坏块,而且还没有被老化掉,可以刷新data buffer;如果是数据块出现坏块,根据实际情况决定处理

此条目发表在 ORA-xxxxx 分类目录,贴了 标签。将固定链接加入收藏夹。

记录一次ORA-00600[kdsgrp1]分析》有 1 条评论

  1. 惜分飞 说:

    Causes and Solutions for ora-600 [kdsgrp1] [ID 1332252.1]

    Applies to:
    Oracle Server - Enterprise Edition - Version: 10.2.0.4 to 11.2.0.2 - Release: 10.2 to 11.2
    Information in this document applies to any platform.
    
    Purpose
    This document discusses the ora-600 [kdsgrp1] error, its possible causes and the work around solutions that can be tried. 
    
    Last Review Date
    June 22, 2011
    Instructions for the Reader
    A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
    Troubleshooting Details
    The ora-600 [kdsgrp1] error is thrown when a fetch operation fails to find the expected row. The error is hit in memory and so may be a memory only error or an error that results from corruption on disk.
    This error may indicate (but is not restricted to) any of the following conditions:
    •	Lost writes
    •	Parallel DML issues
    •	Index corruption
    •	Data block corruption
    •	Consistent read [CR] issues 
    •	Buffer cache corruption  
    A full list of known issues is given in ORA-600 [kdsgrp1] Note 285586.1 Each bug has a short description that indicates the circumstances where it is hit. The bug list can be shortened by selecting your database release to show only those issues that may affect you.
    This issue may be intermittent or it may persist until the underlying disk level corruption is fixed. Intermittent issues are likely to be memory based (however intermittent access to the corruption can be confused with intermittent memory issues).
    Common Work Around Solutions
    If the issue is in memory only we can try to immediately resolve the issue by flushing the buffer cache but remember to consider the performance impact on production systems:
    alter system flush buffer_cache;
    If we have an intermittent consistent read issue we can try disabling rowCR which is an optimization to reduce consistent-read rollbacks during queries by setting _row_cr=FALSE in the initialization files. However, this could lead to performance degradation of queries. Please check the ratio of the two statistics "RowCR hits"/"RowCR attempts" to determine whether the workaround is to be used.
    If this is a result of index corruption then we can drop and rebuild the index. Note that this will require a maintenance window on production systems.
    Root Cause Determination
    Now lets look at how we discover the root cause of the problem: the first step in finding the root cause of this issue is to inspect the generated trace file. The ora-600 will generate both a trace file in the trace directory and an incident file under the incident id within the incident directory. 
    The top part of the trace file tells us the SQL that was being run when the error was hit:
    ----- Current SQL Statement for this session (sql_id=9mamr7xn4wg7x) ----- 
    
    This immediately shows us the data objects that were accessed. Searching the trace file for the text string 'Plan Table' will locate the SQL execution plan that is dumped within this trace file. For a persistent issue this allows us to determine which indexes have been accessed and so identify indexes that should be validated to check for block corruption:
    SQL> analyze index scott.pk_dept validate structure online; 
    
    Index analyzed. 
    
    An other approach we can take is to use the file and block information contained in the trace file. At the top of the trace file we will find information on the block where the corruption was found: 
    *** SESSION ID:(3202.5644) 2011-03-19 04:12:16.910 
    row 07c7c8c7.a continuation at 
    file# 31 block# 510151 slot 11 not found 
    
    This information can be used to identify the object details in dba_extents: 
    Select owner, segment_name, segment_type, partition_name,tablespace_name 
    From dba_extents 
    Where relative_fno = <file id> 
    And <block#> between block_id and (block_id+blocks-1); 
    
    We can then validate this object, for example a table and all it's indexes: 
    Analyze table scott.dept validate structure cascade online; 
    Remember that we may be dealing with a permanent corruption that is not located in the object blocks themselves. Examples of this include:
    •	Dictionary corruption issue from transportable tablespace operations: check dba_tablespaces to see if the tablespace has been plugged in.
    •	Lost writes in ASM diskgroup mirrors - most likely to be seen when there is heavy IO and disk resync activity. To check this run dbms_diskgroup.checkfile to detect mirror discrepancies 
    If analyze reports no corruption then check if there are any chained rows on the table. If these exist then we may have an undetected corruption and the issue should reproduce whenever the SQL is run. Exporting the table will also detect this issue.
    If analyze and exporting the table (in the presence of chained rows) both report no errors then this should be considered a consistent read issue.
    Once you understand the nature of the problem you can review the list of known bugs and determine which one matches your condition. If you cannot determine which issue is affecting you then open a service request with Oracle Support and upload the RDBMS and ASM (if applicable)instance alert logs for all nodes, any trace and incident files generated and a full description of the nature of the problem.