ORA-00600[kcbshlc_1]导致数据库 down 案例

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600[kcbshlc_1]导致数据库 down 案例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一台服务器因为ORA-00600[kcbshlc_1]错误引起PMON异常导致数据库down掉

Sun Jul  8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
PMON: terminating instance due to error 472

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.9-89.ELsmp
Version:	#1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine:	x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)

*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch  1fc1fcc9b0 Child cache buffers chains level=1 child#=4753 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 113266196
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     139 (2, 1341687611, 2)
     192 (2, 1341687611, 2)
     191 (2, 1341687611, 2)
     173 (2, 1341687611, 2)
     185 (2, 1341687611, 2)
     176 (2, 1341687611, 2)
     174 (2, 1341687611, 2)
     118 (2, 1341687611, 2)
     190 (2, 1341687611, 2)
     179 (2, 1341687611, 2)
     184 (1, 1341687611, 1)
     189 (1, 1341687611, 1)
     177 (1, 1341687611, 1)
     195 (1, 1341687611, 1)
     187 (1, 1341687611, 1)
     194 (1, 1341687611, 1)
     147 (1, 1341687611, 1)
     183 (1, 1341687611, 1)
     143 (1, 1341687611, 1)
     144 (1, 1341687611, 1)
     186 (1, 1341687611, 1)
     188 (1, 1341687611, 1)
     196 (1, 1341687611, 1)
     145 (1, 1341687611, 1)
     193 (1, 1341687611, 1)
     waiter count=25
*** 2012-07-08 03:50:06.228
 wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch  1f81265f28 Child cache buffers chains level=1 child#=14180 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 71341989
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     121 (2, 1341736205, 2)
     116 (2, 1341736205, 2)
     125 (2, 1341736205, 2)
     140 (2, 1341736205, 2)
     145 (2, 1341736205, 2)
     waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch  1f813aec18 Child cache buffers chains level=1 child#=20914 
        Location from where latch is held: kcbrls: kslbegin: 
        Context saved from call: 96505705
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000000 ?
kcbshlc()+239        call     kgeasnmierr()        0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
kslilcr()+770        call     kcbshlc()            0066876E0 ? 1F801DDB28 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
ksl_cleanup()+1567   call     kslilcr()            7FBFFFCE50 ? 000000000 ?
                                                   7FBFFFDCE0 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxfl()+492         call     ksl_cleanup()        000000000 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxda()+55          call     ksuxfl()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksucln()+1390        call     ksuxda()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksbrdp()+794         call     ksucln()             060008100 ? 000000000 ?
                                                   FFFFFFFF9720ED9F ?
                                                   1F801DDB28 ? 0066876E0 ?
                                                   000000021 ?
opirip()+616         call     ksbrdp()             060008100 ? 000000000 ?
                                                   000000001 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opimai_real()+317    call     sou2o()              7FBFFFF670 ? 000000032 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF700 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
__libc_start_main()  call     main()               000000003 ? 7FBFFFF700 ?
+219                                               000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
_start()+42          call     __libc_start_main()  000713984 ? 000000001 ?
                                                   7FBFFFF848 ? 005288D00 ?
                                                   000000000 ? 000000003 ?

通过这个trace可以看出数据库运行在LINUX 64操作系统,版本是10.2.0.4。
出现错误的原因:
PMON在清理1fc165d248的时候,因为被orapid = 102持有,导致清理失败.
PMON在清理20406507f0的时候,因为被orapid = 102持有,导致清理失败.
PMON在清理1fc165f9d0的时候,因为被orapid = 123持有,导致清理失败.

查询MOS[443909.1]
发现是unpublished Bug 4723109.处理方法打上Patch 4723109.

此条目发表在 ORA-xxxxx 分类目录,贴了 , , 标签。将固定链接加入收藏夹。

ORA-00600[kcbshlc_1]导致数据库 down 案例》有 1 条评论

  1. 惜分飞 说:

    RAC Instance Generates ORA-600 [Kcbshlc_1] Errors and Pmon Crashes

    Applies to:
    Oracle Server - Enterprise Edition - Version: 10.2.0.1.0
    This problem can occur on any platform.
    
    Symptoms
    The 10.2.0.1 RAC database will encounter a hang and then crashes.  
    In the alert.log, we can see the following errors:
    
    ORA-00600: internal error code, arguments: [kcbshlc_1], [71], [], [], [], [], [], []
    PMON: terminating instance due to error 472
    
    Stack Trace is similar to the following:
    
    ksedst ksedmp ssexhd sigreturn kcbshlc kslilcr ksl_cleanup ksuxfl ksuxda ksucln ksbrdp opirip
    
    Cause
    This is unpublished Bug 4723109. 
    
    This is a known issue where the latches for clean up are not passed correctly and cause corruption
    in the process and a crash.  According to development, the fix for this bug is 
    scheduled to be included in the 10.2.0.5 patch set.
    
    Solution
    To implement the solution, please execute the following steps: 
    
    Apply Patch 4723109. 
    
    If you'd like to check for Patch Availability from Metalink 
    
    1) Click on Patches & Updates 
    2) Click on Simple Search 
    3) Enter patch number: 4723109 
    4) Select your O/S 
    5) Click Go.
    
    Note: Please review the Readme file for instructions on how to install the patchset.