标签归档:kjhn_post_ha_alert0-862

ORA-00600[kjhn_post_ha_alert0-862]原因分析

数据库版本和平台信息
数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上

ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 2 - type 586, 1 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M
Instance name: orcl

数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误

Wed Jun 03 21:50:40 2015
Restarting dead background process MMON
MMON started with pid=11, OS id=3804
Wed Jun 03 21:50:43 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

Wed Jun 03 21:50:49 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

Wed Jun 03 21:55:44 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

Wed Jun 03 21:55:49 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

Wed Jun 03 22:00:40 2015
Thread 1 advanced to log sequence 476
  Current log# 1 seq# 476 mem# 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
Wed Jun 03 22:00:44 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

查询对应trace文件发现

ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], [] 
Current SQL statement for this session: 
BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;

人工执行该过程

SQL> var success varchar2
SQL> begin
  2  :success := sys.dbms_ha_alerts_prvt.check_ha_resources;
  3  end;
  4  /
 
PL/SQL procedure successfully completed.
 
SQL> print success
 
SUCCESS                                                                       
  
--------------------------------                                              
  
N      

通过查询相关资料得到如下说明

@ This check is triggered with FAN enabled at this instance and it seems to be 
@ associated with a startup action. From the procedure itself which is called 
@ this is a run-once MMON (startup) action which supports instance down 
@ notification reliability. It does the folowing a) registers the current 
@ instance incarnation in recent_resource_incarnations$ if it's not already 
@ there b) deletes recent_resource_incarnations$ records that don't apply to 
@ this database. They may, e.g., have been copied from seed db or from a former 
@ DataGuard primary c) scans recent_resource_incarnations$ for instance 
@ incarnations that are no longer alive, and submits instance down alerts for 
@ them . If all is good then return 'Y' else 'N' (or error) if there is a 
@ failure. That failure is to get back to MMON, so that it may retry this 
@ action later. In the local instance I get a 'Y' but in the customer's system 
@ it fails with a 'N' which seems related to the ORA-600 assert.

@ This function is kjhn_post_ha_alert0() which is internal and does the real work of 
@ posting HA alerts. It is used by both kjhn_post_ha_alert and 
@ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of 
@ kjhn_post_ha_alert,other than the fact that it uses individual parameters 
@ rather than the more easily extensible structure. Also the parameters passed 
@ to it are the instance_name and the host_name which is the kernelized 
@ implementation for posting HA alerts. Without actually having the arguments 
@ the guess is that either the host_name or the instance_name raised in the 
@ assert is null which triggered it.

mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.

处理方法
This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
The bug is fixed in 11.1.0.6. A workaround is available for the problem.
该bug在11.1.0.6中得以修复

To implement the workaround, please execute the following steps as the SYS user: 

1. Collect the following information and spool it to a file for your records. 

a. output of select * from v$instance 
b. show parameter instance_name 
c. set pages 1000 
d. select * from recent_resource_incarnations$ 

2. Create a backup table of recent_resource_incarnations$. 

SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$; 


3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running.
    Do not issue this statement if a shutdown is pending. 

SQL> truncate table recent_resource_incarnations$; 


4. Perform a clean shutdown, followed by a startup.

具体参考:
ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS

发表在 ORA-xxxxx | 标签为 , | 一条评论