月归档:七月 2015

ORA-600 kccpb_sanity_check_2故障恢复

今天是有人在淘宝旺旺上找我,需要oracle数据库恢复支持
wangwang


远程登录上去一看发现数据库mount的时候报ORA-600[kccpb_sanity_check_2]错误

C:\Documents and Settings\Administrator>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.3.0 - Production on Wed Jul 29 16:23:18 2015

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [14169],
[14160], [0x0], [], [], [], []

尝试重建控制文件

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup pfile='D:\database\m104\pfile\init.ora' nomount
ORACLE instance started.

Total System Global Area  444596224 bytes
Fixed Size                  1291072 bytes
Variable Size             155192512 bytes
Database Buffers          281018368 bytes
Redo Buffers                7094272 bytes

SQL> SHOW PARAMETER CONT;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      D:\DATABASE\M104\CTRL\CONTROL0
                                                 2.CTL
global_context_pool_size             string
SQL> ALTER DATABASE MOUNT;
ALTER DATABASE MOUNT
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [14169],
[14160], [0x0], [], [], [], []


SQL>
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" NORESETLOGS  FORCE LOGGING NOAR
CHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;
CREATE CONTROLFILE REUSE DATABASE "m104_db" NORESETLOGS  FORCE LOGGING NOARCHIVE
LOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [1], [8],
[], [], [], [], []

SQL>
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCH
IVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;
CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCHIVELO
G
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [1], [8],
[], [], [], [], []

无论是使用noresetlogs还是resetlogs,重建控制文件都报ORA-600[kccsga_update_ckpt_4]错误.比较奇怪,无解指定控制文件新名称重建试试看

修改控制文件路径

SQL> SHUTDOWN ABORT
ORACLE instance shut down.
SQL> startup pfile='D:\database\m104\pfile\init.ora' nomount
ORACLE instance started.

Total System Global Area  444596224 bytes
Fixed Size                  1291072 bytes
Variable Size             155192512 bytes
Database Buffers          281018368 bytes
Redo Buffers                7094272 bytes
SQL> SHOW PARAMETER CONT;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      D:\DATABASE\M104\CTRL\CONTROL0
                                                 4.CTL
global_context_pool_size             string
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCH
IVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;

Control file created.

使用新的控制文件位置,这次终于数据库重建控制文件成功
尝试指定redo进行恢复,数据库正常打开

SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
ORA-00279: change 3643108240801 generated at 07/26/2015 20:15:22 needed for
thread 1
ORA-00289: suggestion :
D:\ORACLE\PRODUCT\10.2.0\DB_1\RDBMS\ARC00567_0866390669.001
ORA-00280: change 3643108240801 for thread 1 is in sequence #567


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\database\m104\log\redo01.log
ORA-00310: archived log contains sequence 566; sequence 567 required
ORA-00334: archived log: 'D:\DATABASE\M104\LOG\REDO01.LOG'


ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'D:\DATABASE\M104\DATA\SYSTEM01.DBF'


SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
ORA-00279: change 3643108240801 generated at 07/26/2015 20:15:22 needed for
thread 1
ORA-00289: suggestion :
D:\ORACLE\PRODUCT\10.2.0\DB_1\RDBMS\ARC00567_0866390669.001
ORA-00280: change 3643108240801 for thread 1 is in sequence #567


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\database\m104\log\redo02.log
Log applied.
Media recovery complete.
SQL> ALTER DATABASE OPEN RESETLOGS;

Database altered.

数据库恢复完成。整个数据库恢复比较简单,但是注意这里的ORA-600[kccsga_update_ckpt_4]通过修改控制文件路径规避,具体原因待查。

知识点补充:ORA-600 [kccpb_sanity_check_2] [a] [b] {c}

VERSIONS:
  Versions 10.2 to 11.2

DESCRIPTION:

  This internal error is raised when the sequence number (seq#) of the
  current block of the controlfile is greater than the seq# in the controlfile header.

  The header value should always be equal to, or greater than the value
  held in the control file block(s).

  This extra check was introduced in Oracle 10gR2 to detect lost writes
  or stale reads to the header.

ARGUMENTS:
  Arg [a] seq# in control block header.
  Arg [b] seq# in the control file header.
  Arg {c} 
发表在 ORA-xxxxx, Oracle备份恢复 | 标签为 , , , , | 评论关闭

多cpu环境中运行root.sh失败,asm报ORA-04031

有朋友和我反馈,说他们在装linux 6.5上面装11.2.0.3的rac出现异常,root.sh在第一个节点执行就失败了,请求帮助
root.sh-asm-fail


根据上面记录,查看asmca日志

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread... 
[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true 
[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on

[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on

oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel

[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

这里可以看出来,asm实例无法登陆(ORA-01012和ORA-03113),根据这样的错误,分析asm日志

Reconfiguration complete
Fri Jul 24 12:49:29 2015
LCK0 started with pid=22, OS id=46913 
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
LCK0 (ospid: 46913): terminating the instance due to error 4031
Fri Jul 24 12:49:35 2015
ORA-1092 : opitsk aborting process
Instance terminated by LCK0, pid = 46913

进一步分析asm日志,发现是大家熟悉的asm的ORA-4031问题,那就是说明数据库在执行root.sh的时候使用默认参数文件启动asm的时候shared pool不够大(根据ORACLE最佳实践,建议memory_target=1536M及其以上值),从而出现该问题。类似Bug 14292825 ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low,根据官方描述该问题在11.2.0.4中修复
BUG-14292825


通过asm日志发现相关默认值配置

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	RAC01
Release:	2.6.32-358.el6.x86_64
Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:	x86_64
Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
System parameters with non-default values:
  large_pool_size          = 16M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.10.10.31
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jul 24 12:49:27 2015

通过查询/proc/cpuinfo,检查cpu数量

processor	: 191
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
stepping	: 7
cpu MHz		: 1200.000
cache size	: 24576 KB
physical id	: 7
siblings	: 24
core id		: 13
cpu cores	: 12
apicid		: 251
initial apicid	: 251
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes

而根据How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1)中描述
最多7个subpool(这里一共有192个cpu,因此subpool数量为7)
1


每个suppool最少512m内存,因此shared pool最小需要3.5G(而默认值几百M,远远不够)
2

由于cpu多,导致shared pool的Subpools 更加多,使得shared pool的需求量更加大。至此本次故障原因可以总结:
由于cpu较多,需要更多的shared pool,而11.2.0.3中由于asm默认内存分配较少,导致在asm启动之时出现shared pool不足(本身默认值小,而且shared pool需求大,从而出现了ORA-04031就不奇怪了),因为运行root.sh过程中asm无法正常启动,从而使得root.sh运行失败。
处理办法:临时disable部分cpu,然后重新执行root.sh,修改asm内存分配,再enable cpu.
特别说明:此故障acs的兄弟遇到过,所以这次我能够快速反应,感谢acs兄弟们的帮忙,另外有权限的朋友可以看看:3-10479952701和3-7976215751等sr描述

发表在 ORA-xxxxx, Oracle RAC | 标签为 , | 评论关闭

win平台报ORA-15055 ORA-21561错误处理—增加SharedSection值

有一套win环境的rac,不定期的异常,alert日志经常报ORA-15055 ORA-21561错误,经过分析是由于win默认的Desktop Heap Size值太小导致该问题,以此提醒各位win平台跑oracle 的朋友注意。
alert日志报错

  Current log# 6 seq# 1820 mem# 0: +DATA/rac/onlinelog/group_6.3110.876059421
Thread 2 advanced to log sequence 1821 (LGWR switch)
  Current log# 5 seq# 1821 mem# 0: +DATA/rac/onlinelog/group_5.3109.876059417
Mon Jul 06 14:11:34 2015
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
NOTE: Deferred communication with ASM instance
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_arc0_276.trc:
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败
NOTE: deferred map free for map id 28882
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_arc0_276.trc:
ORA-00313: 无法打开日志组 6 (用于线程 2) 的成员
ORA-00312: 联机日志 6 线程 2: '+DATA/rac/onlinelog/group_6.3110.876059421'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/rac/onlinelog/group_6.3110.876059421
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败
ARCH: Archival stopped, error occurred. Will continue retrying
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
NOTE: Deferred communication with ASM instance
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_arc0_276.trc:
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败
NOTE: deferred map free for map id 28883
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_arc0_276.trc:
ORA-00313: 无法打开日志组 6 (用于线程 2) 的成员
ORA-00312: 联机日志 6 线程 2: '+DATA/rac/onlinelog/group_6.3110.876059421'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/rac/onlinelog/group_6.3110.876059421
ORA-15055: 无法连接到 ASM 实例
ORA-21561: 生成 OID 失败

数据库版本

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

OS版本

SQL> host systeminfo
                                                                            
主机名:           RAC2
OS 名称:          Microsoft Windows Server 2008 R2 Enterprise 
OS 版本:          6.1.7601 Service Pack 1 Build 7601
OS 制造商:        Microsoft Corporation
OS 配置:          独立服务器
OS 构件类型:      Multiprocessor Free
注册的所有人:     Windows 用户

通过查询MOS,发现该问题主要是由于win的SharedSection设置不足导致,而此类问题可能还导致其他错误,如:TNS12531:TNS:cannot allocate memory。

建议:对于Microsoft Windows 平台数据库,数据库版本为Version 11.2.0.1 to 12.1.0.2的系统,根据实际情况适当增加SharedSection大小。

例如:修改\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems\: SharedSection=1024,20480,1024。
这里的第二个为交互式的Desktop Heap Size,第三个是非交互式的Desktop Heap Size,我们主要修改第三个值

发表在 Oracle | 标签为 , , , | 一条评论