标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 kfed MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-01110 ORA-01555 ORA-01578 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 2663 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (102)
- 数据库 (1,666)
- DB2 (22)
- MySQL (73)
- Oracle (1,528)
- Data Guard (51)
- EXADATA (8)
- GoldenGate (21)
- ORA-xxxxx (159)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (14)
- ORACLE 21C (3)
- Oracle 23ai (7)
- Oracle ASM (65)
- Oracle Bug (8)
- Oracle RAC (52)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (28)
- Oracle备份恢复 (559)
- Oracle安装升级 (90)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (77)
- PostgreSQL (18)
- PostgreSQL恢复 (6)
- SQL Server (27)
- SQL Server恢复 (8)
- TimesTen (7)
- 达梦数据库 (2)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (37)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (20)
-
最近发表
- 清空redo,导致ORA-27048: skgfifi: file header information is invalid
- A_H_README_TO_RECOVER勒索恢复
- 通过alert日志分析客户自行对一个数据库恢复的来龙去脉和点评
- ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
- ORA-01092 ORA-00604 ORA-01558故障处理
- ORA-65088: database open should be retried
- Oracle 19c异常恢复—ORA-01209/ORA-65088
- ORA-600 16703故障再现
- 数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)
- .[metro777@cock.li].Elbie勒索病毒加密数据库恢复
- 应用连接错误,初始化mysql数据库恢复
- RAC默认服务配置优先节点
- Oracle 19c RAC 替换私网操作
- 监听报TNS-12541 TNS-12560 TNS-00511错误
- drop tablespace xxx including contents恢复
- Linux 8 修改网卡名称
- 如何修改集群的公网信息(包括 VIP) (Doc ID 1674442.1)
- 如何在 oracle 集群环境下修改私网信息 (Doc ID 2103317.1)
- ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt] 相关bug
- ORA-600 krhpfh_03-1210故障处理
标签归档:ORA-00240
HP Itaniums上一次ORA-240异常处理
一、问题由来
今天有网友和我说,他的数据库始终只能mount,不能打开到open状态,具体状态为:
SQL> alter database open; alter database open * ERROR at line 1: ORA-01154: database busy. Open, close, mount, and dismount not allowed now
启动数据库一直停留在Database mounted.
最后报ORA-03113: end-of-file on communication channel
二、数据库环境
OS version:HP-UX Itanium version 11.31 Oracle version:Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
三、日志文件错误
查看alert日志
Fri Nov 11 09:31:01 2011 Completed: ALTER DATABASE MOUNT Fri Nov 11 09:31:08 2011 ALTER DATABASE OPEN Fri Nov 11 09:40:32 2011 Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_dbw0_3122.trc: ORA-00240: control file enqueue held for more than 120 seconds Fri Nov 11 09:47:18 2011 Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_dbw0_3122.trc: Fri Nov 11 10:02:53 2011 Errors in file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc: ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 3122' Fri Nov 11 10:02:55 2011 System State dumped to trace file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc Killing enqueue blocker (pid=3122) on resource CF-00000000-00000000 by killing session 54.1 Fri Nov 11 10:07:57 2011 Errors in file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc: ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 3122' Fri Nov 11 10:07:59 2011 System State dumped to trace file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc Killing enqueue blocker (pid=3122) on resource CF-00000000-00000000 by terminating the process USER: terminating instance due to error 2103 Instance terminated by USER, pid = 3555
四、分析ORA-00240和ORA-00494
[oracle@node1 ~]$ oerr ora 240 00240, 00000, "control file enqueue held for more than %s seconds" // *Cause: The current process did not release the control file enqueue within // the maximum allowed time. // *Action: Reissue any commands that failed and contact Oracle Support // Services with the error information. [oracle@node1 ~]$ oerr ora 494 00494, 00000, "enqueue%s held for too long%s by '%s'" // *Cause: The specified process did not release the enqueue within // the maximum allowed time. // *Action: Reissue any commands that failed and contact Oracle Support // Services with the incident information. ORA-00240: control file enqueue held for more than 120 seconds ORA-00494: enqueue [CF] held for too long (more than 900 seconds)
dbw0进程持有control file enqueue超过了默认的900s,导致系统kill掉持有这个enqueue的会话,然后数据库自动终止。现在问题定位在control file enqueue的持有时间为什么会超时(ORA-00240)上面
五、深入理解ORA-00240
ORA-00240: control file enqueue held for more than 120 seconds The Developers confirmed that this is just a warning to let the DBA know that a CF enqueue is being held for more than 120 seconds. This is not an error, and error will occur if a CF enqueue is held for more than 900 seconds (15 minutes) and this is not the case here. The message occurs when there are many datafiles in the database. DBWriter (dbw0) is taking too much time to release the CF enqueue due to having to open these datafiles.
从mos的这几句中,可以分析出来,是因为dbw0在open数据文件的过程中占用了太多时间的CF enqueue。这说明和存储或者asm有关系
六、继续分析日志
分析日志,找出出现ORA-00240和ORA-00494之前出现过什么错误
Stopping Job queue slave processes, flags = 7 Thu Nov 10 17:52:26 2011 Job queue slave processes stopped Thu Nov 10 17:54:23 2011 Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_asmb_26101.trc: ORA-00600: internal error code, arguments: [kffmAllocate_1], [8], [2], [], [], [], [], [] Thu Nov 10 17:54:24 2011 Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_asmb_26101.trc: ORA-00600: internal error code, arguments: [kffmAllocate_1], [8], [2], [], [], [], [], [] Thu Nov 10 17:54:24 2011 ASMB: terminating instance due to error 486 Termination issued to instance processes. Waiting for the processes to exit Instance terminated by ASMB, pid = 26101
通过这个,发现和asm有关系,更加确定了自己的猜测
七、分析asm日志
Fri Nov 11 09:44:42 2011 Errors in file /opt/oracle/oracle1/admin/+ASM/bdump/+asm_gmon_24008.trc: ORA-27091: unable to queue I/O ORA-27072: File I/O error HPUX-ia64 Error: 11: Resource temporarily unavailable Additional information: 4 Additional information: 2044 Additional information: -1
+asm_gmon_24008.trc中内容也就报这些错误,不再贴出
八、mos中查找ORA-27091和ORA-27072
Bug 8236874 – HP-Itanium: Intermittent ORA-27091, ORA-27072, Additional information: 4 on HPUX中有类此描述
ORA-01115: IO error reading block from file fff (block # bbb) ORA-01110: data file 5: '/dev/xxx' ORA-27091: unable to queue I/O ORA-27072: File I/O error Additional information: 4 Additional information: xxx Additional information: yyy
问题很类此,进一步确信是asm的问题
八、在查询mos中ORA-00240和asm相关主题
发现HP Itanium – ORA-240 or process on ASM & Database hang [ID 1105825.1],阅读完毕,马上查去阅读gminer_ora_3555.trc文件
果然发现和mos中一致的描述
*** SESSION ID:(40.1) 2011-11-11 09:36:59.860 Waited for detached process: DBW0 for 300 seconds: *** 2011-11-11 09:36:59.860 Dumping diagnostic information for DBW0: OS pid = 3122 loadavg : 0.00 0.01 0.01 Swapinfo : Avail = 124485.08Mb Used = 62062.20Mb Swap free = 62422.88Mb Kernel rsvd = 4492.72Mb Free Mem = 60374.91Mb F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME COMD 1401 S oracle 3122 1 0 154 20 e00000090de8b980 55372 e0000009203b1340 09:19:12 ? 0:06 ora_dbw0_gminer Attaching to program: /opt/oracle/oracle1/bin/oracle, process 3122 warning: The shared libraries were not privately mapped; setting a breakpoint in a shared library will not work until you rerun the program; stepping over longjmp calls will not work as expected. Please set the kernel variable "shlib_debug_enable" to 1 to enable the shared library debugging warning: Load module /opt/oracle/oracle1/lib/libskgxp10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libhasgen10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libocr10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libocrb10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libocrutl10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libdbcfg10.so has been stripped. Debugging information is not available. warning: Load module /opt/oracle/oracle1/lib/libnnz10.so has been stripped. Debugging information is not available. 0xc00000000043e3f0:0 in pw_wait+0x30 () from /usr/lib/hpux64/libc.so.1 (gdb) (gdb) #0 0xc00000000043e3f0:0 in pw_wait+0x30 () from /usr/lib/hpux64/libc.so.1 #1 0x40000000094e5ee0:0 in pw_wait () at sskgp.c:2245 #2 0x4000000002cb36c0:0 in sskgpwwait () at sskgp.c:926 #3 0x4000000002f16440:0 in skgpwwait () at skgp.c:2137 #4 0x4000000002e28770:0 in ksliwat () at ksl.c:7428 #5 0x400000000402a7f0:0 in kslwaitns_timed () at ksl.c:7652 #6 0x4000000002d3de40:0 in kskthbwt () at ksk.c:2263 #7 0x4000000002d3c940:0 in kslwait () at ksl.c:7616 #8 0x40000000041780e0:0 in ksvsubmit () at ksv.c:1975 #9 0x4000000004d5b710:0 in kfncSlaveSubmit () at kfnc.c:2913 #10 0x4000000004d63120:0 in kfncFileIdentify () at kfnc.c:1638 #11 0x4000000004cfb890:0 in kfioIdentify () at kfio.c:1431 #12 0x4000000004137c20:0 in ksfd_osmopn () at ksfd.c:16583 #13 0x40000000044c9850:0 in $cold_ksfdopn+0xc0 () at kfio.c:1654 #14 0x4000000002630aa0:0 in kcfbid () at kcf.c:13447 #15 0x40000000025b72f0:0 in kcfida () at kcf.c:13599 #16 0x4000000002d3bc80:0 in ksbabs () at ksb.c:1390 #17 0x4000000003aed5f0:0 in ksbrdp () at ksb.c:2971 #18 0x4000000002807f50:0 in opirip () at opirip.c:283 #19 0x400000000200f680:0 in $cold_opidrv+0x580 () #20 0x400000000277edf0:0 in sou2o () at sou2o.c:86 #21 0x4000000001fff020:0 in $cold_opimai_real+0x280 () #22 0x400000000269c5e0:0 in main () at opimai.c:173 (gdb) Detaching from program: /opt/oracle/oracle1/bin/oracle, process 3122
九、HP官网描述
Issue Any questions on the use of software developed and maintained by Oracle should involve Oracle technical support. Oracle, not HP, supports Oracle software. All ITRC HP-UX version 11.31 March 2009 patch bundles were applied to two HP Integrity rx3600-based Oracle RAC nodes. After this installation, the crsd.bin process from the Oracle RAC Clustering software seemed to hang at random intervals, causing a cluster check process launched by a third party to wait indefinitely and to eventually consume all available RAM memory. A situation where crsd.bin hangs may cause RAC to issue a Transfer of Control (TOC), that is, a system restart with crash dump. Solution If any patches related to Oracle RAC are included in a patch bundle (whether it is HP-UX specific patching that RAC depends upon or vice versa), RAC requires that the binaries be relinked prior to starting them. An alternative might be to (temporarily) uninstall the patch bundle. The relink of the binaries should be done by a DBA, they should know how to perform that action. In the current situation, relinking the Oracle RAC binaries resolved the issue. NOTE: It seems this problem can also be caused by the installation of HP patches PHKL_38762 or PHKL_39145. QXCR1000940361 tracked the problem. This has been fixed now, and the problem is not present anymore when installing the superseded patch PHKL_40208. The problem of QXCR1000940361 has existed before the fix of PHKL_39145 was introduced so we dont set a patch warning.
十、解决方案
问题定位完毕是因为缺少打HP patches PHKL_38762 or PHKL_39145的补丁导致
根据HP官网的建议,打HP patches PHKL_38762 or PHKL_39145解决这个问题,也可以打上PHKL_40208补丁问题解决
十一、补充说明
1、因为系统以前可以运行,那么我猜测这个bug应该有偶然性,让网友重启asm,然后启动数据库正常
2、在处理问题过程中,因缺少耐心,开始没有仔细阅读trace文件,导致问题分析过程中走了一些弯路
3、这个是我第一次发现因为系统补丁没有打,导致数据库异常