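Category archive: Oracle ASM
Monitoring ASM disk performance
Anyone who uses ASM has probably wondered how to monitor its disk I/O the way iostat does for raw devices or file systems, since ASM behaves like a black box. Oracle anticipated this need when designing ASM and records per-disk statistics in the views v$asm_disk and v$asm_disk_stat. The two views expose the same information; the difference is that every query against v$asm_disk reads the disk headers, while v$asm_disk_stat returns the copy of that data already held in memory, so querying it has very little impact on the disks. Sampling v$asm_disk_stat twice and subtracting the values therefore gives the I/O performance of each ASM disk over that interval. Oracle provides a tool for this called asmiostat; see ASMIOSTAT Script to collect iostats for ASM disks [ID 437996.1].
Make sure TIMED_STATISTICS=TRUE
TRUE is the default, but it is worth checking anyway, because when the parameter is FALSE, READ_TIME and WRITE_TIME are reported as 0.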
[grid@xifenfei tmp]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Fri Feb 1 08:29:01 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option

SQL> show parameter TIMED_STATISTICS

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
timed_statistics                     boolean     TRUE
Using asmiostat
[grid@xifenfei tmp]$ ./asmiostat.sh help=y
Invalid parameter: <interval> must be > 0; <count> must be >= 0
./asmiostat.sh [-s ASM ORACLE_SID] [-h ASM ORACLE_HOME] [-g diskgroup] [<interval>] [<count>]
Output:
  DiskPath - Path to ASM disk
  DiskName - ASM disk name
  Gr       - ASM disk group number
  Dsk      - ASM disk number
  Reads    - Reads
  Writes   - Writes
  AvRdTm   - Average read time (in msec)
  AvWrTm   - Average write time (in msec)
  KBRd     - Kilobytes read
  KBWr     - Kilobytes written
  AvRdSz   - Average read size (in bytes)
  AvWrSz   - Average write size (in bytes)
  RdEr     - Read errors
  WrEr     - Write errors
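What the columns mean
DiskPath - Path to ASM disk
DiskName - ASM disk name
Gr       - ASM disk group number
Dsk      - ASM disk number
Reads    - number of read I/O requests during the interval
Writes   - number of write I/O requests during the interval
AvRdTm   - average time per read I/O request (in msec)
AvWrTm   - average time per write I/O request (in msec)
KBRd     - amount of data read during the interval (KB)
KBWr     - amount of data written during the interval (KB)
AvRdSz   - average amount of data per read I/O request (bytes)
AvWrSz   - average amount of data per write I/O request (bytes)
RdEr     - number of failed read I/O requests during the interval
WrEr     - number of failed write I/O requests during the interval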
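The "sample twice and subtract" idea behind these columns can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the asmiostat script itself. The field names mirror the cumulative counters of v$asm_disk_stat (READS, WRITES, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN, READ_ERRS, WRITE_ERRS); the DiskSnapshot class, the interval_stats function and the sample numbers are made up for the example, and the unit of READ_TIME/WRITE_TIME is passed in as an assumption (time_unit_ms) because it should be checked against the reference for the release in use.

```python
from dataclasses import dataclass


@dataclass
class DiskSnapshot:
    """Cumulative counters for one ASM disk at one point in time (hypothetical holder)."""
    reads: int
    writes: int
    read_time: float      # cumulative READ_TIME, in the view's own unit
    write_time: float     # cumulative WRITE_TIME, in the view's own unit
    bytes_read: int
    bytes_written: int
    read_errs: int
    write_errs: int


def _avg(total, count):
    # Avoid dividing by zero when no I/O happened during the interval.
    return total / count if count else 0.0


def interval_stats(prev, curr, time_unit_ms=1000.0):
    """Subtract two snapshots and derive per-interval figures.

    time_unit_ms converts one unit of READ_TIME/WRITE_TIME to milliseconds.
    The default of 1000.0 assumes the view reports seconds (an assumption).
    """
    reads = curr.reads - prev.reads
    writes = curr.writes - prev.writes
    rd_bytes = curr.bytes_read - prev.bytes_read
    wr_bytes = curr.bytes_written - prev.bytes_written
    rd_ms = (curr.read_time - prev.read_time) * time_unit_ms
    wr_ms = (curr.write_time - prev.write_time) * time_unit_ms
    return {
        "Reads": reads,
        "Writes": writes,
        "AvRdTm": _avg(rd_ms, reads),
        "AvWrTm": _avg(wr_ms, writes),
        "KBRd": rd_bytes / 1024,
        "KBWr": wr_bytes / 1024,
        "AvRdSz": _avg(rd_bytes, reads),
        "AvWrSz": _avg(wr_bytes, writes),
        "RdEr": curr.read_errs - prev.read_errs,
        "WrEr": curr.write_errs - prev.write_errs,
    }


# Made-up sample values, not taken from a real system.
prev = DiskSnapshot(100, 50, 10.0, 20.0, 1_048_576, 524_288, 0, 0)
curr = DiskSnapshot(104, 52, 10.5, 20.25, 1_081_344, 540_672, 0, 0)
stats = interval_stats(prev, curr)
```

If TIMED_STATISTICS were FALSE, both time counters would stay at 0 and AvRdTm/AvWrTm would come out as 0.0 regardless of the load, which is why the parameter is worth checking first.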
asmiostat sample output
[grid@xifenfei tmp]$ ./asmiostat.sh -s $ORACLE_SID -h $ORACLE_HOME -g DATA 1 3

Date: Fri Feb 1 08:31:45 CST 2013 Interval: 1 secs Disk Group: DATA

DiskPath - DiskName   Gr Dsk  Reads Writes AvRdTm AvWrTm KBRd KBWr AvRdSz AvWrSz RdEr WrEr
/dev/sdb - DATA_0000   1   0      0      0    0.0    0.0    0    0      0      0    0    0

Date: Fri Feb 1 08:31:47 CST 2013 Interval: 1 secs Disk Group: DATA

DiskPath - DiskName   Gr Dsk  Reads Writes AvRdTm AvWrTm KBRd KBWr AvRdSz AvWrSz RdEr WrEr
/dev/sdb - DATA_0000   1   0      4      3    0.6 1006.1    0    0      0      0    0    0

Date: Fri Feb 1 08:31:49 CST 2013 Interval: 1 secs Disk Group: DATA

DiskPath - DiskName   Gr Dsk  Reads Writes AvRdTm AvWrTm KBRd KBWr AvRdSz AvWrSz RdEr WrEr
/dev/sdb - DATA_0000   1   0      8      2    1.3    1.5    0    0      0      0    0    0
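Setting disk permissions for ASM disk groups
On AIX, a single-instance 11gR2 database had GI and the database deployed under separate grid and oracle users. When new disks were added, the disk devices were set to owner grid, group oinstall, with mode 755. The disks were added successfully, and then the database crashed.
Adding the disks in ASM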
SQL> alter diskgroup DATA add disk '/dev/rhdisk15'
NOTE: Assigning number (2,7) to disk (/dev/rhdisk15)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0007
NOTE: requesting all-instance disk validation for group=2
Wed Apr 03 22:09:03 2013
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:03 2013
GMON updating group 2 at 21 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 22 for pid 13, osid 20643916
NOTE: cache opening disk 7 of grp 2: DWDATAGRP_0007 path:/dev/rhdisk15
GMON querying group 2 at 23 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS: alter diskgroup DATA add disk '/dev/rhdisk15'
Starting background process ARB0
Wed Apr 03 22:09:07 2013
ARB0 started with pid=22, OS id=14155890
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:09:19 2013
SQL> alter diskgroup DATA add disk '/dev/rhdisk11'
Wed Apr 03 22:09:20 2013
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 2/0xa026f7ec (DATA)
NOTE: Assigning number (2,8) to disk (/dev/rhdisk11)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0008
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:23 2013
GMON updating group 2 at 24 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 25 for pid 13, osid 20643916
NOTE: cache opening disk 8 of grp 2: DATA_0008 path:/dev/rhdisk11
GMON querying group 2 at 26 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS: alter diskgroup DATA add disk '/dev/rhdisk11'
Starting background process ARB0
Wed Apr 03 22:09:26 2013
ARB0 started with pid=22, OS id=22872116
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:14:41 2013
NOTE: ASM client xifenfei:xifenfei disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/diag/asm/+asm/+ASM/trace/+ASM_ora_15073468.trc
Wed Apr 03 22:16:53 2013
NOTE: client xifenfei:xifenfei registered, osid 20709378, mbr 0x0
Wed Apr 03 22:20:33 2013
NOTE: client xifenfei:xifenfei deregistered
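The log shows that adding the disks succeeded and the rebalance started, but it also shows the client xifenfei disconnecting unexpectedly (in other words, the database crashed).
Database alert log at the time of the crash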
Wed Apr 03 22:00:00 2013
Setting Resource Manager plan SCHEDULER[0x318B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Wed Apr 03 22:00:00 2013
Starting background process VKRM
Wed Apr 03 22:00:00 2013
VKRM started with pid=31, OS id=22413426
Wed Apr 03 22:09:06 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:06 2013
SUCCESS: disk DATA_0007 (7.2092304189) added to diskgroup DATA
Wed Apr 03 22:09:26 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:26 2013
SUCCESS: disk DATA_0008 (8.2092304190) added to diskgroup DATA
Wed Apr 03 22:14:40 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 1 logical extent 0 of file 261 in group 2 on disk 7 allocation unit 464
KCF: read, write or open error, block=0x6a online=1
     file=1 '+DATA/xifenfei/datafile/system.261.788373447'
     error=15081 txt: ''
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 106)
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
DBW0 (ospid: 17367438): terminating the instance due to error 63999
The alert log shows why the database crashed: it had no permission to access /dev/rhdisk15, so the DBW0 process hit I/O errors and terminated the instance.
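When an alert log is long, the permission failures can be pulled out mechanically. The following is a minimal Python sketch that scans alert log text for ORA-15025 entries followed shortly by an OS "Permission denied" (error 13) message and returns the affected disk paths. It assumes each log message sits on its own line, as in a real alert log; the function name and the three-line look-ahead window are choices made for this example, not part of any Oracle tool.

```python
import re

# Message formats taken from the alert log excerpt in this post.
ORA_15025 = re.compile(r'ORA-15025: could not open disk "([^"]+)"')
PERMISSION_DENIED = "Error: 13: Permission denied"


def disks_with_permission_errors(alert_log_text, lookahead=3):
    """Return disk paths whose ORA-15025 entry is followed by an errno 13 message.

    lookahead is how many following lines are checked for the OS error
    (an assumption that fits the excerpt above).
    """
    lines = alert_log_text.splitlines()
    disks = []
    for i, line in enumerate(lines):
        match = ORA_15025.search(line)
        if not match:
            continue
        window = lines[i + 1:i + 1 + lookahead]
        denied = any(PERMISSION_DENIED in entry for entry in window)
        if denied and match.group(1) not in disks:
            disks.append(match.group(1))
    return disks


sample = """\
Wed Apr 03 22:09:06 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:06 2013
SUCCESS: disk DATA_0007 (7.2092304189) added to diskgroup DATA
"""
affected = disks_with_permission_errors(sample)
```

Any path returned this way is a candidate for an ownership and mode check on the device file.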
Trying to restart the database (ASM restarted normally)
SQL> startup
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2233480 bytes
Variable Size            1711278968 bytes
Database Buffers         1.1073E+10 bytes
Redo Buffers               40894464 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'
The startup reports that file 1 needs media recovery. The alert log shows the following errors:
Completed: ALTER DATABASE MOUNT
Wed Apr 03 22:17:02 2013
ALTER DATABASE OPEN
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 261 in group 2 on disk 8 allocation unit 462
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:690 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:918 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 263 in group 2 on disk 8 allocation unit 918
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 262 in group 2 on disk 8 allocation unit 690
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 3: '+DATA/xifenfei/datafile/undotbs1.263.788373475'
ORA-01114: IO error writing block to file 3 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.262.788373463'
ORA-01114: IO error writing block to file 2 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Running recover database
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01201: file 1 header failed to write correctly

Wed Apr 03 22:18:49 2013
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Wed Apr 03 22:18:50 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
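The errors point to the same cause again: the database cannot write to its data files because it has no write permission on the disks. The fix was to change the mode of the newly added disk devices to 660 (4 = read, 2 = write, 1 = execute), which gives the oracle user, a member of the same oinstall group, read and write access to the disks.
This was a simple incident, but as more sites run 11g ASM with separate grid and oracle users, incidents like it are becoming more common. Most people habitually set mode 755 on a file. On a system where grid and oracle are installed separately, that causes the database to crash after the disk is added and prevents it from starting again, because the oracle user has read permission on the disk but no write permission. A sound convention for 11gR2 ASM systems with separate grid and oracle users: set disk ownership to grid:oinstall and the mode to 660.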
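The difference between the two modes is easy to see by decoding the permission bits. Below is a small Python sketch using the standard stat module constants; the helper names describe_access and group_can_read_write are invented for this illustration.

```python
import stat


def describe_access(mode):
    """Decode read/write permission bits for owner, group and others."""
    return {
        "owner": {"read": bool(mode & stat.S_IRUSR), "write": bool(mode & stat.S_IWUSR)},
        "group": {"read": bool(mode & stat.S_IRGRP), "write": bool(mode & stat.S_IWGRP)},
        "other": {"read": bool(mode & stat.S_IROTH), "write": bool(mode & stat.S_IWOTH)},
    }


def group_can_read_write(mode):
    """True when group members (here: oracle, via oinstall) can both read and write."""
    group = describe_access(mode)["group"]
    return group["read"] and group["write"]


# 755 = rwxr-xr-x: the group can read but not write, which matches the crash above.
mode_755_ok = group_can_read_write(0o755)
# 660 = rw-rw----: owner and group can read and write, others get nothing.
mode_660_ok = group_can_read_write(0o660)
```

To check a real device, the mode could be taken from os.stat(path).st_mode, with path being the disk device in question (for example the /dev/rhdisk15 device from this case).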
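ASMLib failure reporting ORA-00600 [kfklLibFetchNext00]
A friend's archive database failed. It is a single-instance 10.2.0.1 database on ASM, running on a Linux 4 platform, with the ASM disks managed through ASMLib.
ASM could not mount its disk groups even though the ASM disks were visible, and the alert log reported ORA-00600 [kfklLibFetchNext00].
Operating system kernel: 2.6.9-78
oracleasmlib: 2.0.2-1
ASM disk group mount failure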
-- Earlier failure
SQL> ALTER DISKGROUP ALL MOUNT
Thu Sep 6 14:23:16 2012
NOTE: cache registered group DGARC number=1 incarn=0x2bf96274
NOTE: cache registered group DGDATA number=2 incarn=0x2c196275
NOTE: cache registered group DGSYS number=3 incarn=0x2c196276
Thu Sep 6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep 6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [/opt/oracle/extapi/64/asm/orcl/1/libasm.so]
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep 6 14:23:16 2012
ERROR: no PST quorum in group 1: required 2, found 0
Thu Sep 6 14:23:16 2012
NOTE: cache dismounting group 1/0x2BF96274 (DGARC)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Sep 6 14:23:16 2012
ERROR: no PST quorum in group 2: required 2, found 0
Thu Sep 6 14:23:16 2012
NOTE: cache dismounting group 2/0x2C196275 (DGDATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Sep 6 14:23:16 2012
ERROR: no PST quorum in group 3: required 2, found 0
Thu Sep 6 14:23:16 2012
NOTE: cache dismounting group 3/0x2C196276 (DGSYS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted

-- Current failure
Thu Jan 24 13:49:45 2013
SQL> ALTER DISKGROUP ALL MOUNT
Thu Jan 24 13:49:45 2013
NOTE: cache registered group DGARC number=1 incarn=0xf388cee9
NOTE: cache registered group DGDATA number=2 incarn=0xf3a8ceea
NOTE: cache registered group DGSYS number=3 incarn=0xf3a8ceeb
Thu Jan 24 13:49:45 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00], [18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00], [18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 1: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 1/0xF388CEE9 (DGARC)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 2: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 2/0xF3A8CEEA (DGDATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 3: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 3/0xF3A8CEEB (DGSYS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted
Shutting down instance: further logons disabled
Trace file contents
----- Call Stack Trace -----
calling location  call type  entry point  argument values in hex (? means dubious value)
--------------------------------------------------------------------------------
ksedst()+31  call  ksedst1()  000000000 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? 000000001 ?
ksedmp()+610  call  ksedst()  000000000 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? 000000001 ?
ksfdmp()+21  call  ksedmp()  000000003 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? 000000001 ?
kgerinv()+161  call  ksfdmp()  000000003 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? 000000001 ?
kgesinv()+33  call  kgerinv()  006469D40 ? 0064E1C58 ? 000000000 ? 000000000 ? 000000001 ? 000000001 ?
kgesinw()+166  call  kgesinv()  006469D40 ? 0064E1C58 ? 000000000 ? 000000000 ? 000000001 ? 000000001 ?
kfklLibScanNext()+239  call  kgesinw()  006469D40 ? 000000000 ? 000000001 ? 000000000 ? FFFFFFFFFFFFFFFE ? 000000000 ?
kfkLibFetchNext()+343  call  kfklLibScanNext()  0064DDD70 ? 7FBFFFDCD0 ? 000000001 ? 000000000 ? FFFFFFFFFFFFFFFE ? 000000000 ?
kfuitrnInit()+524  call  kfkLibFetchNext()  006469D40 ? 2A971DFF90 ? 000000001 ? 000000000 ? FFFFFFFFFFFFFFFE ? 000000000 ?
kfkLibIterInit()+180  call  kfuitrnInit()  006469D40 ? 2A971DFCB0 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
kfkLoadAllLibs()+363  call  kfkLibIterInit()  000000000 ? 00646C7E0 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
kfkDiscoverString()+107  call  kfkLoadAllLibs()  000000000 ? 00646C7E0 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
Cannot find symbol
Cannot find symbol
Cannot find symbol
kfdDiscoverString()+28  call  kfkDiscoverString()  067A53768 ? 00646C7E0 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
kfdDiscoverShallow()+315  call  kfdDiscoverString()  067A53768 ? 000000000 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
kfgbDriver()+1174  call  kfdDiscoverShallow()  000000180 ? 000000000 ? 2A971DFF90 ? 000000009 ? 000000009 ? 000000000 ?
ksbabs()+564  call  kfgbDriver()  7FBFFFE5C0 ? 000000048 ? 000000000 ? 000000009 ? 000000009 ? 000000000 ?
ksbrdp()+727  call  ksbabs()  7FBFFFE5C0 ? 000000048 ? 000000000 ? 000000009 ? 000000009 ? 000000000 ?
opirip()+616  call  ksbrdp()  7FBFFFE5C0 ? 000000048 ? 000000001 ? 06002C770 ? 000000009 ? 000000000 ?
opidrv()+582  call  opirip()  000000032 ? 000000004 ? 7FBFFFF6C8 ? 06002C770 ? 000000009 ? 000000000 ?
sou2o()+114  call  opidrv()  000000032 ? 000000004 ? 7FBFFFF6C8 ? 06002C770 ? 000000009 ? 000000000 ?
opimai_real()+317  call  sou2o()  7FBFFFF6A0 ? 000000032 ? 000000004 ? 7FBFFFF6C8 ? 000000009 ? 000000000 ?
main()+116  call  opimai_real()  000000003 ? 7FBFFFF730 ? 000000004 ? 7FBFFFF6C8 ? 000000009 ? 000000000 ?
<0x3c9fb1c40b>  call  main()  000000003 ? 7FBFFFF730 ? 000000004 ? 7FBFFFF6C8 ? 000000009 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------
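The customer's database is an archive database that is rarely used. When ASM was started in 2012 it already reported ORA-15183; after the machine was rebooted in 2013, starting ASM produced ORA-00600 [kfklLibFetchNext00] instead. The 2012 errors suggest the problem is related to ASMLib. A search on MOS turned up note 429945.1, whose Call Stack Trace matches this one exactly, so this can be identified as the same problem (strace can be used for a deeper analysis).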
ORA-600: [kfklLibFetchNext00], [18446744073709551614], [0] when mounting diskgroup in ASM

Applies to:
Linux OS - Version: 2.0.1-1 and later [Release: RHEL4 and later ]
Information in this document applies to any platform.
Linux Kernel - Version: 2.0.1

Symptoms
3 RAC db. 2 nodes are up and functioning except for 1 node - ASM did not come back up after the reboot eventhough all disks show available from asmlib's perspective:

Changes
All that was done with resources were stopped on Node1 and an extra LUN added. A reboot was then performed.

Cause
The cause of the issue is libasm.o corruption
Ran the following to confirm that disks are ok:
/dev/oracleasm listdisks
/usr/sbin/asmtool -I -l /dev/oracleasm -n /dev/sdg1 -a label
/usr/sbin/oracleasm-discover 'ORCL:*'
dd if=/dev/sdg1 bs=8192 count=1 | od -c ==> output checked out fine .
kfod asm_diskstring='ORCL:*' ==> this failed on Node1
KFOD-00600: file not found; argument [610][kfklLibFetchNext00]
even though libasm.o exists

You might see the following call stack as well
----- Call Stack Trace -----
kfklLibScanNext kfkLibFetchNext kfuitrnInit kfkLibIterInit kfkLoadAllLibs kfkDiscoverString kfdDiscoverString kfdDiscoverShallow kfgbDriver

strace showed

Node1-failing
-------
stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0777, st_size=19344, ...}) = 0
getdents64(4, /* 0 entries */, 4096) = 0 <<<<
close(4) = 0
open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9750d000
write(1, "KFOD-00600: file not found; argu"..., 69) = 69

Node2-working
-----
stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
open("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\23\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
mmap(NULL, 1066104, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) 0x2a9750d000
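The MOS description pins the problem down: it is caused by a damaged libasm.o (the note's name for the ASMLib library, which appears as libasm.so in the paths above).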
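In the strace output quoted from the note, the working node opens libasm.so and reads a header that begins with "\177ELF", while the failing node never gets that far. A rough first check along those lines can be sketched in Python: confirm that the file can be opened for reading and starts with the ELF magic bytes. This is only a sanity check under that assumption. It does not prove the library is intact, and it is not a step from the MOS note; the function name is invented for the example.

```python
ELF_MAGIC = b"\x7fELF"  # same bytes as the "\177ELF" shown by strace


def looks_like_elf_library(path):
    """Return True when the file is readable and starts with the ELF magic.

    A False result means the file is missing, unreadable, or not an ELF
    file. A True result does not rule out damage further into the file.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        return False
```

On the affected host this could be pointed at /opt/oracle/extapi/64/asm/orcl/1/libasm.so, the path shown in the alert log and the strace output.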
Solution
To implement the solution, reinstall the ASMlib RPM
>rpm -Uvh oracleasmlib-2.0.0-1
This replaces the /opt/oracle/extapi/64/asm/orcl/1/libasm.so