Monthly Archive: October 2016
ORA-27300 ORA-27301 ORA-27302
A friend's database alert log was reporting ORA-27300 ORA-27301 ORA-27302 errors, and he asked me to help analyze the cause. (The Chinese ORA-27301 OS failure message in the log is the localized text for Windows OS error 8, "Not enough storage is available to process this command.")
ORA-27300 ORA-27301 ORA-27302
Fri Oct 28 22:15:33 2016
Process startup failed, error stack:
Errors in file d:\app\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_psp0_4856.trc:
ORA-27300: OS system dependent operation:CreateThread failed with status: 8
ORA-27301: OS failure message: 存储空间不足,无法处理此命令。
ORA-27302: failure occurred at: ssthrddcr
Fri Oct 28 22:15:34 2016
Process J001 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file d:\app\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_cjq0_5664.trc:
Fri Oct 28 22:16:35 2016
Process startup failed, error stack:
Errors in file d:\app\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_psp0_4856.trc:
ORA-27300: OS system dependent operation:CreateThread failed with status: 8
ORA-27301: OS failure message: 存储空间不足,无法处理此命令。
ORA-27302: failure occurred at: ssthrddcr
Fri Oct 28 22:16:36 2016
Process J001 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file d:\app\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_cjq0_5664.trc:
Fri Oct 28 22:17:36 2016
Process startup failed, error stack:
Errors in file d:\app\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_psp0_4856.trc:
ORA-27300: OS system dependent operation:CreateThread failed with status: 8
ORA-27301: OS failure message: 存储空间不足,无法处理此命令。
ORA-27302: failure occurred at: ssthrddcr
System status
Database version
Database memory configuration
New sessions unable to connect
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54383)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54384)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54385)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54386)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54388)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54389)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:46:59 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54390)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
28-10月-2016 22:47:00 * (CONNECT_DATA=(SID=xifenfei)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.91.5)(PORT=54391)) * establish * xifenfei * 12518
TNS-12518: TNS: 监听程序无法分发客户机连接
TNS-12560: TNS: 协议适配器错误
Fault handling
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
Value Name: IRPStackSize
Data Type: DWORD
The default value of the IRPStackSize parameter is 15. The range is from 11 (0xb hexadecimal) through 50 (0x32 hexadecimal).
Reference: Ora-27300: OS System Dependent Operation:Createthread Failed With Status: 8 (Doc ID 1310453.1)
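If you script this registry change, it is worth validating the new value against the documented range first. A minimal sketch (the helper name is mine, not from the MOS note):

```python
# Hypothetical pre-check (not from the MOS note): keep any new IRPStackSize
# inside the documented range (11-50, default 15) before writing the registry.
DEFAULT_IRP_STACK_SIZE = 15
MIN_IRP_STACK_SIZE = 11    # 0x0b
MAX_IRP_STACK_SIZE = 50    # 0x32

def valid_irp_stack_size(value: int) -> bool:
    """Return True if `value` is a legal IRPStackSize setting (11-50)."""
    return MIN_IRP_STACK_SIZE <= value <= MAX_IRP_STACK_SIZE
```

On Windows the actual change would then be applied with something like `reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v IRPStackSize /t REG_DWORD /d 18`, followed by a restart of the Server service.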
ORA-600 kcrfr_update_nab_2 fault recovery
A storage controller dropping offline caused the database to report ORA-600 kcrfr_update_nab_2 on startup, preventing it from opening normally.
Database version information
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 12 - type 8664, 2 Physical Cores
Process Affinity    : 0x0000000000000000
Memory (Avail/Total): Ph:22579M/32754M, Ph+PgF:24594M/33845M
ORA-600 kcrfr_update_nab_2 error
Mon Oct 24 17:42:57 2016
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Mon Oct 24 17:42:58 2016
ALTER DATABASE OPEN
Mon Oct 24 17:43:14 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Mon Oct 24 17:43:14 2016
Started redo scan
Mon Oct 24 17:43:16 2016
Errors in file d:\oracle\product\10.2.0\admin\spcsjkdb\udump\spcsjkdb_ora_10108.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
Mon Oct 24 17:43:18 2016
Aborting crash recovery due to error 600
Mon Oct 24 17:43:18 2016
Errors in file d:\oracle\product\10.2.0\admin\spcsjkdb\udump\spcsjkdb_ora_10108.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Trace file information
*** 2016-10-24 17:43:14.515
*** ACTION NAME:() 2016-10-24 17:43:14.515
*** MODULE NAME:(sqlplus.exe) 2016-10-24 17:43:14.515
*** SERVICE NAME:() 2016-10-24 17:43:14.515
*** SESSION ID:(356.3) 2016-10-24 17:43:14.515
Successfully allocated 11 recovery slaves
Using 101 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 33251, block 2, scn 14624215134369
  cache-low rba: logseq 33251, block 2463324
    on-disk rba: logseq 33251, block 2803965, scn 14624216078841
  start recovery at logseq 33251, block 2463324, scn 0
*** 2016-10-24 17:43:16.406
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7FFC22A2150], [2], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE OPEN
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+663           CALL???  ksedst+55            003C878B8 000000000 012B863E8 000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 015572A70 007222698 003CACC80
kgerinv+158          CALL???  ksfdmp+19            015572430 000000000 0FFFFFFFF 000000000
kgeasnmierr+62       CALL???  kgerinv+158          000000000 000000000 000000000 004FD788F
kcrfr_update_nab+186 CALL???  kgeasnmierr+62       00BDA1170 000000000 000000000 000000002
kcrfr_read+1078      CALL???  kcrfr_update_nab+186 007222698 00001E650 015572430 0072229B8
kcrfrgv+8134         CALL???  kcrfr_read+1078      000000000 0051525D7 000000000 0051525D7
kcratr1+488          CALL???  kcrfrgv+8134         007222698 000000000 000000000 000000000
kcratr+412           CALL???  kcratr1+488          012B891C8 012B890A4 00727FFB8 00BEA7FF0
kctrec+1910          CALL???  kcratr+412           012B891C8 012B91E18 000000000 012B91E48
kcvcrv+3585          CALL???  kctrec+1910          012B92C58 000000000 00726DF00 00726BDB0
kcfopd+1007          CALL???  kcvcrv+3585          012B93350 000000000 000000000 000000000
adbdrv+55820         CALL???  kcfopd+1007          000000000 000000000 000000000 000000000
opiexe+13897         CALL???  adbdrv+55820         000000023 000000003 000000102 000000000
opiosq0+3558         CALL???  opiexe+13897         000000004 000000000 012B9B238 4155474E414C5F45
kpooprx+339          CALL???  opiosq0+3558         000000003 00000000E 012B9B3C8 0000000A4
kpoal8+894           CALL???  kpooprx+339          015587550 000000018 0041AE700 000000001
opiodr+1136          CALL???  kpoal8+894           00000005E 000000017 012B9E868 0072F5100
ttcpip+5146          CALL???  opiodr+1136          00000005E 000000017 012B9E868 2D8C00000000
opitsk+1818          CALL???  ttcpip+5146          015587550 000000000 000000000 000000000
opiino+1129          CALL???  opitsk+1818          00000001E 000000000 000000000 000000000
opiodr+1136          CALL???  opiino+1129          00000003C 000000004 012B9FB20 000000000
opidrv+815           CALL???  opiodr+1136          00000003C 000000004 012B9FB20 000000000
sou2o+52             CALL???  opidrv+815           00000003C 000000004 012B9FB20 7FF7FC48580
opimai_real+131      CALL???  sou2o+52             000000000 012B9FC40 7FFFFF7F258 077EF4D1C
opimai+96            CALL???  opimai_real+131      7FF7FC48580 7FFFFF7E000 0001F0003 000000000
OracleThreadStart+6  CALL???  opimai+96            012B9FEF0 01289FF3C 012B9FCC0 40 7FF7FC48580
0000000077D6B6DA     CALL???  OracleThreadStart+6  01289FF3C 000000000 000000000 40 012B9FFA8
Official description
The assert ORA-600: [kcrfr_update_nab_2] is a direct result of a lost write in the current online log that we are attempting to resolve. So this confirms the theory that this is an OS/hardware lost-write issue, not an internal Oracle bug. In fact, the assert ORA-600: [kcrfr_update_nab_2] is how we detect a lost log write.
Bug 5692594
Hdr: 5692594 10.2.0.1 RDBMS 10.2.0.1 RECOVERY PRODID-5 PORTID-226 ORA-600
Abstract: AFTER DATABASE CRASHED DOESN’T OPEN ORA-600 [KCRFR_UPDATE_NAB_2]
Status: 95,Closed, Vendor OS Problem
Bug 6655116
Hdr: 6655116 10.2.0.3 RDBMS 10.2.0.3 RECOVERY PRODID-5 PORTID-23
Abstract: INSTANCES CRASH WITH ORA-600 [KCRFR_UPDATE_NAB_2] AFTER DISK FAILURE
Based on the official description, combined with the circumstances of this fault, we can be fairly sure that a hardware fault caused an Oracle lost write, which in turn triggered this Oracle assertion and left the database unable to start normally.
ORA-600 [kcrfr_update_nab_2] [a] [b]
VERSIONS: versions 10.2 to 11.1
DESCRIPTION: Failure of upgrade of recovery node (RN) enqueue to SSX mode
ARGUMENTS:
  Arg [a] State Object for redo nab enqueue for resilvering
  Arg [b] Redo nab enqueue mode
FUNCTIONALITY: Kernel Cache Redo File Read
IMPACT: INSTANCE FAILURE
Resolution options
1. If a backup is available, perform incomplete recovery from it, skipping the damaged tail of the redo, then open the database with resetlogs.
2. If no backup is available, try incomplete recovery using a historical control file, or open the database directly while bypassing consistency checks.
3. Someone on the internet reported success by dropping the members of the second redo log group and then opening the database (http://blog.itpub.net/16976507/viewspace-1266952/).
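For triage, the assert name can be pulled out of alert-log text automatically before deciding on a recovery path. A minimal sketch (the helper name and regex are mine, not an official tool):

```python
import re

# Minimal triage sketch: extract the assert name (the first argument) from
# ORA-00600 lines in alert-log text, e.g. "kcrfr_update_nab_2".
ORA600_RE = re.compile(
    r"ORA-00600: internal error code, arguments: \[(?P<assert>[^\]]+)\]"
)

def find_ora600_asserts(alert_text: str) -> list:
    """Return the assert name from every matching ORA-00600 line."""
    return [m.group("assert") for m in ORA600_RE.finditer(alert_text)]
```

Running this over the alert log above would flag `kcrfr_update_nab_2`, which per the official description means a lost log write rather than an internal bug.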
Thin-provisioned storage volume causes ASM diskgroup failure
A friend was running ASM on one storage array; when it ran short of space, a LUN from a second array was added as an ASM disk to the DATA diskgroup. After running for about a day, the diskgroup dismounted outright and the database crashed. The diskgroup could then no longer be mounted, and several other diskgroups on that second array also failed to mount.
Database error log
Sun Oct 23 08:43:59 2016
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lmon_79128.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00345: redo log write error block 15924 count 2
ORA-00312: online log 2 thread 1: '+DATA/orcl/onlinelog/group_2.274.892363167'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
LGWR (ospid: 79174): terminating the instance due to error 204
Sun Oct 23 08:44:00 2016
opiodr aborting process unknown ospid (79742) as a result of ORA-1092
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
System state dump requested by (instance=1, osid=79174 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_79118.trc
Instance terminated by LGWR, pid = 79174
Clearly the database crash was caused by the ASM diskgroup dismount, so the next step is to analyze the ASM alert log.
ASM log
Sun Oct 23 07:00:31 2016
Time drift detected. Please check VKTM trace file for more details.
Sun Oct 23 08:43:55 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.524, extent 15030
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-15080: synchronous I/O operation to a disk failed
Sun Oct 23 08:43:55 2016
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 1/0xec689cdd (DATA)
NOTE: ASM did background COD recovery for group 1/0xec689cdd (DATA)
NOTE: starting rebalance of group 1/0xec689cdd (DATA) at power 1
Starting background process ARB0
Sun Oct 23 08:43:56 2016
ARB0 started with pid=24, OS id=103554
NOTE: assigning ARB0 to group 1/0xec689cdd (DATA) with 1 parallel I/O
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.256, extent 6570
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-15080: synchronous I/O operation to a disk failed
NOTE: stopping process ARB0
Sun Oct 23 08:43:58 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_dbw0_8521.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Write Failed. group:1 disk:3 AU:6789 offset:24576 size:4096
NOTE: cache initiating offline of disk 3 group DATA
NOTE: process _dbw0_+asm1 (8521) initiating offline of disk 3.3915934787 (DATA_0003) with mask 0x7e in group 1
Sun Oct 23 08:43:58 2016
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f is now being offlined
WARNING: Disk 3 (DATA_0003) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 3/0xe9686c43, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 14, osid 8521
ERROR: Disk 3 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Sun Oct 23 08:43:58 2016
NOTE: cache dismounting (not clean) group 1/0xEC689CDD (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 103577, image: oracle@node1 (B000)
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f offline is being aborted
WARNING: Offline of disk 3 (DATA_0003) in group 1 and mode 0x7f failed on ASM inst 1
NOTE: halting all I/Os to diskgroup 1 (DATA)
Sun Oct 23 08:43:59 2016
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=160.10145 last written ABA 160.10145
The error message is unambiguous: write failures (Write Failed) caused the ASM diskgroup to dismount.
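When a log like this is long, the failing disks can be tallied mechanically from the "Write Failed" warnings. A minimal sketch (helper name and regex are illustrative):

```python
import re

# Minimal sketch: collect (group, disk, AU) from "WARNING: Write Failed."
# lines in an ASM alert log, so failing disks can be counted per diskgroup.
WRITE_FAILED_RE = re.compile(
    r"WARNING: Write Failed\.\s+group:(\d+)\s+disk:(\d+)\s+AU:(\d+)"
)

def failed_writes(asm_log: str):
    """Return a (group, disk, AU) tuple for every write-failure warning."""
    return [tuple(int(g) for g in m.groups())
            for m in WRITE_FAILED_RE.finditer(asm_log)]
```

Applied to the log above, this shows the failures concentrated in group 1 on disks 2 and 3, which points the investigation at the underlying devices.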
System log
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdd, sector 12467058681
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] 33Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdb, sector 12467056640
Oct 23 08:43:55 node1 kernel: f9 00 00 04 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:55 node1 kernel: 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467057657
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467057657
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467058681
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdj, sector 12467056640
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 33 f9 00 00 04 00 00 00
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Add. Sense: Space allocation failed write protect
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 03 3b 7e 78 30 00 00 00 08 00 00
Oct 23 10:50:59 node1 init: oracle-ohasd main process (6150) killed by TERM signal
The key message is "critical space allocation error": Linux failed while allocating space. In other words, the space allocation errors are what made ASM's writes fail and the diskgroup dismount.
Check multipath information
[root@node1 ~]# multipath -ll
36000d31003190c000000000000000003 dm-3 COMPELNT,Compellent Vol
size=80T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:9:1  sdd 8:48  active ready running
  `- 8:0:9:1  sdi 8:128 active ready running
delldisk2 (36000d310031908000000000000000003) dm-4 COMPELNT,Compellent Vol
size=8.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:12:1 sde 8:64  active ready running
  |- 8:0:6:1  sdh 8:112 active ready running
  |- 6:0:4:1  sdb 8:16  active ready running
  `- 8:0:12:1 sdj 8:144 active ready running
delldisk1 (36000d31003190a000000000000000007) dm-2 COMPELNT,Compellent Vol
size=12T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:1:1  sda 8:0   active ready running
  |- 8:0:2:1  sdf 8:80  active ready running
  |- 6:0:7:1  sdc 8:32  active ready running
  `- 8:0:3:1  sdg 8:96  active ready running
Clearly all the failing devices belong to the same LUN (delldisk2), the array that had run out of space. Because the physical space behind delldisk2 was exhausted, the OS hit space allocation errors, ASM writes failed, and the database went down. The essence of the problem: the array presented an 8T volume to the OS, but less than 8T of physical capacity actually backed it, while the OS tried to consume the full 8T. The technical term for this is a "thin-provisioned volume", so watch out for it when configuring storage. Since this condition normally breaks only write I/O while reads still succeed, data is generally not lost.
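The mapping from failing sdX devices in syslog back to their LUN can also be derived mechanically from `multipath -ll` output. A rough sketch (the parsing is simplified, relying on each LUN header line carrying a dm-N token):

```python
import re

# Rough sketch: group sdX path names under the LUN they belong to, using the
# fact that each LUN header line in `multipath -ll` output contains "dm-N".
def paths_by_lun(multipath_ll: str) -> dict:
    """Map each LUN alias (or WWID) to the list of its sdX path devices."""
    luns = {}
    current = None
    for line in multipath_ll.splitlines():
        if re.search(r"\bdm-\d+\b", line):
            current = line.split()[0]      # LUN alias, or bare WWID
            luns[current] = []
        elif current:
            dev = re.search(r"\bsd[a-z]+\b", line)
            if dev:
                luns[current].append(dev.group())
    return luns
```

Fed the output above, all of sdb, sde, sdh and sdj fall under delldisk2, which is how the failures were pinned to the exhausted thin-provisioned LUN.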