Tag archive: ORA-00345
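Recovering a database after a hardware failure
Database crash caused by a hardware failure
A customer's database failed because of a hardware fault. The alert log reported ORA-00345, ORA-00312, ORA-27070 and OSD-04016 (an error queuing an asynchronous I/O request; the accompanying Windows OS error 1 is "Incorrect function"):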
Tue Feb 05 16:58:26 2019
Thread 1 advanced to log sequence 17139 (LGWR switch)
Current log# 12 seq# 17139 mem# 0: S:\ORADATA\ORCL\REDO12A.LOG
Current log# 12 seq# 17139 mem# 1: S:\ORADATA\ORCL\REDO12B.LOG
Tue Feb 05 19:47:24 2019
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_lgwr_2420.trc:
ORA-00345: redo log write error block 152097 count 8
ORA-00312: online log 12 thread 1: 'S:\ORADATA\ORCL\REDO12A.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
ORA-00345: redo log write error block 152097 count 8
ORA-00312: online log 12 thread 1: 'S:\ORADATA\ORCL\REDO12B.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
ORA-00345: redo log write error block 152105 count 1
ORA-00312: online log 12 thread 1: 'S:\ORADATA\ORCL\REDO12A.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
Starting the database directly fails
After the hardware was repaired, starting the database directly failed with ORA-00600 [kcratr_scan_lastbwr]:
Fri Feb 08 20:58:15 2019
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1527506791
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
Started redo scan
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3672.trc (incident=41353):
ORA-00600: ??????, ??: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_41353\orcl_ora_3672_i41353.trc
Aborting crash recovery due to error 600
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3672.trc:
ORA-00600: ??????, ??: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3672.trc:
ORA-00600: ??????, ??: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...
Fri Feb 08 20:58:24 2019
Trace dumping is performing id=[cdmp_20190208205824]
Fri Feb 08 20:59:04 2019
alter database open
Beginning crash recovery of 1 threads
Started redo scan
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_1696.trc (incident=41354):
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_41354\orcl_ora_1696_i41354.trc
Aborting crash recovery due to error 600
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_1696.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_1696.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open ...
recover database fails
Running recover database failed with ORA-00600 [6101], ORA-00600 [kdourp_inorder2], ORA-00600 [ktbsdp1] and ORA-00600 [3020]:
Fri Feb 08 21:09:20 2019
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Parallel Media Recovery started with 4 slaves
Fri Feb 08 21:09:21 2019
Recovery of Online Redo Log: Thread 1 Group 12 Seq 17139 Reading mem 0
Mem# 0: S:\ORADATA\ORCL\REDO12A.LOG
Mem# 1: S:\ORADATA\ORCL\REDO12B.LOG
Fri Feb 08 21:09:21 2019
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr02_3780.trc (incident=49379):
ORA-00600: internal error code, arguments: [6101], [17], [21], [0], [], [], [], [], [], [], [], []
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_49379\orcl_pr02_3780_i49379.trc
Fri Feb 08 21:09:21 2019
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr01_2040.trc (incident=49371):
ORA-00600: internal error code, arguments: [kdourp_inorder2], [34], [0], [0], [44], [], [], [], [], [], [], []
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_49371\orcl_pr01_2040_i49371.trc
Fri Feb 08 21:09:21 2019
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr03_1068.trc (incident=49387):
ORA-00600: internal error code, arguments: [ktbsdp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_49387\orcl_pr03_1068_i49387.trc
Fri Feb 08 21:09:24 2019
Trace dumping is performing id=[cdmp_20190208210924]
Slave exiting with ORA-10562 exception
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr03_1068.trc:
ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1716972)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 204127
ORA-00600: internal error code, arguments: [ktbsdp1], [], [], [], [], [], [], [], [], [], [], []
Slave exiting with ORA-10562 exception
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr02_3780.trc:
ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1738552)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 211606
ORA-00600: internal error code, arguments: [6101], [17], [21], [0], [], [], [], [], [], [], [], []
Slave exiting with ORA-10562 exception
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr01_2040.trc:
ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1725898)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 73907
ORA-00600: internal error code, arguments: [kdourp_inorder2], [34], [0], [0], [44], [], [], [], [], [], [], []
Recovery Slave PR03 previously exited with exception 10562
Fri Feb 08 21:09:28 2019
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr04_2608.trc (incident=49395):
ORA-00600: internal error code, arguments: [3020], [4], [1739291], [18516507], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 4, block# 1739291, file offset is 1363369984 bytes)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 211552
Incident details in: c:\oracle\diag\rdbms\orcl\orcl\incident\incdir_49395\orcl_pr04_2608_i49395.trc
Slave exiting with ORA-600 exception
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr04_2608.trc:
ORA-00600: internal error code, arguments: [3020], [4], [1739291], [18516507], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 4, block# 1739291, file offset is 1363369984 bytes)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 211552
Media Recovery failed with error 448
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_1548.trc:
ORA-00283: recovery session canceled due to errors
ORA-00448: normal completion of background process
Slave exiting with ORA-283 exception
Errors in file c:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_1548.trc:
ORA-00283: recovery session canceled due to errors
ORA-00448: normal completion of background process
ORA-10562 signalled during: ALTER DATABASE RECOVER database ...
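These errors arise mainly because the hardware failed abruptly and some writes were lost, so the redo no longer matches the contents of the data blocks it is applied to.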
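To see how widespread the damage is, it can help to pull every affected block out of the recovery log. The following is a minimal sketch, not part of the original recovery: it assumes the alert log text is available as a string and that each ORA-10562 or ORA-10567 line is eventually followed by the ORA-10561 line describing the same block, as in the log above. The names `damaged_blocks` and `SAMPLE_LOG` are made up for illustration.

```python
import re

# One pattern, two alternatives, so matches come back in log order:
#  - ORA-10562 / ORA-10567 name the (file#, block#) that recovery failed on
#  - ORA-10561 names the block type and owning data object#
EVENT_RE = re.compile(
    r"ORA-1056[27]: [^()]*\(file# (?P<file>\d+), block# (?P<block>\d+)"
    r"|ORA-10561: block type '(?P<type>[^']+)', data object# (?P<obj>\d+)"
)

def damaged_blocks(log_text):
    """Map (file#, block#) to (block type, data object#), or None if unknown."""
    found = {}
    current = None
    for m in EVENT_RE.finditer(log_text):
        if m.group("file") is not None:
            current = (int(m.group("file")), int(m.group("block")))
            found.setdefault(current, None)
        elif current is not None:
            # Attach the object description to the most recent block seen.
            found[current] = (m.group("type"), int(m.group("obj")))
            current = None
    return found

# Excerpt from the recovery log above (raw string because of the backslashes).
SAMPLE_LOG = r"""ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1716972)
ORA-10564: tablespace USERS
ORA-01110: data file 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 204127
ORA-10567: Redo is inconsistent with data block (file# 4, block# 1739291, file offset is 1363369984 bytes)
ORA-10564: tablespace USERS
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 211552
"""

if __name__ == "__main__":
    for (file_no, block_no), info in sorted(damaged_blocks(SAMPLE_LOG).items()):
        print(file_no, block_no, info)
```

Because the pattern scans the whole text rather than individual lines, it also works on a log that has been pasted as a single run of text.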
Approach
RMAN> recover datafile 1;
启动 recover 于 09-2月 -19
使用通道 ORA_DISK_1
正在开始介质的恢复
介质恢复完成, 用时: 00:00:01
完成 recover 于 09-2月 -19
RMAN> recover datafile 2;
启动 recover 于 09-2月 -19
使用通道 ORA_DISK_1
正在开始介质的恢复
介质恢复完成, 用时: 00:00:01
完成 recover 于 09-2月 -19
RMAN> recover datafile 3;
启动 recover 于 09-2月 -19
使用通道 ORA_DISK_1
正在开始介质的恢复
介质恢复完成, 用时: 00:00:02
完成 recover 于 09-2月 -19
RMAN> recover datafile 4;
启动 recover 于 09-2月 -19
使用通道 ORA_DISK_1
正在开始介质的恢复
无法恢复介质
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: recover 命令 (在 02/09/2019 21:48:19 上) 失败
ORA-00283: recovery session canceled due to errors
RMAN-11003: 在分析/执行 SQL 语句期间失败: alter database recover if needed datafile 4
ORA-00283: 恢复会话因错误而取消
ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1725913)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 73907
ORA-00600: 内部错误代码, 参数: [kdourp_inorder2], [34], [43], [44], [44], [], [], [], [], [], [], []
SQL> recover datafile 4;
ORA-00283: 恢复会话因错误而取消
ORA-10562: Error occurred while applying redo to data block (file# 4, block# 1725913)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 4: 'S:\ORADATA\ORCL\USERS01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 73907
ORA-00600: 内部错误代码, 参数: [kdourp_inorder2], [34], [43], [44], [44], [], [], [], [], [], [], []
-- Use bbed to modify the affected data file so that its recovery is skipped, then open the database directly
SQL> alter database open;
数据库已更改。
Once the database is open, export the data logically, build a new database, and import the data into it.
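ASM disk group failure caused by a thin-provisioned storage volume
A friend had a storage array providing space to ASM. When that space ran short, a LUN from a second storage array was added to the ASM DATA disk group as an ASM disk. After roughly one day of running, the disk group dismounted and the database crashed, and the disk group could not be mounted again. Several other disk groups on the same storage could not be mounted either.
Database alert log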
Sun Oct 23 08:43:59 2016
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lmon_79128.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00345: redo log write error block 15924 count 2
ORA-00312: online log 2 thread 1: '+DATA/orcl/onlinelog/group_2.274.892363167'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_79174.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+DATA/orcl/controlfile/current.278.892363163'
ORA-15078: ASM diskgroup was forcibly dismounted
Sun Oct 23 08:44:00 2016
LGWR (ospid: 79174): terminating the instance due to error 204
Sun Oct 23 08:44:00 2016
opiodr aborting process unknown ospid (79742) as a result of ORA-1092
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
Sun Oct 23 08:44:01 2016
ORA-1092 : opitsk aborting process
System state dump requested by (instance=1, osid=79174 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_79118.trc
Instance terminated by LGWR, pid = 79174
Clearly, the database failed because the ASM disk group was dismounted, so the next step is to analyse the ASM log.
ASM log
Sun Oct 23 07:00:31 2016
Time drift detected. Please check VKTM trace file for more details.
Sun Oct 23 08:43:55 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.524, extent 15030
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_8755.trc:
ORA-15080: synchronous I/O operation to a disk failed
Sun Oct 23 08:43:55 2016
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 1/0xec689cdd (DATA)
NOTE: ASM did background COD recovery for group 1/0xec689cdd (DATA)
NOTE: starting rebalance of group 1/0xec689cdd (DATA) at power 1
Starting background process ARB0
Sun Oct 23 08:43:56 2016
ARB0 started with pid=24, OS id=103554
NOTE: assigning ARB0 to group 1/0xec689cdd (DATA) with 1 parallel I/O
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:1 disk:2 AU:1222738 offset:0 size:1048576
ERROR: failed to copy file +DATA.256, extent 6570
ERROR: ORA-15080 thrown in ARB0 for group number 1
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_103554.trc:
ORA-15080: synchronous I/O operation to a disk failed
NOTE: stopping process ARB0
Sun Oct 23 08:43:58 2016
Errors in file /oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_dbw0_8521.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Write Failed. group:1 disk:3 AU:6789 offset:24576 size:4096
NOTE: cache initiating offline of disk 3 group DATA
NOTE: process _dbw0_+asm1 (8521) initiating offline of disk 3.3915934787 (DATA_0003) with mask 0x7e in group 1
Sun Oct 23 08:43:58 2016
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f is now being offlined
WARNING: Disk 3 (DATA_0003) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 3/0xe9686c43, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 14, osid 8521
ERROR: Disk 3 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Sun Oct 23 08:43:58 2016
NOTE: cache dismounting (not clean) group 1/0xEC689CDD (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 103577, image: oracle@node1 (B000)
WARNING: Disk 3 (DATA_0003) in group 1 mode 0x7f offline is being aborted
WARNING: Offline of disk 3 (DATA_0003) in group 1 and mode 0x7f failed on ASM inst 1
NOTE: halting all I/Os to diskgroup 1 (DATA)
Sun Oct 23 08:43:59 2016
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=160.10145 last written ABA 160.10145
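The error is clear: the Write Failed errors caused the ASM disk group to dismount. Because DATA uses external redundancy, the failing disk could not simply be taken offline, so the whole disk group was dismounted.
System log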
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdd, sector 12467058681
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] 33Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdb, sector 12467056640
Oct 23 08:43:55 node1 kernel: f9 00 00 04 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:55 node1 kernel: 00 00
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467057657
Oct 23 08:43:55 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467057657
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:6:1: [sdh] CDB: Write(16): 8a 00 00 00 00 02 e7 18 37 f9 00 00 00 07 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdh, sector 12467058681
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467058681
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] CDB: Write(16): 8a 00 00 00 00 02 e7 18 30 00 00 00 03 f9 00 00
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev sdj, sector 12467056640
Oct 23 08:43:57 node1 kernel: end_request: critical space allocation error, dev dm-3, sector 12467056640
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 02 e7 18 33 f9 00 00 04 00 00 00
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Sense Key : Data Protect [current]
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Add. Sense: Space allocation failed write protect
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] CDB: Write(16): 8a 00 00 00 00 03 3b 7e 78 30 00 00 00 08 00 00
Oct 23 10:50:59 node1 init: oracle-ohasd main process (6150) killed by TERM signal
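The error is "critical space allocation error": when Linux issued the writes, space could not be allocated for them, and the device answered with "Space allocation failed write protect". In other words, the space allocation failure is what caused the ASM disk group to dismount.
Multipath information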
[root@node1 ~]# multipath -ll
36000d31003190c000000000000000003 dm-3 COMPELNT,Compellent Vol
size=80T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 6:0:9:1 sdd 8:48 active ready running
`- 8:0:9:1 sdi 8:128 active ready running
delldisk2 (36000d310031908000000000000000003) dm-4 COMPELNT,Compellent Vol
size=8.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 6:0:12:1 sde 8:64 active ready running
|- 8:0:6:1 sdh 8:112 active ready running
|- 6:0:4:1 sdb 8:16 active ready running
`- 8:0:12:1 sdj 8:144 active ready running
delldisk1 (36000d31003190a000000000000000007) dm-2 COMPELNT,Compellent Vol
size=12T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 6:0:1:1 sda 8:0 active ready running
|- 8:0:2:1 sdf 8:80 active ready running
|- 6:0:7:1 sdc 8:32 active ready running
`- 8:0:3:1 sdg 8:96 active ready running
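Clearly, every error is on the same LUN, delldisk2: the SCSI addresses in the kernel messages (6:0:12:1, 8:0:6:1, 6:0:4:1 and 8:0:12:1) are exactly the four paths listed under delldisk2. That LUN sits on the storage array that ran out of space. In other words, the space behind delldisk2 was exhausted, the operating system hit space allocation errors, ASM writes failed, and the database crashed. The root of the problem is that the array presented 8T to the operating system while the space it could actually provide was less than 8T, and the OS used the LUN as if the full 8T were there. The technical term for this is a thin-provisioned volume (存储精简卷), so watch for it when configuring storage. Because this situation normally breaks only write I/O while reads keep working, no data is lost.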
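The mapping from kernel errors to a multipath LUN can be sketched in a few lines of Python. This is only an illustration under assumptions: it matches on the SCSI H:C:T:L address rather than the sdX or dm-N names, because those names in the kernel messages (sdd, dm-3) differ from the ones in the `multipath -ll` listing (sde, dm-4), possibly because the listing was captured at a different time. The function names `parse_multipath` and `failing_maps` and the two sample strings are made up for this example.

```python
import re

# Kernel SCSI messages look like "sd 6:0:12:1: [sdd] ..."; capture the H:C:T:L address.
KERNEL_RE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): \[sd[a-z]+\]")

# Either a map header ("alias (wwid) dm-N" or "wwid dm-N")
# or a path line ("|- 6:0:12:1 sde 8:64 active ready running").
TOKEN_RE = re.compile(
    r"(?:(?P<alias>\S+) \()?(?P<wwid>[0-9a-f]{20,})\)? (?P<dm>dm-\d+)"
    r"|[|`]- (?P<hctl>\d+:\d+:\d+:\d+) (?P<dev>sd[a-z]+) "
)

def parse_multipath(text):
    """Map each path's H:C:T:L address to its multipath map (alias, else WWID)."""
    paths = {}
    current = None
    for m in TOKEN_RE.finditer(text):
        if m.group("wwid"):
            current = m.group("alias") or m.group("wwid")
        elif current is not None:
            paths[m.group("hctl")] = current
    return paths

def failing_maps(kernel_log, multipath_text):
    """Return the set of multipath maps whose paths appear in the kernel errors."""
    paths = parse_multipath(multipath_text)
    return {paths.get(hctl, "unknown") for hctl in KERNEL_RE.findall(kernel_log)}

# Excerpts from the logs above.
MULTIPATH = """\
36000d31003190c000000000000000003 dm-3 COMPELNT,Compellent Vol
size=80T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:9:1 sdd 8:48 active ready running
  `- 8:0:9:1 sdi 8:128 active ready running
delldisk2 (36000d310031908000000000000000003) dm-4 COMPELNT,Compellent Vol
size=8.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:12:1 sde 8:64 active ready running
  |- 8:0:6:1 sdh 8:112 active ready running
  |- 6:0:4:1 sdb 8:16 active ready running
  `- 8:0:12:1 sdj 8:144 active ready running
"""

KERNEL_LOG = """\
Oct 23 08:43:55 node1 kernel: sd 6:0:12:1: [sdd] Add. Sense: Space allocation failed write protect
Oct 23 08:43:55 node1 kernel: sd 8:0:6:1: [sdh] Add. Sense: Space allocation failed write protect
Oct 23 08:43:57 node1 kernel: sd 8:0:12:1: [sdj] Add. Sense: Space allocation failed write protect
Oct 23 08:43:58 node1 kernel: sd 6:0:4:1: [sdb] Add. Sense: Space allocation failed write protect
"""

if __name__ == "__main__":
    print(failing_maps(KERNEL_LOG, MULTIPATH))
```

Matching on the address keeps the result stable even if device names are reassigned after a rescan or reboot.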