标签归档:OSD-04016

O/S-Error: (OS 23) 数据错误(循环冗余检查)—故障处理

有客户由于磁盘坏道导致数据文件访问报ORA-27070 OSD-04016 O/S-Error等相关错误
OSD-04016


rman 尝试读取88号文件

RMAN> backup datafile 88 format 'e:\rman\%d_%T_%I.%s%p';

启动 backup 于 05-6月 -24
使用目标数据库控制文件替代恢复目录
分配的通道: ORA_DISK_1
通道 ORA_DISK_1: SID=2246 设备类型=DISK
通道 ORA_DISK_1: 正在启动全部数据文件备份集
通道 ORA_DISK_1: 正在指定备份集内的数据文件
输入数据文件: 文件号=00088 名称=E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA
通道 ORA_DISK_1: 正在启动段 1 于 05-6月 -24
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: backup 命令 (ORA_DISK_1 通道上, 在 06/05/2024 18:33:43 上) 失败
ORA-19501: 文件 "E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA", 块编号 322944 (块大小=8192) 上出现读取错误
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。

检查系统日志,发现有报:设备 \Device\Harddisk0\DR0 有一个不正确的区块
disk-error


到目前为止基本上可以判断是文件系统或者磁盘层面出现坏道导致该问题(磁盘坏道概率更大),使用工具对损坏数据文件进行强制拷贝,提示少量扇区数据无法拷贝
force-copy

通过dbv检查恢复出来文件效果

E:\check_db>DBV FILE=D:/XIFENFEI.ORA

DBVERIFY: Release 11.2.0.3.0 - Production on 星期日 6月 9 22:17:28 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\XIFENFEI.ORA
页 323055 标记为损坏
Corrupt block relative dba: 0x1604edef (file 88, block 323055)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0x4c390601
 check value in block header: 0xe5e5
 computed block checksum: 0x5003



DBVERIFY - 验证完成

检查的页总数: 524288
处理的页总数 (数据): 204510
失败的页总数 (数据): 0
处理的页总数 (索引): 127485
失败的页总数 (索引): 0
处理的页总数 (其他): 3030
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 189262
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 184063522 (3470.184063522)

运气不错,就一个数据库block异常,通过dba_extents查询坏块所属对象,运气不太好是一个表数据

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
输入 file_id 的值:  88
原值    3:  WHERE FILE_ID = &FILE_ID
新值    3:  WHERE FILE_ID = 88
输入 block_id 的值:  323055
原值    4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
新值    4:    AND 323055 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
XFFUSER
XFFTABLE
TABLE              XFFTBS_TAB2

设置跳过坏块导出数据,然后重命名原表导入数据,完成本次恢复,以前有过类似恢复:

一次侥幸的OSD-04016 O/S-Error异常恢复
O/S-Error: (OS 23) 数据错误(循环冗余检查) 数据库恢复

发表在 Oracle备份恢复 | 标签为 , , | 留下评论

O/S-Error: (OS 23) 数据错误(循环冗余检查) 数据库恢复

有客户数据库运行过程中突然crash,检测发现ORA-27070 OSD-04016 O/S-Error: (OS 23) 等报错

Thu May 12 11:25:53 2022
KCF: write/open error block=0x19e95f online=1
     file=57 H:\ORADATA\xifenfei\XFF51.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。'
Thu May 12 11:25:53 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw0_3532.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式
ORA-01114: 将块写入文件 57 时出现 IO 错误 (块 # 1698143)
ORA-01110: 数据文件 57: 'H:\ORADATA\xifenfei\XFF51.DBF'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。

DBW0: terminating instance due to error 1242
Thu May 12 11:25:54 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_mman_3528.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:25:54 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_lgwr_3544.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:25:55 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw1_3536.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:25:55 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_psp0_3524.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:25:55 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_ckpt_3548.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:25:55 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_3520.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:26:06 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_q002_37468.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:26:08 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_reco_3556.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:26:08 2022
Errors in file e:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_3552.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式

Thu May 12 11:26:10 2022
Instance terminated by DBW0, pid = 3532

再次重启数据库报错 ORA-27070: 异步读取/写入失败 OSD-04016: 异步 I/O 请求排队时出错。类似错误
osd-04006


dbv检查数据文件报异常
dbv-io-error

通过以上信息基本上可以确认是由于底层故障(文件系统或者硬件故障),导致数据库文件访问异常,检查系统日志发现异常
20220518142942

通过专业恢复软件对异常文件进行恢复,实现数据库正常open(跳过坏块)
20220518143342

发表在 Oracle, Oracle备份恢复 | 标签为 , , | 评论关闭

硬件故障导致ORA-600 2662错误处理

前几天恢复了一个40多T的CASE:ORA-00600: internal error code, arguments: [16513], [1403] 恢复,又一个近30T的库由于硬件故障,通过其他人一系列恢复之后,无法正常open,让我们提供技术支持:
故障最初原因是由于存储异常

Fri Feb 19 09:03:49 2021
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_3460.trc:
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。

通过其他人一系列处理后,数据库报ORA-600 2662错误

Sat Feb 20 08:19:35 2021
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Feb 20 08:19:35 2021
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc(incident=1960181):
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 5304): terminating the instance due to error 600
Instance terminated by USER, pid = 5304
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (5304) as a result of ORA-1092
Sat Feb 20 08:19:42 2021
ORA-1092 : opitsk aborting process

通过对scn处理,数据库顺利绕过该错误,然后报ORA-600 4194错误

Doing block recovery for file 213 block 4688
No block recovery was needed
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc(incident=1984136):
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
Sat Feb 20 10:50:45 2021
Doing block recovery for file 213 block 4688
No block recovery was needed
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc:
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
SMON (ospid: 7048): terminating the instance due to error 474
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_6652.trc(incident=1984185):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Sat Feb 20 10:50:52 2021
Instance terminated by SMON, pid = 7048

通过对异常事务进行处理,屏蔽smon进程进行回滚,数据库open成功,但是报ORA-600 4137错误

Sat Feb 20 10:53:46 2021
Sweep [inc][1992133]: completed
Stopping background process MMNL
Sat Feb 20 10:53:47 2021
Trace dumping is performing id=[cdmp_20210220105347]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992134):
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (23, 13).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:47 2021
Sweep [inc2][1992133]: completed
Sat Feb 20 10:53:47 2021
Sweep [inc][1992134]: completed
Stopping background process MMON
Trace dumping is performing id=[cdmp_20210220105348]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992135):
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Starting background process MMON
Starting background process MMNL
Sat Feb 20 10:53:48 2021
MMON started with pid=16, OS id=6448 
ALTER SYSTEM enable restricted session;
Sat Feb 20 10:53:48 2021
MMNL started with pid=36, OS id=6840 
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (38, 4).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:49 2021
Sweep [inc][1992135]: completed
Trace dumping is performing id=[cdmp_20210220105349]
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open

对异常回滚段进行处理,数据库后端启动正常,不再报明显ORA-错误.通过hcheck.sql检查字典正常

HCheck Version 07MAY18 on 20-FEB-2021 11:35:11
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: JYJG

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 02/20 11:35:13 PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 02/20 11:35:15 PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 02/20 11:35:18 PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 02/20 11:35:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
---------------------------------------
20-FEB-2021 11:35:19  Elapsed: 8 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

虽然字典正常,但是由于数据库屏蔽了一致性,建议客户在条件允许的情况下,进行逻辑迁移,排除风险隐患.

发表在 非常规恢复 | 标签为 , , , , , , | 评论关闭