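Category archive: Oracle
ORA-600 kokiasg1 failure analysis (all core dictionary sequences maliciously deleted from obj$)
Summary: the customer shut the database down normally, and the next startup failed with ORA-600 kokiasg1. Tracing the startup showed that the IDGEN1$ sequence was missing. Once that was repaired the database opened, but the alert log filled with ORA-600 12803, ORA-600 15264 and similar errors, and application users could not log in. Deeper analysis showed that every core dictionary sequence had been removed from the dictionary table obj$, while the matching obj# rows still existed in seq$. The initial suspicion is that someone maliciously deleted the core dictionary sequence objects from obj$.
Database startup fails with ORA-600 kokiasg1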
SQL> startup ; ORACLE 例程已经启动。 Total System Global Area 1.4531E+10 bytes Fixed Size 2295256 bytes Variable Size 2181040680 bytes Database Buffers 1.2314E+10 bytes Redo Buffers 33193984 bytes 数据库装载完毕。 ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] 进程 ID: 5628 会话 ID: 122 序列号: 3
Corresponding alert log entries
Thu Jul 03 16:35:25 2025 Shutting down instance (immediate) Stopping background process SMCO Shutting down instance: further logons disabled Thu Jul 03 16:35:26 2025 Stopping background process CJQ0 Stopping background process QMNC Stopping background process MMNL Stopping background process MMON License high water mark = 272 All dispatchers and shared servers shutdown Thu Jul 03 16:35:54 2025 alter database close normal Thu Jul 03 16:35:54 2025 SMON: disabling tx recovery SMON: disabling cache recovery Thu Jul 03 16:35:54 2025 Shutting down archive processes Archiving is disabled Archive process shutdown avoided: 0 active Thread 1 closed at log sequence 296590 Successful close of redo thread 1 Completed: alter database close normal alter database dismount Shutting down archive processes Archiving is disabled Completed: alter database dismount ARCH: Archival disabled due to shutdown: 1089 Shutting down archive processes Archiving is disabled ARCH: Archival disabled due to shutdown: 1089 Shutting down archive processes Archiving is disabled Thu Jul 03 16:36:02 2025 Stopping background process VKTM Thu Jul 03 16:36:07 2025 Instance shutdown complete Thu Jul 03 16:36:19 2025 Adjusting the default value of parameter parallel_max_servers from 640 to 270 due to the value of parameter processes (300) Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Initial number of CPU is 16 Number of processor cores in the system is 8 Number of processor sockets in the system is 1 Picked latch-free SCN scheme 3 Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST Autotune of undo retention is turned on. IMODE=BR ILAT =52 LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options. 
Windows NT Version V6.2 CPU : 16 - type 8664, 8 Physical Cores Process Affinity : 0x0x0000000000000000 Memory (Avail/Total): Ph:24712M/32767M, Ph+PgF:14089M/39123M System parameters with non-default values: processes = 300 sessions = 480 nls_language = "SIMPLIFIED CHINESE" nls_territory = "CHINA" sga_target = 13920M control_files = "D:\APP\ADMINISTRATOR\ORADATA\orcl\CONTROL01.CTL" control_files = "D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\orcl\CONTROL02.CTL" db_block_size = 8192 compatible = "11.2.0.4.0" db_recovery_file_dest = "D:\app\Administrator\fast_recovery_area" db_recovery_file_dest_size= 10G undo_tablespace = "UNDOTBS1" remote_login_passwordfile= "EXCLUSIVE" db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)" job_queue_processes = 10 audit_file_dest = "D:\APP\ADMINISTRATOR\ADMIN\orcl\ADUMP" audit_trail = "DB" db_name = "orcl" open_cursors = 300 pga_aggregate_target = 4639M diagnostic_dest = "D:\APP\ADMINISTRATOR" Thu Jul 03 16:36:20 2025 PMON started with pid=2, OS id=13088 Thu Jul 03 16:36:20 2025 PSP0 started with pid=3, OS id=16168 Thu Jul 03 16:36:21 2025 VKTM started with pid=4, OS id=7948 at elevated priority VKTM running at (10)millisec precision with DBRM quantum (100)ms Thu Jul 03 16:36:21 2025 GEN0 started with pid=5, OS id=4192 Thu Jul 03 16:36:21 2025 DIAG started with pid=6, OS id=8232 Thu Jul 03 16:36:21 2025 DBRM started with pid=7, OS id=16436 Thu Jul 03 16:36:21 2025 DIA0 started with pid=8, OS id=11400 Thu Jul 03 16:36:21 2025 MMAN started with pid=9, OS id=11108 Thu Jul 03 16:36:21 2025 DBW0 started with pid=10, OS id=12232 Thu Jul 03 16:36:21 2025 DBW1 started with pid=11, OS id=7368 Thu Jul 03 16:36:21 2025 LGWR started with pid=12, OS id=13520 Thu Jul 03 16:36:21 2025 CKPT started with pid=13, OS id=11952 Thu Jul 03 16:36:21 2025 SMON started with pid=14, OS id=9304 Thu Jul 03 16:36:21 2025 RECO started with pid=15, OS id=17136 Thu Jul 03 16:36:21 2025 MMON started with pid=16, OS id=1984 Thu Jul 03 16:36:21 2025 MMNL 
started with pid=17, OS id=2568 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))' starting up 1 shared server(s) ... ORACLE_BASE from environment = D:\app\Administrator Thu Jul 03 16:36:22 2025 alter database mount exclusive Successful mount of redo thread 1, with mount id 1287723014 Database mounted in Exclusive Mode Lost write protection disabled Completed: alter database mount exclusive alter database open Thread 1 opened at log sequence 296590 Current log# 1 seq# 296590 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\orcl\REDO01.LOG Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery [15144] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:3680275922 end:3680276032 diff:110 (1 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc (incident=7579): ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_7579\orcl_ora_15144_i7579.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc: ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc: ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 15144): terminating the instance due to error 600 Instance terminated by USER, pid = 15144 ORA-1092 signalled during: alter database open...
Tracing the database startup suggests that the error is related to the IDGEN1$ object
PARSING IN CURSOR #615624160 len=30 dep=1 uid=0 oct=3 lid=0 tim=752975051401 hv=3013659460 ad='7ffbd8f025d0' sqlid='6d8vr86tu1ku4' select TOTAL from SYS.ID_GENS$ END OF STMT PARSE #615624160:c=15625,e=2775,p=2,cr=14,cu=0,mis=1,r=0,dep=1,og=4,plh=1676180847,tim=752975051401 EXEC #615624160:c=0,e=6,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1676180847,tim=752975051452 WAIT #615624160: nam='db file sequential read' ela= 126 file#=1 block#=3440 blocks=1 obj#=514 tim=752975051594 WAIT #615624160: nam='db file sequential read' ela= 48 file#=1 block#=3441 blocks=1 obj#=514 tim=752975051671 FETCH #615624160:c=0,e=224,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1676180847,tim=752975051687 STAT #615624160 id=1 cnt=1 pid=0 pos=1 obj=514 op='TABLE ACCESS FULL ID_GENS$ (cr=3 pr=2 pw=0 time=223 us)' CLOSE #615624160:c=0,e=15,dep=1,type=0,tim=752975051716 BINDS #12720440: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=80 off=0 kxsbbbfp=24b1b128 bln=22 avl=01 flg=05 value=0 Bind#1 oacdty=01 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=10 fl2=0001 frm=01 csi=852 siz=0 off=24 kxsbbbfp=24b1b140 bln=32 avl=07 flg=01 value="IDGEN1$" Bind#2 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=56 kxsbbbfp=24b1b160 bln=22 avl=02 flg=01 value=1 EXEC #12720440:c=0,e=107,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051842 FETCH #12720440:c=0,e=5,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051856 CLOSE #12720440:c=0,e=0,dep=1,type=3,tim=752975051870 Incident 161 created, dump file: C:\APP\XFF\diag\rdbms\orcl\orcl\incident\incdir_161\orcl_ora_1880_i161.trc ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
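MOS confirms that startup fails with ORA-600 kokiasg1 when the IDGEN1$ sequence is missing from the database.
Use a tool to recover the obj$ table into a new database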
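When reading a 10046-style trace like the one above, the useful signal is a recursive cursor whose FETCH returns zero rows right before the ORA-600. The following is a minimal sketch, not the author's method, that scans trace text for such cursors and reports their string bind values. The function name `empty_fetch_binds` is hypothetical, and the parser assumes only the `BINDS #n:`, `value="..."` and `FETCH #n:...,r=N` tokens visible in the excerpt.

```python
import re

# One pass over the trace text; works whether or not the trace kept its line breaks.
_TOKEN = re.compile(
    r'BINDS #(?P<binds>\d+):'
    r'|value="(?P<value>[^"]*)"'
    r'|FETCH #(?P<fetch>\d+):\S*?,r=(?P<rows>\d+)'
)


def empty_fetch_binds(trace_text):
    """Return {cursor: [string bind values]} for cursors whose FETCH returned 0 rows."""
    binds = {}      # cursor number -> string binds from its latest BINDS section
    current = None  # cursor whose BINDS section is being read
    result = {}
    for m in _TOKEN.finditer(trace_text):
        if m.group("binds"):
            current = m.group("binds")
            binds[current] = []
        elif m.group("value") is not None and current is not None:
            binds[current].append(m.group("value"))
        elif m.group("fetch"):
            cursor = m.group("fetch")
            if int(m.group("rows")) == 0 and binds.get(cursor):
                result[cursor] = binds[cursor]
    return result
```

Run against the trace in this case, it should flag cursor 12720440 with the bind value IDGEN1$, which is the object name the dictionary lookup failed to find.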
E:\dump>imp test/oracle file=SYS_OBJ$.dmp full=y Import: Release 11.2.0.4.0 - Production on 星期六 7月 5 09:34:42 2025 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. 连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 经由常规路径由 EXPORT:V08.01.07 创建的导出文件 警告: 这些对象由 SYS 导出, 而不是当前用户 已经完成 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集中的导入 导出服务器使用 UTF8 NCHAR 字符集 (可能的 ncharset 转换) . 正在将 SYS 的对象导入到 TEST . 正在将 SYS 的对象导入到 TEST . . 正在导入表 "OBJ$"导入了 103764 行 成功终止导入, 没有出现警告。
Query test.obj$ to confirm that it has no row named IDGEN1$
SQL> select * from test.obj$ where name='IDGEN1$';

未选定行

SQL>
Query obj$ in a healthy database for the IDGEN1$ object
SQL> select owner#, obj#,type# from obj$ where name='IDGEN1$';

    OWNER#       OBJ#      TYPE#
---------- ---------- ----------
         0       1229          6
In test.obj$ recovered from the failed database, query the objects around obj# 1229, then run the same query against the healthy obj$ for comparison
SQL> select owner#, obj#,type#,name from test.obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1230         13 BFILE

SQL> select owner#, obj#,type#,name from obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1229          6 IDGEN1$
         0       1230         13 BFILE
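At this point the preliminary conclusion is that the failed database cannot start because the IDGEN1$ sequence is missing. The fix is fairly simple: while the database is opening, start a new session and create the IDGEN1$ sequence.
After a restart the database starts normally, but an attempt to log in then fails with ORA-600 12803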

Checking the alert log again shows a large number of ORA-600 errors
Fri Jul 04 15:57:13 2025 Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_ora_27788.trc (incident=12239): ORA-00600: 内部错误代码, 参数: [12803], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Fri Jul 04 15:58:04 2025 Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_mmon_1976.trc (incident=12184): ORA-00600: 内部错误代码, 参数: [15264], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details.
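Given these ORA-600 errors, the initial suspicion was that the dictionary still had problems. Because the original error was a sequence problem, this time I focused on the system sequences and used dul to recover the seq$ table into the test user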
E:\dump>imp test/oracle file=SYS_seq$.dmp full=y Import: Release 11.2.0.4.0 - Production on 星期六 7月 5 10:10:17 2025 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. 连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 经由常规路径由 EXPORT:V08.01.07 创建的导出文件 警告: 这些对象由 SYS 导出, 而不是当前用户 已经完成 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集中的导入 导出服务器使用 UTF8 NCHAR 字符集 (可能的 ncharset 转换) . 正在将 SYS 的对象导入到 TEST . 正在将 SYS 的对象导入到 TEST . . 正在导入表 "SEQ$"导入了 359 行 成功终止导入, 没有出现警告。
The query shows that the sequence from before (obj#=1229) is, surprisingly, still present in seq$ (its row in obj$ is gone)
SQL> select * from test.seq$ where obj#=1229; OBJ# INCREMENT$ MINVALUE MAXVALUE CYCLE# ORDER$ CACHE ---------- ---------- ---------- ---------- ---------- ---------- ---------- HIGHWATER AUDIT$ FLAGS ---------- -------------------------------------- ---------- 1229 50 1 1.0000E+28 0 0 1000 60267151 -------------------------------- 0
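This indicates that the sequence was not removed with the drop sequence command; more likely its row was deleted directly from obj$. A test confirms that after a normal drop of a sequence, the rows in obj$ and seq$ are removed together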
SQL> create sequence xxxx; 序列已创建。 SQL> select obj#,type# from obj$ where name='XXXX'; OBJ# TYPE# ---------- ---------- 87383 6 SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383; OBJ# INCREMENT$ MINVALUE MAXVALUE CYCLE# ORDER$ CACHE ---------- ---------- ---------- ---------- ---------- ---------- ---------- HIGHWATER AUDIT$ FLAGS ---------- -------------------------------------- ---------- 87383 1 1 1.0000E+28 0 0 20 1 -------------------------------- 0 SQL> DROP SEQUENCE XXXX; 序列已删除。 SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383; 未选定行 SQL> select obj#,type# from obj$ where name='XXXX'; 未选定行
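That raised the next question: had other system sequences been deleted as well? The approach is to collect the obj# values of the SYS-owned sequences in a healthy database, then compare them against the obj$ and seq$ tables in the test user
List the SYS-owned sequence objects that are present in test.obj$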
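SQL> select name,obj#,type# from test.obj$ where obj# in(
  2  select obj# from sys.obj$ where owner#=0 and type#=6)
  3  and type#=6;

未选定行
The query confirms that the names of all built-in core sequences under SYS are gone from the failed database (they were clearly deleted from obj$). A direct query for SYS-owned sequences in test.obj$ confirms it
SQL> select name,ctime from test.obj$ where type#=6 and owner#=0;

未选定行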
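The comparison described above comes down to set arithmetic on obj# values. The sketch below is illustrative only. `classify_sequences` is a hypothetical helper, and it assumes that the healthy database and the failed database assign the same obj# to each built-in SYS sequence (as the IDGEN1$ example with obj# 1229 suggests for this case).

```python
def classify_sequences(healthy_seq_ids, damaged_obj_ids, damaged_seq_ids):
    """Classify SYS sequence obj# values taken from a healthy database.

    healthy_seq_ids: obj# of SYS sequences in the healthy obj$ (type#=6, owner#=0)
    damaged_obj_ids: every obj# found in the recovered test.obj$
    damaged_seq_ids: every obj# found in the recovered test.seq$
    """
    healthy = set(healthy_seq_ids)
    in_obj = set(damaged_obj_ids)
    in_seq = set(damaged_seq_ids)
    return {
        # present in both dictionary tables: nothing to repair
        "intact": sorted(healthy & in_obj & in_seq),
        # seq$ row survives but obj$ row is gone: not removed by DROP SEQUENCE
        "obj_row_missing": sorted((healthy & in_seq) - in_obj),
        # obj$ row survives but seq$ row is gone
        "seq_row_missing": sorted((healthy & in_obj) - in_seq),
        # gone from both tables
        "missing_in_both": sorted(healthy - in_obj - in_seq),
    }
```

In this incident, obj# 1229 would land in `obj_row_missing`, which is the signature of a direct delete against obj$ rather than a regular drop.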
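These checks establish that essentially all system dictionary sequences have been deleted from obj$ in the failed database (a healthy database should have more than 130). From here the remaining work is fairly simple: use the contents of seq$ to construct CREATE statements for the system sequences and create them. The database then returned to normal, which completed this recovery.
ORA-00756 ORA-10567 failure recovered with zero data loss
After the customer's virtualization failure was repaired, the database failed to start with ORA-600 kcratr_scan_lastbwr
This is a fairly common error, and a recover usually resolves it. Sometimes, however, the recover then hits ORA-600 3020 or errors such as ORA-00756 ORA-10567, and unfortunately that is what happened this time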
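The source does not show the statements that were actually generated, so the following is only a sketch of how a seq$ row could be turned into a CREATE SEQUENCE statement. The helper `create_sequence_ddl` is hypothetical. It assumes the sequence name is looked up by obj# in a healthy database, that HIGHWATER is a safe restart value, and that CACHE values below 2 mean NOCACHE.

```python
def create_sequence_ddl(name, increment, minvalue, maxvalue,
                        cycle, order, cache, highwater, owner="SYS"):
    """Build a CREATE SEQUENCE statement from the columns of one seq$ row."""
    if not minvalue <= highwater <= maxvalue:
        raise ValueError(f"{name}: HIGHWATER {highwater} is outside MINVALUE/MAXVALUE")
    clauses = [
        f'CREATE SEQUENCE {owner}."{name}"',
        f"START WITH {highwater}",       # restart at the on-disk high-water mark
        f"INCREMENT BY {increment}",
        f"MINVALUE {minvalue}",
        f"MAXVALUE {maxvalue}",
        "CYCLE" if cycle else "NOCYCLE",
        f"CACHE {cache}" if cache >= 2 else "NOCACHE",
        "ORDER" if order else "NOORDER",
    ]
    return " ".join(clauses) + ";"


# Values from the test.seq$ row for obj# 1229. SQL*Plus shows MAXVALUE only as
# 1.0000E+28, so the exact figure used here (10**28 - 1) is an assumption.
print(create_sequence_ddl("IDGEN1$", 50, 1, 10**28 - 1, 0, 0, 1000, 60267151))
```

Generated statements should be reviewed before being run against a damaged dictionary, ideally on a copy of the database first.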
SQL> recover database; ORA-00283: recovery session canceled due to errors ORA-00756: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file offset is 2750414848 bytes) ORA-10564: tablespace PACS55 ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649
Next, try recovering the data files one at a time
SQL> recover datafile 10; ORA-00283: recovery session canceled due to errors ORA-00756: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file offset is 2750414848 bytes) ORA-10564: tablespace PACS55 ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649 SQL> recover datafile 9; ORA-00283: recovery session canceled due to errors ORA-00756: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 9, block# 4158754, file offset is 4003741696 bytes) ORA-10564: tablespace PACS55 ORA-01110: data file 9: '/u02/oradata/pacsdb/pacs55.3.dbf' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76660
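One detail in these messages is easy to misread: the "file offset" values are far smaller than a block number above four million would suggest. Assuming an 8 KB block size (not stated for this database, but the only common size that fits the figures), the reported offsets equal the true byte offset modulo 2^32. The sketch below checks that arithmetic; the function names are illustrative.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB database blocks


def byte_offset(block_no, block_size=BLOCK_SIZE):
    """Byte position of a block, computed the way the messages appear to do it."""
    return block_no * block_size


def offset_low32(block_no, block_size=BLOCK_SIZE):
    """The same position truncated to its low 32 bits."""
    return byte_offset(block_no, block_size) % 2**32


for block_no, reported in [(4005760, 2750414848), (4158754, 4003741696)]:
    full = byte_offset(block_no)
    print(block_no, full, offset_low32(block_no) == reported)
```

If the block has to be located with an OS-level tool, the untruncated value is the one to use; whether an extra header block must be added on a given platform is not something the source states.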
Check the two affected files with dbv
[oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.3.dbf DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:02:15 2025 Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.3.dbf DBVERIFY - Verification complete Total Pages Examined : 4194302 Total Pages Processed (Data) : 2482487 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 1655515 Total Pages Failing (Index): 0 Total Pages Processed (Lob) : 25017 Total Pages Failing (Lob) : 0 Total Pages Processed (Other): 15919 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 15364 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 311133131196 (72.1895485884) [oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.4.dbf DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:04:59 2025 Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.4.dbf DBVERIFY - Verification complete Total Pages Examined : 4194302 Total Pages Processed (Data) : 2466409 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 1683244 Total Pages Failing (Index): 0 Total Pages Processed (Lob) : 16977 Total Pages Failing (Lob) : 0 Total Pages Processed (Other): 15909 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 11763 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 311133133727 (72.1895488415)
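Reading dbv output by eye is fine for two files, but the same check can be scripted. Here is a minimal sketch, assuming the output layout shown above: it collects the "Total Pages ..." counters, flags any non-zero failing or corrupt count, and verifies that the decimal SCN matches the wrap.base pair printed in parentheses (SCN = wrap × 2^32 + base). `dbv_summary` is a hypothetical name.

```python
import re


def dbv_summary(output):
    """Summarise DBVERIFY output: failing/corrupt counters and SCN consistency."""
    counts = {label.strip(): int(n) for label, n in
              re.findall(r"Total Pages ([A-Za-z() ]+?)\s*:\s*(\d+)", output)}
    problems = {k: v for k, v in counts.items()
                if (k.startswith("Failing") or k == "Marked Corrupt") and v > 0}
    scn_consistent = None  # stays None when no SCN line is present
    m = re.search(r"Highest block SCN\s*:\s*(\d+)\s*\((\d+)\.(\d+)\)", output)
    if m:
        scn, wrap, base = map(int, m.groups())
        scn_consistent = (wrap << 32) + base == scn
    return {
        "clean": bool(counts) and not problems,
        "problems": problems,
        "scn_consistent": scn_consistent,
    }
```

For both files above, this should report no problems, which matches the conclusion that the files contain no corrupt blocks.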
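This establishes that the data files themselves contain no corrupt blocks. A lost redo write or some bug simply makes redo application fail for a small number of blocks, and the errors all point at index blocks. The failing blocks were therefore handled at a low level so that redo is not applied to them, after which the recover completes and no data in the other blocks is lost. (This keeps the loss to a minimum, unlike modifying the file header SCN or forcing the database open.)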
SQL> select file#,fuzzy from v$datafile_header; FILE# FUZ ---------- --- 1 NO 2 NO 3 NO 4 NO 5 NO 7 NO 8 NO 9 YES 10 YES 11 NO 12 NO FILE# FUZ ---------- --- 13 NO 14 NO 15 NO 16 NO 17 NO 18 NO 19 NO 18 rows selected. SQL> recover datafile 9 ; Media recovery complete. SQL> recover datafile 10 ; ORA-00283: recovery session canceled due to errors ORA-00756: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 10, block# 3822912, file offset is 1252524032 bytes) ORA-10564: tablespace PACS55 ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649 SQL> recover datafile 10; Media recovery complete.
The database then opens normally; rebuild the affected objects
SQL> alter database open; Database altered. SQL> select owner,object_name,object_type from dba_objects where data_object_id in(76649,76660); OWNER -------------------------------------------------------------------------------- OBJECT_NAME -------------------------------------------------------------------------------- OBJECT_TYPE ----------------------- PACS55 STUDYINFO_DIAGRPTID INDEX PACS55 PACS_STUDYINFO_PK INDEX OWNER -------------------------------------------------------------------------------- OBJECT_NAME -------------------------------------------------------------------------------- OBJECT_TYPE ----------------------- SQL> alter index PACS55.STUDYINFO_DIAGRPTID rebuild online parallel 4; Index altered. SQL> alter index PACS55.PACS_STUDYINFO_PK rebuild online parallel 4; Index altered. SQL> SQL> SQL> SQL> alter index PACS55.STUDYINFO_DIAGRPTID noparallel; alter index PACS55.PACS_STUDYINFO_PK noparallel; Index altered. SQL> Index altered.
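The database is now fully recovered and the application can use it as-is, with zero loss of business data. We were lucky this time: had table data been affected it would have been more troublesome, though still recoverable to the greatest possible extent (and certainly with a better result than forcing the database open or modifying file headers).
Recovery of a data file that shrank to 32 KB
Recently, after a customer rebooted a database server, a data file was found to be only 32 KB in size. I did not get the scene first-hand (the customer had already attempted an rman restore). According to the alert log, the first error the database reported was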
Wed Jun 18 13:09:23 2025 alter database open Block change tracking file is current. Read of datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF' (fno 14) header failed with ORA-01210 Hex dump of (file 14, block 1) in trace file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc Corrupt block relative dba: 0x03800001 (file 14, block 1) Completely zero block found during datafile header read Rereading datafile 14 header failed with ORA-01210 Hex dump of (file 14, block 1) in trace file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc Corrupt block relative dba: 0x03800001 (file 14, block 1) Completely zero block found during datafile header read Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc: ORA-01122: 数据库文件 14 验证失败 ORA-01110: 数据文件 14: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF' ORA-01210: 数据文件标头发生介质损坏 ORA-1122 signalled during: alter database open... Wed Jun 18 13:09:23 2025 Checker run found 1 new persistent data failures
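The alert log identifies the damaged block by its relative dba, 0x03800001, and also prints the decoded form (file 14, block 1). For a smallfile tablespace the upper 10 bits hold the relative file number and the lower 22 bits the block number; a small sketch of that decoding, with an illustrative function name:

```python
def decode_rdba(rdba):
    """Split a smallfile relative dba into (relative file#, block#)."""
    file_no = rdba >> 22          # upper 10 bits
    block_no = rdba & 0x3FFFFF    # lower 22 bits
    return file_no, block_no


print(decode_rdba(0x03800001))
```

Block 1 is the data file header, which is consistent with the ORA-01210 message about the file header being corrupt.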
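The customer then performed some unknown operation, after which the following error appeared (my initial guess is that data file 14 had been renamed)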
Thu Jun 19 16:04:19 2025 alter database open Thu Jun 19 16:04:21 2025 Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_dbw0_13000.trc: ORA-01157: ????/?????? 14 - ??? DBWR ???? ORA-01110: ???? 14: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF' ORA-27041: ?????? OSD-04002: 无法打开文件 O/S-Error: (OS 2) 系统找不到指定的文件。 Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_12328.trc: ORA-03113: 通信通道的文件结尾 ORA-3113 signalled during: alter database open...
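According to the customer, file 14 had shrunk to 32 KB, and it is the file that was renamed to .bak
One of these files, bak0618, was restored by rman (the backups hold no valid copy of file 14, so what was restored is the file at its initial creation size)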
Thu Jul 07 16:57:05 2022 alter tablespace wasion add datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.dbf' size 10g autoextend on Completed: alter tablespace wasion add datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.dbf' size 10g autoextend on
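Given the current state, it is certain that the file is damaged and that there is no valid rman backup of it. The backup script shows that each backup set holds one data file, is not compressed, and is split into multiple pieces of 10g each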

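None of that is a problem in itself, but the end of the script uses operating-system commands to delete backup files older than two days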

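Here lies the problem: because the disk ran out of space, some backups failed, yet the OS-level deletion still ran as usual. Previously valid backups were deleted while the later backups had not succeeded (this is the main reason the file could not be restored this time). A serious reminder: manage rman backups with rman's own retention policy wherever possible, rather than with operating-system commands. In this situation, undelete tools were used to find some backup pieces containing this file, and these were registered in the control file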
RMAN> list backup of datafile 14; 备份集列表 =================== BS 关键字 类型 LV 大小 设备类型 经过时间 完成时间 ------- ---- -- ---------- ----------- ------------ ---------- 35251 Incr 0 10.89G DISK 00:01:20 15-6月 -25 备份集 35251 中的数据文件列表 文件 LV 类型 Ckp SCN Ckp 时间 名称 ---- -- ---- ---------- ---------- ---- 14 0 Incr 758850903 15-6月 -25 D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF 备份集 副本号 2 属于备份集 35251 设备类型 经过时间 完成时间 压缩标记 ----------- ------------ ---------- ---------- --- DISK 00:01:20 26-6月 -25 NO TAG20250615T220003 备份集 35251 副本号 2的备份片段列表 BP 关键字 Pc# 状态 段名称 ------- --- ----------- ---------- 78307 1 AVAILABLE H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1 78308 2 AVAILABLE H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_2_1 BS 关键字 类型 LV 大小 设备类型 经过时间 完成时间 ------- ---- -- ---------- ----------- ------------ ---------- 35266 Incr 0 1.81G DISK 00:00:00 17-6月 -25 备份集 35266 中的数据文件列表 文件 LV 类型 Ckp SCN Ckp 时间 名称 ---- -- ---- ---------- ---------- ---- 14 Full 759283192 17-6月 -25 D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF 备份集 副本号 1 属于备份集 35266 设备类型 经过时间 完成时间 压缩标记 ----------- ------------ ---------- ---------- --- DISK 00:00:00 26-6月 -25 NO TAG20250617T220049 备份集 35266 副本号 1的备份片段列表 BP 关键字 Pc# 状态 段名称 ------- --- ----------- ---------- 1 DELETED <---缺少一个备份集文件 78309 2 AVAILABLE H:\BAIDUNETDISK\202506191452\L0_ORCL_20250617_79022_5E3S94MC_2_1
Try restoring from these backup pieces with rman
RMAN> run 2> { 3> SET NEWNAME FOR DATAFILE 14 to 'H:\BaiduNetdisk\202506191452\14.dbf'; 4> restore datafile 14; 5> } 正在执行命令: SET NEWNAME 启动 restore 于 26-6月 -25 使用通道 ORA_DISK_1 通道 ORA_DISK_1: 正在开始还原数据文件备份集 通道 ORA_DISK_1: 正在指定从备份集还原的数据文件 通道 ORA_DISK_1: 将数据文件 00014 还原到 H:\BAIDUNETDISK\202506191452\14.DBF 通道 ORA_DISK_1: 正在还原段 1 (属于 2) 通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: restore 命令 (在 06/26/2025 08:35:53 上) 失败 ORA-19870: 还原备份片段 H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1 时出错 ORA-00600: 内部错误代码, 参数: [krbvalmrange_badfno], [1], [14], [], [], [], [], [], [], [], [], []
Errors in the alert log
Thu Jun 26 08:25:26 2025 Checker run found 39 new persistent data failures Thu Jun 26 08:35:51 2025 Datafile rdba reconstruction error, expected block greater than 804966, got 322047 for datafile 14 Corrupt block 804352 found during reading backup piece, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1, corr_type=4 Reread of blocknum=804352, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1. found valid data Datafile rdba reconstruction error, expected block greater than 324095, got 55516 for datafile 14 Corrupt block 806400 found during reading backup piece, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1, corr_type=4 Reread of blocknum=806400, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1. found valid data Errors in file C:\APP\XFF\diag\rdbms\ORCL\orcl\trace\orcl_ora_19208.trc (incident=177): ORA-00600: 内部错误代码, 参数: [krbvalmrange_badfno], [1], [14], [], [], [], [], [], [], [], [], [] Incident details in: C:\APP\XFF\diag\rdbms\ORCL\orcl\incident\incdir_177\orcl_ora_19208_i177.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Thu Jun 26 08:35:52 2025
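Analysis with a tool, together with the ORA-600 krbvalmrange_badfno error, makes it fairly certain that some of the rman blocks in the undeleted backup pieces belong to other data files, which is why the restore fails. The tool was therefore used to force-extract those blocks of data file 14 that could be recovered
Then disk-level fragments were scanned to find some blocks that had not been overwritten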

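Combining the blocks force-extracted from the rman backup with the non-overwritten blocks recovered from disk fragments, a check confirmed that roughly 2/3 of the data had been recovered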
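The tools used for extraction and merging are not named in this post, so the following is purely an illustration of the merging idea, not a description of them. It assumes each source has already been reduced to a mapping from block number to block bytes, gives earlier sources priority, and reports which fraction of the file is covered. `merge_blocks` is a hypothetical helper.

```python
def merge_blocks(total_blocks, *sources):
    """Merge block maps (block# -> bytes); earlier sources win on conflicts.

    Returns the merged map and the fraction of the file's blocks it covers.
    """
    merged = {}
    for source in sources:
        for block_no, data in source.items():
            # ignore blocks outside the file and blocks already supplied
            if 0 <= block_no < total_blocks and block_no not in merged:
                merged[block_no] = data
    return merged, len(merged) / total_blocks
```

A call such as `merge_blocks(n, rman_blocks, carved_blocks)` would prefer blocks extracted from the backup pieces and fill the gaps with blocks carved from disk; which source deserves priority in a real case depends on which copy is newer and better validated.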

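Using this recovered file together with the other files of the tablespace, the dul tool was used to recover the data into a new database, salvaging as much as possible.
This failure should never have happened, or at least should not have been this severe:
1. The rman backups were managed with an OS-level policy that kept deleting files even when the backup had not succeeded, leaving the failed file without a single valid backup
2. After the failure there was no awareness of preserving the scene: a large amount of data was written to the disk holding the 32 KB data file (nearly 1 TB of data files were copied once within the same disk, and rman by default restored to the file's original location)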
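The first point can be made concrete. The recommendation in this post is to let rman's own retention policy manage backups; the sketch below only illustrates the guard that the OS-level cleanup lacked, namely that an old piece may be removed only if a newer successful backup of the same data file exists. All names (`Piece`, `prunable`) and the two-day window are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Piece(NamedTuple):
    path: str
    file_no: int        # data file contained in the backup piece
    finished: datetime
    succeeded: bool


def prunable(pieces, now, keep_days=2):
    """Return paths of old pieces that are safe to delete."""
    cutoff = now - timedelta(days=keep_days)
    # data files that have at least one successful backup inside the window
    covered = {p.file_no for p in pieces if p.succeeded and p.finished >= cutoff}
    return [p.path for p in pieces if p.finished < cutoff and p.file_no in covered]
```

With this rule, a failed backup of file 14 during the disk-full period would have kept the last good copy from being deleted.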