ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR]

While running, the database reported O/S-Error: (OS 23) Data error (cyclic redundancy check):

Thu Jan 30 22:00:02 2025
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Thu Jan 30 22:00:04 2025
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_12576.trc:
ORA-12012: error on auto execute of job 155962
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() failure, unable to read from file
O/S-Error: (OS 23) Data error (cyclic redundancy check).
ORA-06512: at "SYS.DBMS_STATS", line 25836
ORA-06512: at "SYS.DBMS_STATS", line 26171
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Fri Jan 31 02:00:00 2025
Clearing Resource Manager plan via parameter
Fri Jan 31 08:15:46 2025
Thread 1 advanced to log sequence 4420 (LGWR switch)
  Current log# 1 seq# 4420 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Fri Jan 31 10:53:57 2025
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_cjq0_1140.trc:
Fri Jan 31 10:53:57 2025
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_7916.trc:
ORA-27102: out of memory
OSD-00043: additional error information
O/S-Error: (OS 1455) The paging file is too small for this operation to complete.
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_cjq0_1140.trc:
Fri Jan 31 10:54:03 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x18] [PC:0xB778B02, clsdcxini()+90]
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726<-dbkePostKGE_kgsf()+75
<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869<-sss_xcpt_EvalFilter()+174
<-.1.4_5+59<-00007FFD0245F306<-00007FFD024735AF<-00007FFD023D4AAF<-00007FFD0247231E<-clsdcxini()+90
<-clsdinit()+124<-ksdnfy()+225<-kscnfy()+778<-opirip()+86<-opidrv()+909<-sou2o()+98<-opimai_real()+299
<-opimai()+191<-BackgroundThreadStart()+693<-00007FFD020E7E94<-00007FFD02437AD1
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_9920.trc  (incident=39621):
ORA-07445: exception encountered: core dump [clsdcxini()+90][ACCESS_VIOLATION][ADDR:0x18][PC:0xB778B02][UNABLE_TO_READ]
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39621\orcl_j000_9920_i39621.trc
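
When an excerpt like this gets long, it helps to pull out just the error codes and see which layer failed first. The following is a minimal sketch, not part of the original analysis: the function name `summarize_alert_log` is hypothetical, and the symbolic names for OS errors 23 and 1455 are taken from the Windows system error code list rather than from the alert log.

```python
import re

# Windows system error codes seen in the log above. The symbolic names are
# an addition for readability; the alert log only prints number and message.
WINDOWS_ERRORS = {
    23: "ERROR_CRC",
    1455: "ERROR_COMMITMENT_LIMIT",
}

# Matches five-digit Oracle codes such as ORA-01115 or OSD-04006.
# Short forms like "ORA-1092 signalled" are deliberately not matched.
ORACLE_CODE_RE = re.compile(r"\b(?:ORA|OSD)-\d{5}\b")
OS_ERROR_RE = re.compile(r"O/S-Error: \(OS (\d+)\)")


def summarize_alert_log(text):
    """Return (oracle_codes, os_errors) in order of first appearance."""
    codes = []
    os_errors = []
    for line in text.splitlines():
        for code in ORACLE_CODE_RE.findall(line):
            if code not in codes:
                codes.append(code)
        match = OS_ERROR_RE.search(line)
        if match:
            number = int(match.group(1))
            entry = (number, WINDOWS_ERRORS.get(number, "unknown"))
            if entry not in os_errors:
                os_errors.append(entry)
    return codes, os_errors
```

Run against the excerpt above, the first OS-level entry would be OS 23, which points at a read failure below Oracle before any of the later memory or access-violation errors.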

After that the database could no longer start normally and reported Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]:

Wed Feb 05 09:43:51 2025
Sweep [inc][39621]: completed
Successful mount of redo thread 1, with mount id 1720066005
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 140 KB redo, 62 data blocks need recovery
Started redo application at
 Thread 1: logseq 4420, block 42375
Wed Feb 05 09:44:00 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4420 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Completed redo application of 0.09MB
Completed crash recovery at
 Thread 1: logseq 4420, block 42656, scn 94456019
 62 data blocks read, 62 data blocks written, 140 redo k-bytes read
Wed Feb 05 09:44:01 2025
Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726
<-dbkePostKGE_kgsf()+75<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869
<-sss_xcpt_EvalFilter()+174<-.1.4_5+59<-00007FFD0245F306<-00007FFD024735AF<-00007FFD023D4AAF
<-00007FFD0247231E<-expgod()+43<-xtyopncb()+241<-qctcopn()+613<-qctcopn()+392<-qctcpqb()+290
<-qctcpqbl()+52<-xtydrv()+148<-opitca()+1091<-kksLoadChild()+9008<-kxsGetRuntimeLock()+2320
<-kksfbc()+15225<-kkspsc0()+2117<-kksParseCursor()+181<-opiosq0()+2538<-opiosq()+23<-opiodr()+1662
<-rpidrus()+862<-rpidru()+154<-rpiswu2()+2757<-rpidrv()+6105<-rpisplu()+1607<-kqldFixedTableLoadCols()+345
<-kqldcor()+2534<-kglslod()+352<-kqlslod()+52<-PGOSF455_kqlsublod()+125<-kqllod()+7284<-kglobld()+1354
<-kglobpn()+1900<-kglpim()+336<-qcdlgtd()+260<-qcsfplob()+166<-qcsprfro()+903<-qcsprfro_tree()
+292<-qcsprfro_tree()+373<-qcspafq()+96
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_mmon_15468.trc  (incident=39749):
ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR] [] [PC:0x2C9C015] [] []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39749\orcl_mmon_15468_i39749.trc
Wed Feb 05 09:44:02 2025
Thread 1 advanced to log sequence 4421 (thread open)
Thread 1 opened at log sequence 4421
  Current log# 2 seq# 4421 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Feb 05 09:44:02 2025
SMON: enabling cache recovery
Wed Feb 05 09:44:11 2025
Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_9308.trc  (incident=39781):
ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR] [] [PC:0x2C9C015] [] []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39781\orcl_ora_9308_i39781.trc
Wed Feb 05 09:44:19 2025
PMON (ospid: 12376): terminating the instance due to error 397
Instance terminated by PMON, pid = 12376

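Given the Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43] error above, my first reaction was that damage in the underlying storage had probably corrupted data blocks, so I ran dbv to check whether the file reported errors.
[Screenshot: dbv check of the SYSTEM datafile]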


Checking the system log confirmed the anomaly.
[Screenshot: system log]

Trying to copy the file also failed with an error.
[Screenshot: file copy error]

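At this point it is fairly clear that the cause lies in the underlying storage. Before the database problem itself can be addressed, the file system has to be dealt with first; the data is then recovered from the datafiles salvaged from it.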
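
The post does not say how the file system was handled. As an illustration only, one common way to get a readable copy of a file that fails part-way through a normal copy is to read it block by block and substitute zeros for the blocks that cannot be read, recording which ones were lost. The sketch below assumes an 8192-byte block size; `salvage_copy` and its `read_block` parameter are hypothetical names, and this is not presented as the method used in this case.

```python
import os


def _read_at(handle, offset, size):
    """Default reader: seek to offset and read up to size bytes."""
    handle.seek(offset)
    return handle.read(size)


def salvage_copy(src, dst, block_size=8192, read_block=None):
    """Copy src to dst block by block.

    Blocks that raise OSError (for example a CRC error from the disk) are
    written as zeros. Returns the list of block indexes that were lost.
    read_block can be replaced for testing; by default it reads the file.
    """
    reader = read_block or _read_at
    bad_blocks = []
    total = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for offset in range(0, total, block_size):
            want = min(block_size, total - offset)
            try:
                data = reader(fin, offset, want)
            except OSError:
                data = b""
                bad_blocks.append(offset // block_size)
            # Pad failed or short reads so later blocks keep their offsets.
            fout.write(data.ljust(want, b"\0"))
    return bad_blocks
```

The zero-filled blocks are still lost data. The returned list only tells you which blocks to examine with dbv or to repair on the database side afterwards.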

Posted in: Oracle Backup and Recovery

The first ORA-600 16703 recovery of 2025

Yet another customer database failed to start with an ORA-600 16703 error.
[Screenshot: ORA-600 16703 at startup]


The alert log shows:

Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc  (incident=20617):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_20617\xff_ora_8784_i20617.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 8784): terminating the instance due to error 704
Instance terminated by USER, pid = 8784
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (8784) as a result of ORA-1092
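
The earlier cases listed further down report the same error with a different third argument (20, 32, 28, 4), so it is handy to extract the non-empty arguments of an ORA-00600 line when comparing incidents. A minimal sketch follows; `ora600_args` is a hypothetical helper name, and what each argument means is not stated in this post.

```python
import re

BRACKET_RE = re.compile(r"\[([^\]]*)\]")


def ora600_args(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line.

    Returns None when the line is not an ORA-00600 message.
    """
    if "ORA-00600" not in line:
        return None
    return [arg.strip() for arg in BRACKET_RE.findall(line) if arg.strip()]
```

For the alert log line above this yields the three values 16703, 1403 and 20 as strings, ignoring the trailing empty brackets.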

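This failure is generally caused by Oracle installation media that has been injected with a malicious script; once the database has been running for 300 days, restarting it triggers the problem:
Warning: Oracle installation media on the Internet injected with malicious code causes ORA-600 16703
Years of recovery experience and research on this problem allow us to open the database directly, without a logical migration. After recovery the business can use the database as it is, which recovers as much data as possible and keeps downtime to a minimum.
Summary of similar past recovery cases:
Summary of tab$ recovery errors
ORA-600 16703 failure reproduced
Handling ORA-600 13304 after an abnormal tab$ has been repaired
ORA-600 16703: recovering by inserting the orachk backup table directly into tab$
Several recent ORA-600 16703 failures (tab$ emptied), please take note
ORA-00600: internal error code, arguments: [16703], [1403], [32]
On AIX, a deleted tab$ may raise ORA-600 [16703], [1403], [28]
ORA-00600: internal error code, arguments: [16703], [1403], [4]: cause
ORA-00600: internal error code, arguments: [16703], [1403], [4]: troubleshooting

ORA-600 16703 failure: the customer hired someone to recover the database, and the database was then maliciously damaged further (ORA-00704 ORA-00922)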

Posted in: Oracle Backup and Recovery

_gc_undo_affinity=FALSE triggers ORA-01558

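A customer recently hit ORA-01558 on a non-system rollback segment, along the lines of: ORA-01558: out of transaction ID's in rollback segment _SYSSMU4_1254879796$. In earlier recovery cases I twice saw the SYSTEM rollback segment raise ORA-01558 and prevent the database from starting (see "ORA-01092 ORA-00604 ORA-01558 troubleshooting" and "ORA-01558: out of transaction ID's in rollback segment SYSTEM"). This time it was a business undo segment, which is comparatively simple to handle: rebuild the undo tablespace that contains the segment. The main cause was that in this 19c RAC environment the _gc_undo_affinity=FALSE parameter had been set as part of disabling DRM.
[Screenshot: _gc_undo_affinity and ORA-01558]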


There is also a similar bug to be aware of: Bug 19700135 ORA-600 [4187] when the undo segment wrap# is close to the max value of 0xffffffff. It mainly affects the following versions:
[Screenshot: affected versions]

Description of the bug:

ORA-600 [4187] can occur for undo segments where wrap# is close to the max value of 0xffffffff (KSQNMAXVAL).
This normally affects databases with high transaction rate that have existed for a relatively long time.
 
To identify undo segments causing the above error and others that may potentially cause it 
in the future, run the next query:
 
 select b.segment_name, b.tablespace_name 
         ,a.ktuxeusn "Undo Segment Number"
         ,a.ktuxeslt "Slot"
         ,a.ktuxesqn "Wrap#"
   from  x$ktuxe a, dba_rollback_segs b
   where a.ktuxesqn > -429496730 and a.ktuxesqn < 0
       and a.ktuxeusn = b.segment_id;
 
Then drop the undo segments or the undo tablespace from the output above.
 
With this fix in place an error ORA-1558 is eventually produced for the affected undo segment
which still requires dropping the undo segment:
  ORA-1558 "out of transaction ID's in rollback segment %s"
   Cause: All the available transaction id's have been used
   Action: Shutdown the instance and restart using other rollback segment(s),
                then drop the rollback segment that has no more transaction id's.
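
The negative bounds in the query above make sense if ktuxesqn is displayed as a signed 32-bit integer, so that wrap# values near 0xffffffff show up as small negative numbers. That reading is an assumption inferred from the query, not something the bug text states. Under it, the arithmetic can be checked with a short sketch (the function names are hypothetical):

```python
MAX_WRAP = 0xFFFFFFFF          # KSQNMAXVAL as quoted in the bug description
QUERY_LOWER_BOUND = -429496730  # lower bound used in the query above


def to_unsigned32(signed_value):
    """Reinterpret a signed 32-bit value as unsigned."""
    return signed_value & 0xFFFFFFFF


def is_flagged(signed_wrap):
    """Same predicate as the query: ktuxesqn > -429496730 and ktuxesqn < 0."""
    return QUERY_LOWER_BOUND < signed_wrap < 0


def fraction_used(signed_wrap):
    """Share of the wrap# range already consumed, between 0 and 1."""
    return to_unsigned32(signed_wrap) / MAX_WRAP
```

With this interpretation the lower bound corresponds to 0xE6666666, about 90% of the maximum, so the query flags undo segments that have used up roughly the last tenth of their wrap# range.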
Posted in: Oracle