ORA-604 ORA-607 ORA-600

有客户数据库经过一系列恢复之后,出现ORA-604 ORA-607 ORA-600的错误,尝试各种方法无法打开,希望我们介入处理

Sat May 12 21:18:56 2018
SMON: enabling cache recovery
Sat May 12 21:18:57 2018
Errors in file d:\oracle\admin\xifenfei\udump\xifenfei_ora_3448.trc:
ORA-00600: 内部错误代码,参数: [4194], [34], [28], [], [], [], [], []

Sat May 12 21:19:00 2018
Recovery of Online Redo Log: Thread 1 Group 1 Seq 644 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\xifenfei\REDO01.LOG
Recovery of Online Redo Log: Thread 1 Group 1 Seq 644 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\xifenfei\REDO01.LOG
Sat May 12 21:19:01 2018
Errors in file d:\oracle\admin\xifenfei\udump\xifenfei_ora_3448.trc:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码,参数: [4194], [34], [28], [], [], [], [], []

Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Sat May 12 21:19:02 2018
Errors in file d:\oracle\admin\xifenfei\bdump\xifenfei_pmon_1840.trc:
ORA-00604: error occurred at recursive SQL level 

Instance terminated by USER, pid = 3448
ORA-1092 signalled during: alter database open...

ORA-600 4194 trace文件
分析trace文件,确定ORA-600 4194对应的sql语句为update undo$ set name=:2……

*** 2018-05-12 21:18:57.000
ksedmp: internal or fatal error
ORA-00600: 内部错误代码,参数: [4194], [34], [28], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
_ksedmp+147          CALLrel  _ksedst+0            
_ksfdmp.108+e        CALLrel  _ksedmp+0            3
_kgeriv+89           CALLreg  00000000             212778 3
_kseipre.107+3f      CALLrel  _kgeriv+0            
_ksesic2+24          CALLrel  _kseipre.107+0       
__VInfreq__kturdb+8  CALLrel  _ksesic2+0           1062 0 22 0 1C
b                                                  
_kcoapl+1df          CALLreg  00000000             30A0F94 30A100A 11 6BB7E014
_kcbapl+71           CALLrel  _kcoapl+0            30A0F90 6BB7E000 1 0 2000
_kcrfwr+734          CALLrel  _kcbapl+0            30A0F90 6BBFC844 3014F9C
_kcbchg1+7ec         CALLrel  _kcrfwr+0            
_ktuchg+630          CALLrel  _kcbchg1+0           0 4 3015224 301523C 0 0
_ktbchg2+75          CALLrel  _ktuchg+0            2 672D50F4 1 310DCD8 310DCE0
                                                   30A0F90 310D2F0 30A0ED0 0 0
_kddchg+18f          CALLrel  _ktbchg2+0           0 672D50F4 310DCD8 310DCE0
                                                   30A0F90 310D2E8 30A0ED0 0 0
_kduovw.53+6e3       CALLrel  _kddchg+0            310D2AC 310DCD8 310DCE0
                                                   30A0F90 30A0ED0 0 0
_kduurp.53+61a       CALLrel  _kduovw.53+0         310D2AC
_kdusru+aa5          CALLrel  _kduurp.53+0         310D2AC 672D514C
_kauupd+12e          CALLrel  _kdusru+0            310D6E0 672D514C 310D2AC 0
_updrow+729          CALLrel  _kauupd+0            310D6DC 672D514C 310D2AC 0
                                                   672D539C E F 672E4E48 12
                                                   301BBA0 301BBA4
_qerupFetch+107      CALLrel  _updrow+0            
_updaul+202          CALL???  00000000             672DE9C4 0 672EC040 7FFF
_updThreePhaseExe+b  CALLrel  _updaul+0            672EBDD4 301BD30 0
6                                                  
_updexe+105          CALLrel  _updThreePhaseExe+0  672EBDD4 0 310D2AC 301BE0C
                                                   672EBDD4 1 301BE0C 0
_opiexe+f97          CALLrel  _updexe+0            672EBDD4 301BF48
_opiodr+4cd          CALLreg  00000000             4 3 301C894
_rpidrus.43+99       CALLrel  _opiodr+0            4 3 301C894 B
_skgmstack+71        CALLreg  00000000             301C484
_rpidru+6d           CALLrel  _skgmstack+0         301C49C 212600 F618 778198
                                                   301C484
_rpiswu2+17e         CALLreg  00000000             301C7BC
_rpidrv+109          CALLrel  _rpiswu2+0           
_rpiexe+33           CALLrel  _rpidrv+0            B 4 301C894 8
_ktuscu+2a8          CALLrel  _rpiexe+0            B
_kqrcmt+2c2          CALL???  00000000             672EA898 3
..1.18_2.filter.95+  CALLrel  _kqrcmt+0            67B9C5F4 1 0 212778 212778 FF
159                                                0 0 0
..1.23_5.filter.99+  CALLrel  _ktcrcm+0            67B9C5F4 0 0 0 0 1 0 0
14d                                                
_ktuini+64           CALLrel  _ktuiup.99+0         301D990
_adbdrv+2665         CALLrel  _ktuini+0            301D990
..1.5_1.filter.29+2  CALLrel  _adbdrv+0            
9d                                                 
_opiosq0+9a4         CALLrel  _opiexe+0            4 0 301DDD8
_kpooprx+c6          CALLrel  _opiosq0+0           3 E 301DE70 24
_kpoal8+225          CALLrel  _kpooprx+0           301E738 301E680 13 1 0 24
_opiodr+4cd          CALLreg  00000000             5E 14 301E734
_ttcpip+a86          CALLreg  00000000             5E 14 301E734 0
_opitsk+2f4          CALLrel  _ttcpip+0            
_opiino+5fc          CALLrel  _opitsk+0            0 0 2188C8 30CF020 E6 0
_opiodr+4cd          CALLreg  00000000             3C 4 301FBD4
_opidrv+233          CALLrel  _opiodr+0            3C 4 301FBD4 0
_sou2o+19            CALLrel  _opidrv+0            
_opimai+10a          CALLrel  _sou2o+0             
_OracleThreadStart@  CALLrel  _opimai+0            
4+35c                                              
7C80B726             CALLreg  00000000             
 
--------------------- Binary Stack Dump ---------------------

进一步分析确定为system rollback segment header 异常

Block image after block recovery:
buffer tsn: 0 rdba: 0x00400009 (1/9)
scn: 0x0000.d794070f seq: 0x01 flg: 0x04 tail: 0x070f0e01
frmt: 0x02 chkval: 0x2320 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 6      #blocks: 47    
                  last map  0x00000000  #maps: 0      offset: 4128  
      Highwater::  0x00400183  ext#: 2      blk#: 2      ext size: 8     
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 0     
  mapblk  0x00000000  offset: 2     
                   Unlocked
     Map Header:: next  0x00000000  #extents: 6    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x0040000a  length: 7     
   0x00400011  length: 8     
   0x00400181  length: 8     
   0x00400189  length: 8     
   0x00400191  length: 8     
   0x00400199  length: 8     
  
  TRN CTL:: seq: 0x0056 chd: 0x0054 ctl: 0x0052 inc: 0x00000000 nfb: 0x0001
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00400183.0056.1b scn: 0x0000.d77996ab
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00400183.0056.1b ext: 0x2  spc: 0x794   
    uba: 0x00000000.002f.21 ext: 0x5  spc: 0x1334  
    uba: 0x00000000.002e.37 ext: 0x4  spc: 0x788   
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0 
…………    
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码,参数: [4194], [34], [28], [], [], [], [], []

这种问题需要通过通过bbed/ue修改ktuxc的相关内容,实现数据库open成功,可以参考另外几篇文章:
使用bbed解决ORA-00607/ORA-00600[4194]故障
通过bbed模拟ORA-00607/ORA-00600 4194 故障
ORA-607/ORA-600[4194]不一定是重大灾难
数据库报ORA-00607/ORA-00600[4194]错误

发表在 Oracle备份恢复 | 标签为 , | 评论关闭

Automatic datafile offline due to write error on

由于存储突然掉线导致数据文件无法访问,导致部分数据文件被自动offline

Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0xe93b8 online=1
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x24eb65 online=1
        file=25 'F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA'
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 25: F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x22b0a1 online=1
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 28: F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x138def online=1
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
        file=30 'F:\ORACLE\ORADATA\ORCL\QYSCZH17.ORA'
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
O/S-Error: (OS 2) 系统找不到指定的文件。'
……
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0xade96 online=1
        file=9 'F:\ORACLE\ORADATA\ORCL\QYSCZH2'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 9: F:\ORACLE\ORADATA\ORCL\QYSCZH2
Thu May 17 14:49:28 2018
KCF: read, write or open error, block=0x378c66 online=1
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0x35f6de online=1
……

存储掉线是悲剧的起点,按理说数据库是归档模式,存储恢复之后,继续recover datafile,然后online应该问题不大,但是由于客户没有及时处理这个问题(也许业务实时性要求不高,可能挂几个小时也没人知道),导致第二个悲剧发生,删除归档的定时任务把数据库的归档日志给删除了.导致后面存储挂载上来之后,数据文件也无法正常online成功

Tue May 22 16:28:13 2018
ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'  
Media Recovery Start
Serial Media Recovery started
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'  ...
Tue May 22 16:28:42 2018
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测发现结果如下:
recover


遭遇这种情况,常规方法无法恢复,考虑使用bbed或者其他方法强制online文件,由于存储突然掉线,这样恢复的库可能后续还有大量工作需要处理,最常见的可能有表和index不一致,表的segment header信息和extent实际信息不匹配等

发表在 非常规恢复 | 标签为 , , , | 评论关闭

ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed]

有一朋友数据库经常crash,让我帮忙分析和解决该问题
数据库版本

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

alert日志报错信息

Mon Apr 23 02:14:18 2018
Process 0x0x10f33262f8 appears to be hung while dumping
Current time = 464149508, process death time = 464089392 interval = 60000
Called from location UNKNOWN:UNKNOWN
Attempting to kill process 0x0x10f33262f8 with OS pid = 30813
OSD kill succeeded for process 0x10f33262f8
Instance Critical Process (pid: 9, ospid: 30813, DBRM) died unexpectedly
Mon Apr 23 02:14:21 2018
System state dump requested by (instance=1, osid=30789 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_30809_20180423021421.trc
Mon Apr 23 02:14:22 2018
PMON (ospid: 30789): terminating the instance due to error 56710
Mon Apr 23 02:14:22 2018
opiodr aborting process unknown ospid (27086) as a result of ORA-1092
Mon Apr 23 02:14:28 2018
Instance terminated by PMON, pid = 30789

而在类似报错之前,一般有swap不足的报错

Mon Apr 23 02:03:54 2018
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [2.01%] pct of memory swapped out [0.51%].
Please make sure there is no memory pressure and the SGA and PGA 
are configured correctly. Look at DBRM trace file for more details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbrm_30813.trc  (incident=854536):
ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_854536/orcl_dbrm_30813_i854536.trc
Mon Apr 23 02:04:02 2018
Dumping diagnostic data in directory=[cdmp_20180423020402], requested by (instance=1, osid=30813 (DBRM))

从这里报错看,由于系统内存不足,导致大量使用swap,从而引起oracle进程被kill

分析系统内存使用情况

[www.xifenfei.com@Oracle ~]$ more /proc/meminfo
MemTotal:       66109924 kB
MemFree:          359848 kB
Buffers:            9308 kB
Cached:          1848504 kB
SwapCached:       172800 kB
Active:          1060368 kB
Inactive:        1156100 kB
Active(anon):     999208 kB
Inactive(anon):  1104860 kB
Active(file):      61160 kB
Inactive(file):    51240 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      33554428 kB
SwapFree:       30516280 kB
Dirty:                68 kB
Writeback:             0 kB
AnonPages:        190936 kB
Mapped:          1152196 kB
Shmem:           1745380 kB
Slab:              70900 kB
SReclaimable:      25640 kB
SUnreclaim:        45260 kB
KernelStack:        5728 kB
PageTables:        92488 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    35152108 kB
Committed_AS:   70923356 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      267468 kB
VmallocChunk:   34359442996 kB
HardwareCorrupted:     0 kB
AnonHugePages:     18432 kB
HugePages_Total:   30720
HugePages_Free:    30720
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8192 kB
DirectMap2M:     2088960 kB
DirectMap1G:    65011712 kB

[www.xifenfei.com@Oracle ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         64560      64200        359       1682          9       1801
-/+ buffers/cache:      62389       2170
Swap:        32767       2977      29790

比较明显系统总共内存64G,配置了60G大页,但是数据库没有使用该大页

数据库使用内存情况

SQL> show sga;      

Total System Global Area 7.1672E+10 bytes
Fixed Size                  3719544 bytes
Variable Size            2684358280 bytes
Database Buffers         6.8719E+10 bytes
Redo Buffers              264712192 bytes

比较明显按照上述配置,一共就只有4G的空闲内存,但是oracle sga占用7G,出现大量换页是必然.错误也明显想让数据库使用大页,但是由于配置不当导致数据库无法使用大页而使用系统除大页之外的内存,从而引起系统异常.
这里也说明12c的提示有明显的改善,通过alert的错误提示基本上就可以确定是swap不足导致.

发表在 ORACLE 12C | 标签为 , | 评论关闭