Another Oracle Database Recovery After Storage Cache Loss

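An Oracle 10.2.0.5 RAC database on HP-UX lost its storage cache when the storage array lost power, and the database could no longer start normally. The customer asked us to step in.
Mounting the database failed with ORA-00600 [kccpb_sanity_check_2]: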

Thu Jul 22 14:52:06 EAT 2021
alter database mount
Thu Jul 22 14:52:10 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_4611.trc:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [4697564], [4697561], [0x000000000], [], [], [], []

This error indicates a corrupted control file. An attempt to re-create the control file failed with ORA-01163 and ORA-01517:

'/dev/oradata/rxff_ls94'
CHARACTER SET ZHS16GBK
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Thu Jul 22 14:54:02 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_7283.trc:
ORA-01163: SIZE clause indicates 262144 (blocks), but should match header 204800
ORA-01517: log member: '/dev/oradata/rxff_redo1_1'
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "xff" NORESETLOGS  NOARCHIVELOG
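
The ORA-01163 message in the log above reports both the size given in the SIZE clause and the size recorded in the log file header, each in redo blocks. As a minimal sketch, the two numbers can be pulled out and turned into a SIZE value. The redo block size is not shown anywhere in the log, so 512 and 1024 bytes are used here only as candidate values (an assumption to verify on the actual platform), and the helper names are hypothetical.

```python
import re

# Message text copied from the alert log excerpt above.
MESSAGE = ("ORA-01163: SIZE clause indicates 262144 (blocks), "
           "but should match header 204800")


def parse_ora_01163(message):
    """Return (requested_blocks, header_blocks) from an ORA-01163 message."""
    match = re.search(
        r"SIZE clause indicates (\d+) \(blocks\), but should match header (\d+)",
        message,
    )
    if match is None:
        raise ValueError("not an ORA-01163 message")
    return int(match.group(1)), int(match.group(2))


def size_clause(blocks, block_size):
    """Format a block count as a SIZE value in kilobytes."""
    total_bytes = blocks * block_size
    if total_bytes % 1024:
        raise ValueError("size is not a whole number of kilobytes")
    return f"{total_bytes // 1024}K"


requested, header = parse_ora_01163(MESSAGE)
# ASSUMPTION: candidate redo block sizes; the real value is platform-specific.
for block_size in (512, 1024):
    print(block_size,
          "requested:", size_clause(requested, block_size),
          "header:", size_clause(header, block_size))
```

Whichever block size applies, the value to use in the LOGFILE clause is the one derived from the header count (204800 blocks), not the one originally typed.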

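The failure was caused by a wrong redo log size in the CREATE CONTROLFILE statement: the SIZE clause specified 262144 blocks, while the log file header records 204800. After correcting the redo size, we ran the statement again: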

'/dev/oradata/rxff_ls94'
CHARACTER SET ZHS16GBK
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Thu Jul 22 15:01:00 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_14737.trc:
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [32], [8], [], [], [], [], []
Thu Jul 22 15:01:01 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_14737.trc:
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [32], [8], [], [], [], [], []
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "xff" NORESETLOGS  NOARCHIVELOG

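This time the statement hit ORA-00600 [kccsga_update_ckpt_4] and control file creation failed again. Once that error was dealt with, the control file was re-created successfully. Analysis of the datafile headers and the redo showed that the database could only be forced open, so we attempted a forced open: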

Thu Jul 22 16:02:05 EAT 2021
SMON: enabling cache recovery
Thu Jul 22 16:02:05 EAT 2021
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0002.cdad19ed):
Thu Jul 22 16:02:05 EAT 2021
select ctime, mtime, stime from obj$ where obj# = :1
Thu Jul 22 16:02:05 EAT 2021
Errors in file /oracle/admin/xff/udump/xff1_ora_23219.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 19 with name "_SYSSMU19$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 23219
ORA-1092 signalled during: alter database open resetlogs...
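
The ORA-01555 line above prints the SCN as `0x0002.cdad19ed`, that is, a wrap value and a base value in hex. For comparing it with SCNs shown elsewhere (datafile headers, block dumps), it helps to turn it into one integer. The sketch below assumes the classic layout of a 32-bit base with the wrap in the bits above it; the function name is hypothetical.

```python
def parse_scn(text):
    """Convert an SCN printed as 0xWRAP.BASE into a single integer."""
    wrap_hex, base_hex = text.strip().split(".")
    wrap = int(wrap_hex, 16)   # int() accepts the leading "0x" with base 16
    base = int(base_hex, 16)
    if base > 0xFFFFFFFF:
        raise ValueError("SCN base does not fit in 32 bits")
    return (wrap << 32) | base


# SCN taken from the ORA-01555 line in the alert log above.
failing_scn = parse_scn("0x0002.cdad19ed")
print(failing_scn)
```

The same helper works for the `scn: 0x0000.07d52871` style used in block dumps, which makes it easy to sort SCNs from different sources on one scale.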

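This combination (ORA-00704, ORA-00604, ORA-01555) is fairly common. See these similar articles:
Summary of ORA-01555 errors commonly encountered during database open
Supplement: SQL statements associated with ORA-1555 during database open

The database then opened successfully, but reported ORA-00600 [4137]: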

Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan 
Thu Jul 22 16:08:48 EAT 2021
Errors in file /oracle/admin/xff/bdump/xff1_smon_27436.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=30, OS id=997
Thu Jul 22 16:08:49 EAT 2021
LOGSTDBY: Validating controlfile with logical metadata
Thu Jul 22 16:08:49 EAT 2021
ORACLE Instance xff1 (pid = 11) - Error 600 encountered while recovering transaction (1, 43).
Thu Jul 22 16:08:49 EAT 2021
Errors in file /oracle/admin/xff/bdump/xff1_smon_27436.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Thu Jul 22 16:08:49 EAT 2021
Trace dumping is performing id=[cdmp_20210722160849]
Thu Jul 22 16:08:49 EAT 2021
LOGSTDBY: Validation complete
Completed: alter database open

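This error was caused by damaged undo. After the undo was dealt with, the database showed no further obvious errors, and we arranged to export the data.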

Posted in Oracle Backup and Recovery

Recovering from ORA-01092: ORACLE instance terminated

Database startup failed with "ORA-01092: ORACLE instance terminated. Disconnection forced":

SQL> RECOVER DATABASE;
Media recovery complete.
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

The alert log shows:

Wed Jul 21 12:32:04 2021
SMON: enabling cache recovery
Wed Jul 21 12:32:04 2021
Errors in file c:\oracle\admin\dcpdm\udump\dcpdm_ora_3004.trc:
ORA-00600: internal error code, arguments: [4194], [34], [8], [], [], [], [], []

Wed Jul 21 12:32:05 2021
Recovery of Online Redo Log: Thread 1 Group 2 Seq 495 Reading mem 0
  Mem# 0 errs 0: C:\ORACLE\ORADATA\DCPDM\REDO02.LOG
Recovery of Online Redo Log: Thread 1 Group 2 Seq 495 Reading mem 0
  Mem# 0 errs 0: C:\ORACLE\ORADATA\DCPDM\REDO02.LOG
Wed Jul 21 12:32:05 2021
Errors in file c:\oracle\admin\dcpdm\udump\dcpdm_ora_3004.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [34], [8], [], [], [], [], []

Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Wed Jul 21 12:32:05 2021
Errors in file c:\oracle\admin\dcpdm\bdump\dcpdm_pmon_13020.trc:
ORA-00604: error occurred at recursive SQL level 

Instance terminated by USER, pid = 3004
ORA-1092 signalled during: ALTER DATABASE OPEN...

The trace file contains:

*** 2021-07-21 12:32:04.000
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [34], [8], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
_ksedmp+147          CALLrel  _ksedst+0            
_ksfdmp.108+e        CALLrel  _ksedmp+0            3
_kgeriv+89           CALLreg  00000000             4E59D98 3
_kseipre.107+3f      CALLrel  _kgeriv+0            
_ksesic2+24          CALLrel  _kseipre.107+0       
__VInfreq__kturdb+8  CALLrel  _ksesic2+0           1062 0 22 0 8
b                                                  
_kcoapl+1df          CALLreg  00000000             2BB0F94 2BB100A 11 6C37C014
_kcbapl+71           CALLrel  _kcoapl+0            2BB0F90 6C37C000 1 0 2000
_kcrfwr+734          CALLrel  _kcbapl+0            2BB0F90 6C3FC788 50D4FA0
_kcbchg1+7ec         CALLrel  _kcrfwr+0            
_ktuchg+630          CALLrel  _kcbchg1+0           0 4 50D5228 50D5240 0 0
_ktbchg2+75          CALLrel  _ktuchg+0            2 66F589A4 1 2C8CD14 2C8CD1C
                                                   2BB0F90 2C8C32C 2BB0ED0 0 0
_kddchg+18f          CALLrel  _ktbchg2+0           0 66F589A4 2C8CD14 2C8CD1C
                                                   2BB0F90 2C8C324 2BB0ED0 0 0
_kduovw.53+6e3       CALLrel  _kddchg+0            2C8C2E8 2C8CD14 2C8CD1C
                                                   2BB0F90 2BB0ED0 0 0
_kduurp.53+61a       CALLrel  _kduovw.53+0         2C8C2E8
_kdusru+aa5          CALLrel  _kduurp.53+0         2C8C2E8 66F589FC
_kauupd+12e          CALLrel  _kdusru+0            2C8C71C 66F589FC 2C8C2E8 0
_updrow+729          CALLrel  _kauupd+0            2C8C718 66F589FC 2C8C2E8 0
                                                   66F58448 E F 66F60EE0 12
                                                   50DBBA4 50DBBA8
_qerupFetch+107      CALLrel  _updrow+0            
_updaul+202          CALL???  00000000             66F58660 0 66F6BC3C 7FFF
_updThreePhaseExe+b  CALLrel  _updaul+0            66F6B9D0 50DBD34 0
6                                                  
_updexe+105          CALLrel  _updThreePhaseExe+0  66F6B9D0 0 2C8C2E8 50DBE10
                                                   66F6B9D0 1 50DBE10 0
_opiexe+f97          CALLrel  _updexe+0            66F6B9D0 50DBF4C
_opiodr+4cd          CALLreg  00000000             4 3 50DC898
_rpidrus.43+99       CALLrel  _opiodr+0            4 3 50DC898 A
_skgmstack+71        CALLreg  00000000             50DC488
_rpidru+6d           CALLrel  _skgmstack+0         50DC4A0 4E59C20 F618 778198
                                                   50DC488
_rpiswu2+17e         CALLreg  00000000             50DC7C0
_rpidrv+109          CALLrel  _rpiswu2+0           
_rpiexe+33           CALLrel  _rpidrv+0            A 4 50DC898 8
_ktuscu+2a8          CALLrel  _rpiexe+0            A
_kqrcmt+2c2          CALL???  00000000             66F6D654 3
..1.18_2.filter.95+  CALLrel  _kqrcmt+0            67B88CD4 1 0 4E59D98 4E59D98
159                                                FF 0 0 0
..1.23_5.filter.99+  CALLrel  _ktcrcm+0            67B88CD4 0 0 0 0 1 0 0
14d                                                
_ktuini+64           CALLrel  _ktuiup.99+0         50DD994
_adbdrv+2665         CALLrel  _ktuini+0            50DD994
..1.5_1.filter.29+2  CALLrel  _adbdrv+0            
9d                                                 
_opiosq0+9a4         CALLrel  _opiexe+0            4 0 50DDDDC
_kpooprx+c6          CALLrel  _opiosq0+0           3 E 50DDE74 24
_kpoal8+225          CALLrel  _kpooprx+0           50DE73C 50DE684 13 1 0 24
_opiodr+4cd          CALLreg  00000000             5E 14 50DE738
_ttcpip+a86          CALLreg  00000000             5E 14 50DE738 0
_opitsk+2f4          CALLrel  _ttcpip+0            
_opiino+5fc          CALLrel  _opitsk+0            0 0 4E5FEE8 2BDF044 F3 0
_opiodr+4cd          CALLreg  00000000             3C 4 50DFBD8
_opidrv+233          CALLrel  _opiodr+0            3C 4 50DFBD8 0
_sou2o+19            CALLrel  _opidrv+0            
_opimai+10a          CALLrel  _sou2o+0             
_OracleThreadStart@  CALLrel  _opimai+0            
4+35c                                              
7C824826             CALLreg  00000000             
 
--------------------- Binary Stack Dump ---------------------

Clearly, the failure occurs because updating undo$ requires locating the before-image (undo) information for the change:

Block image after block recovery:
buffer tsn: 0 rdba: 0x0040018b (1/395)
scn: 0x0000.07d52871 seq: 0x01 flg: 0x04 tail: 0x28710201
frmt: 0x02 chkval: 0xc85e type: 0x02=KTU UNDO BLOCK
 
********************************************************************************
UNDO BLK:  
xid: 0x0000.05a.0000002d  seq: 0x33  cnt: 0x22  irb: 0x22  icl: 0x0   flg: 0x0000
 
 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f04     0x02 0x1e20     0x03 0x1d3c     0x04 0x1c58     0x05 0x1b74     
0x06 0x1a90     0x07 0x19ac     0x08 0x18c8     0x09 0x17e4     0x0a 0x1700     
0x0b 0x161c     0x0c 0x1538     0x0d 0x1454     0x0e 0x1370     0x0f 0x128c     
0x10 0x11a8     0x11 0x10c4     0x12 0x0fe0     0x13 0x0efc     0x14 0x0e18     
0x15 0x0d34     0x16 0x0c50     0x17 0x0b6c     0x18 0x0a88     0x19 0x09a4     
0x1a 0x08c0     0x1b 0x07dc     0x1c 0x06f8     0x1d 0x0614     0x1e 0x0530     
0x1f 0x044c     0x20 0x0368     0x21 0x0284     0x22 0x01a0     
 
*-----------------------------
* Rec #0x1  slt: 0x0b  objn: 15(0x0000000f)  objd: 15  tblspc: 0(0x00000000)
*       Layer:  11 (Row)   opc: 1   rci 0x00   
Undo type:  Regular undo    Begin trans    Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
uba: 0x0040018a.0033.22 ctl max scn: 0x0000.07853941 prv tx scn: 0x0000.07853943
KDO undo record:
KTB Redo 
op: 0x04  ver: 0x01  
op: L  itl: xid:  0x0000.042.0000002d uba: 0x0040018a.0033.22
                      flg: C---    lkc:  0     scn: 0x0000.07d23460
KDO Op code: URP row dependencies Disabled
  xtype: XA  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 7(0x7) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 37 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 3]  c2 02 06
col  5: [ 6]  c5 02 20 14 40 24
col  6: [ 1]  80
col  7: [ 4]  c3 0e 21 2d
col  8: [ 3]  c2 1b 34
col  9: [ 1]  80
col 10: [ 2]  c1 03
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
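
To make the dump above easier to read, here is a small sketch that decodes three kinds of values in it: the `rdba`, the text in `col  1`, and the numeric columns. It assumes the usual 10-bit file / 22-bit block split of a relative data block address and the standard internal NUMBER encoding, and it only handles non-negative integers. The function names are hypothetical.

```python
def split_rdba(rdba):
    """Split a relative data block address into (file number, block number)."""
    return rdba >> 22, rdba & 0x3FFFFF


def decode_text(hex_bytes):
    """Decode a dumped character column given as space-separated hex bytes."""
    return bytes.fromhex(hex_bytes).decode("ascii")


def decode_number(hex_bytes):
    """Decode a non-negative integer stored in Oracle's NUMBER format."""
    data = bytes.fromhex(hex_bytes)
    if data == b"\x80":                 # a single 0x80 byte represents zero
        return 0
    if data[0] <= 0x80:
        raise ValueError("negative numbers are not handled in this sketch")
    exponent = data[0] - 0xC1           # power of 100 of the first digit
    value = 0
    for index, byte in enumerate(data[1:]):
        power = exponent - index
        if power < 0:
            raise ValueError("fractional numbers are not handled in this sketch")
        value += (byte - 1) * 100 ** power   # each byte holds a base-100 digit + 1
    return value


print(split_rdba(0x0040018b))                       # block address from the dump
print(decode_text("5f 53 59 53 53 4d 55 37 24"))    # col 1
for column in ("c1 02", "c1 03", "c2 02 06", "c5 02 20 14 40 24",
               "80", "c3 0e 21 2d", "c2 1b 34"):
    print(column, "->", decode_number(column))
```

The decoded `rdba` agrees with the `(1/395)` that the dump itself prints next to it, which is a quick sanity check on the bit split.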

This undo information is inconsistent, which is why the update of undo$ raised ORA-00600 [4194], [34], [8]. After the corresponding block was modified, the database opened normally:

SQL> alter database open;

Database altered.

However, shutting the database down raised ORA-600 [4194] again:

SQL> shutdown immediate;
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [94], [61], [], [], [], [], []

The alert log shows:

Wed Jul 21 12:58:42 2021
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 3
Waiting for dispatcher 'D000' to shutdown
All dispatchers and shared servers shutdown
Wed Jul 21 12:58:45 2021
ALTER DATABASE CLOSE NORMAL
Wed Jul 21 12:58:45 2021
Errors in file c:\oracle\admin\dcpdm\udump\dcpdm_ora_13628.trc:
ORA-00600: internal error code, arguments: [4194], [94], [61], [], [], [], [], []

Recovery of Online Redo Log: Thread 1 Group 3 Seq 496 Reading mem 0
  Mem# 0 errs 0: C:\ORACLE\ORADATA\DCPDM\REDO03.LOG
Recovery of Online Redo Log: Thread 1 Group 3 Seq 496 Reading mem 0
  Mem# 0 errs 0: C:\ORACLE\ORADATA\DCPDM\REDO03.LOG
ORA-607 signalled during: ALTER DATABASE CLOSE NORMAL...
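
Both ORA-00600 [4194] occurrences in this case carry two numeric arguments. They are commonly described as the highest undo record number in the undo block and the undo record number expected by the redo record; we treat that reading as an assumption here rather than something the logs state. Under that reading, a minimal sketch for extracting the pair from an alert log line looks like this (function names are hypothetical, and only the bracketed argument lists are copied from the logs):

```python
import re


def parse_ora_600(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line."""
    if "ORA-00600" not in line:
        raise ValueError("not an ORA-00600 line")
    return [arg for arg in re.findall(r"\[([^\]]*)\]", line) if arg]


def describe_4194(line):
    """Name the two numeric arguments of ORA-00600 [4194] (assumed meaning)."""
    args = parse_ora_600(line)
    if len(args) < 3 or args[0] != "4194":
        raise ValueError("not an ORA-00600 [4194] line with two arguments")
    return {"undo_block_record": int(args[1]), "redo_record": int(args[2])}


# Argument lists copied from the open-time and shutdown-time errors.
OPEN_ERROR = "ORA-00600: [4194], [34], [8], [], [], [], [], []"
CLOSE_ERROR = "ORA-00600: [4194], [94], [61], [], [], [], [], []"
# "cnt: 0x22" from the UNDO BLK header in the block dump.
UNDO_BLOCK_CNT = int("0x22", 16)

print(describe_4194(OPEN_ERROR), "cnt in dump:", UNDO_BLOCK_CNT)
print(describe_4194(CLOSE_ERROR))
```

For the open-time error the first argument equals the `cnt` value in the dumped undo block, which is at least consistent with the assumed meaning.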

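After the undo tablespace was rebuilt, the database started up and shut down normally with no further errors. We recommend rebuilding the database by logical means (export and import).
See these earlier articles on similar cases:
Database reports ORA-00607/ORA-00600 [4194]
Using bbed to resolve an ORA-00607/ORA-00600 [4194] failure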


Handling an ORA-600 kfdpMetaBlk_pickle Failure

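A customer reported that CRS on their cluster would not start. Investigation showed that the GMON process was crashing the ASM instance, and testing confirmed that the problem was triggered when mounting the DATA disk group: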

SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7517
Session ID: 918 Serial number: 5

The corresponding alert log reports ORA-600 [kfdpMetaBlk_pickle:01], [4294967295]:

SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=2 incarn=0x3078f05f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x3078f05f
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
Sat Jul 17 05:21:01 2021
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc  (incident=255833):
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/crs_base/diag/asm/+asm/+ASM2/incident/incdir_255833/+ASM2_gmon_7457_i255833.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc:
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
GMON (ospid: 7457): terminating the instance due to error 493
Sat Jul 17 05:21:03 2021
System state dump requested by (instance=2, osid=7457 (GMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_diag_7429.trc
Instance terminated by GMON, pid = 7457
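
The second argument of the error above, 4294967295, is the largest unsigned 32-bit value. Read as a signed 32-bit integer it is -1, a value often used as an "unset" or "invalid" marker; that this is what it means here is our assumption, not something the log states. The conversion is a one-liner (the function name is hypothetical):

```python
def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return value - (1 << 32) if value & 0x80000000 else value


# Second argument of ORA-00600 [kfdpMetaBlk_pickle:01] from the alert log.
argument = 4294967295
print(hex(argument), as_signed_32(argument))
```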

A search of My Oracle Support (MOS) for ORA-600 [kfdpMetaBlk_pickle:01], [4294967295] returned nothing useful.


The corresponding trace file contains the following:

2021-07-17 03:51:16.277603*:800002A2:KGF:kgfdputl.c@1411:kgfdpMetaSet_getMaxClique():   inc=2 ver=4294967295
2021-07-17 03:51:16.277619 :800002A3:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277620 :800002A4:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277992 :800002A5:KFDP:kfdp.c@9417:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle upto 6 metablks
2021-07-17 03:51:16.277993 :800002A6:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 3
2021-07-17 03:51:16.278154 :800002A7:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 1
2021-07-17 03:51:16.278268 :800002A8:KFDP:kfdp.c@5851:kfdp_read(): kfdp_read end ok=1
2021-07-17 03:51:16.278277 :800002A9:KFDP:kfdp.c@7073:kfdp_doQuery(): kfdp_doQuery   rewrite_kfdp=1
2021-07-17 03:51:16.278282 :800002AA:KFDP:kfdp.c@12511:kfdpLckValue_pickle(): kfdpLckValue_pickle size=0 
                            endian=0xff ndisks=0 lckvalid=0
2021-07-17 03:51:16.278293 :800002AB:db_trace:kfdp.c@12803:kfdpLck_convPriv(): [10499:19:396] 
                            kfdpLck_conv: grp=1, type=0, mode=5, line=7155
2021-07-17 03:51:16.278294 :800002AC:KFDP:kfdp.c@12663:kfdpLckValue_unpickle(): kfdpLckValue_unpickle
                            size=28 res=0 ok=0 ver=-1 dcnt=0 lckvalid=0 flags=0x2 inst=0 (I am 2) version=0
2021-07-17 03:51:16.278499*:800002AD:KGF:kgfdputl.c@485:kgfdpDta_getAllDsks(): kgfdpDta_getAllDsks using 
                            saved iterator 0x9ffffffffd571220 with 4 disks
2021-07-17 03:51:16.278688 :800002AE:KFDP:kfdp.c@5566:kfdp_write(): kfdp_write: pstDskCnt=3 grow=0 degenerate=0
2021-07-17 03:51:16.278688*:800002AF:KGF:kgfdputl.c@2619:kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3

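From the information above, it is fairly certain that the PST (Partnership and Status Table) metadata is inconsistent: the PST records only three disks (0, 1 and 3) and treats disk 2 as stale, while the disk group actually has four disks, and this causes the GMON process to fail. We fixed the problem at a low level, after which the database was recovered successfully.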
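
The disk comparison behind that conclusion can be sketched as follows: collect the disk numbers that ASM assigned during the mount attempt (alert log) and compare them with the disks GMON says it writes the PST to (trace file). The log lines are copied from the excerpts above; the function names are hypothetical.

```python
import re

# "Assigning number" lines from the ASM alert log.
ALERT_LINES = [
    "NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)",
    "NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)",
    "NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)",
    "NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)",
]
# Last line of the GMON trace excerpt.
TRACE_LINE = "kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3"


def discovered_disks(lines):
    """Map disk number -> device path from 'Assigning number' lines."""
    found = {}
    for line in lines:
        match = re.search(
            r"Assigning number \((\d+),(\d+)\) to disk \(([^)]+)\)", line)
        if match:
            found[int(match.group(2))] = match.group(3)
    return found


def pst_disks(line):
    """Return the set of disk numbers listed in a 'writing pst' trace line."""
    match = re.search(r"writing pst to disks \(n=(\d+)\):((?:\s+\d+)+)", line)
    if match is None:
        raise ValueError("not a 'writing pst to disks' line")
    disks = [int(number) for number in match.group(2).split()]
    if len(disks) != int(match.group(1)):
        raise ValueError("disk count does not match n=")
    return set(disks)


discovered = discovered_disks(ALERT_LINES)
missing = sorted(set(discovered) - pst_disks(TRACE_LINE))
for number in missing:
    print("disk", number, discovered[number], "is not covered by the PST")
```

With the lines from this case, the only disk outside the PST set is disk 2, matching the `filtered old meta on disk 2` messages in the trace.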

SQL> recover database using backup controlfile;
ORA-00279: change 30075814973 generated at 07/17/2021 01:12:08 needed for
thread 2
ORA-00289: suggestion : +FRA
ORA-00280: change 30075814973 for thread 2 is in sequence #120561


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_16
ORA-00279: change 30075814973 generated at 07/17/2021 01:11:54 needed for
thread 1
ORA-00289: suggestion :
+FRA/xff/archivelog/2021_07_17/thread_1_seq_79949.1543.1078103529
ORA-00280: change 30075814973 for thread 1 is in sequence #79949


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_13
ORA-00279: change 30075815013 generated at 07/17/2021 01:12:09 needed for
thread 1
ORA-00289: suggestion : +FRA
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
ORA-00278: log file '/tmp/asm/group_13' no longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_11
Log applied.
Media recovery complete.

SQL> alter database open resetlogs;

Database altered.
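
For the record, the recovery session above can be reduced to a list of which thread and sequence was requested and which file was supplied at each prompt. The sketch below parses an abridged copy of the session; the parsing rules and the function name are ours, and reading the `/tmp/asm/group_NN` files as copies of online redo log groups is an assumption based only on their names.

```python
import re

# Abridged copy of the recovery session: ORA-00280 lines, prompts, answers.
SESSION = """\
ORA-00280: change 30075814973 for thread 2 is in sequence #120561
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_16
ORA-00280: change 30075814973 for thread 1 is in sequence #79949
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_13
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
ORA-00278: log file '/tmp/asm/group_13' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_11
"""


def applied_logs(session):
    """Return (thread, sequence, supplied_file) for each answered prompt."""
    result = []
    pending = None           # (thread, sequence) from the latest ORA-00280
    awaiting_answer = False  # True right after a "Specify log:" prompt
    for raw_line in session.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = re.match(
            r"ORA-00280: change \d+ for thread (\d+) is in sequence #(\d+)", line)
        if match:
            pending = (int(match.group(1)), int(match.group(2)))
        elif line.startswith("Specify log:"):
            awaiting_answer = True
        elif awaiting_answer:
            if pending is not None:
                result.append(pending + (line,))
            pending = None
            awaiting_answer = False
    return result


for thread, sequence, path in applied_logs(SESSION):
    print(f"thread {thread} seq {sequence}: {path}")
```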

We were lucky: this recovery was completed with zero data loss.
