First recovery case of 2025: ORA-600 kcbzib_kcrsds_1

A 12.2.0.1 database went through a two-node failover for some reason, after which it could no longer mount:

2025-01-04T15:45:44.424193+08:00
alter database mount
2025-01-04T15:45:48.491054+08:00
Network throttle feature is disabled as mount time

2025-01-04T15:45:48.601366+08:00
LGWR (ospid: 34014): terminating the instance
2025-01-04T15:45:48.602480+08:00
System state dump requested by (instance=1, osid=34014 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xifenfei/trace/xifenfei_diag_33978_20250104154548.trc
2025-01-04T15:45:48.790674+08:00
Dumping diagnostic data in directory=[cdmp_20250104154548], requested by (instance=1, osid=34014 (LGWR))
2025-01-04T15:45:49.915068+08:00
Instance terminated by LGWR, pid = 34014

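The cause here is fairly obvious: a damaged control file. After re-creating the control file and mounting the database, the Oracle Database Recovery Check script showed that the checkpoint information in every datafile header was frozen at 2024-11-29 19:00:29 (scn 2112302221).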

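The alert log shows that the database kept serving requests normally for the following 20 days, with no impact on the business. The customer confirmed that a backup was started from a backup appliance at exactly the time the checkpoint information froze, and that no backup was taken afterwards. A frozen header checkpoint is what a datafile looks like while it is in hot backup mode, so the backup most likely issued begin backup and never ended it.
I was in a hurry to recover the database and did not dump the file headers; otherwise they would probably have shown begin backup information like the following (reproduced in a test environment):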

DATA FILE #1:
  name #7: /u01/app/oracle/oradata/xifenfei/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:625 scn: 0x0105.0106deef 01/04/2025 22:02:50
 Stop scn: 0xffff.ffffffff 12/14/2024 08:15:07
 Creation Checkpointed at scn:  0x0000.00000007 08/24/2013 11:37:33
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Offline scn: 0x0000.000e2005 prev_range: 0
 Online Checkpointed at scn:  0x0000.000e2006 03/20/2024 20:53:56
 thread:1 rba:(0x1.2.0)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED
 Plugged readony: NO
 Plugin scnscn: 0x0000.00000000
 Plugin resetlogs scn/timescn: 0x0000.00000000 01/01/1988 00:00:00
 Foreign creation scn/timescn: 0x0000.00000000 01/01/1988 00:00:00
 Foreign checkpoint scn/timescn: 0x0000.00000000 01/01/1988 00:00:00
 Online move state: 0
 V10 STYLE FILE HEADER:
        Compatibility Vsn = 186647552=0xb200400
        Db ID=1780931490=0x6a26dba2, Db Name='XIFENFEI'
        Activation ID=0=0x0
        Control Seq=32953021=0x1f6d2bd, File size=98560=0x18100
        File Number=1, Blksiz=8192, File Type=3 DATA
Tablespace #0 - SYSTEM  rel_fn:1
Creation   at   scn: 0x0000.00000007 08/24/2013 11:37:33
Backup taken at scn: 0x0105.0106deef 01/04/2025 22:02:50 thread:1    <==== note
 reset logs count:0x45636764 scn: 0x0000.000e2006
 prev reset logs count:0x3121c97a scn: 0x0000.00000001
 recovered at 12/14/2024 08:36:35
 status:0x2001 root dba:0x00400208 chkpt cnt: 625 ctl cnt:624
begin-hot-backup file size: 98560                        <==== note
Checkpointed at scn:  0x0105.0106deef 01/04/2025 22:02:50
 thread:1 rba:(0x205.fdd9.10)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
Backup Checkpointed at scn:  0x0105.0106df14 01/04/2025 22:03:20   <==== note
 thread:1 rba:(0x209.2.10)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
External cache id: 0x0 0x0 0x0 0x0
Absolute fuzzy scn: 0x0000.00000000
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp  01/01/1988 00:00:00
Platform Information:    Creation Platform ID: 13
Current Platform ID: 13 Last Platform ID: 13
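
The SCNs in the dump are printed in wrap.base hexadecimal form. As a rough illustration, the Python sketch below converts the marked values to decimal and shows how far the backup checkpoint had moved past the frozen header checkpoint. The helper name `scn_to_int` is hypothetical, and the sketch assumes the classic layout in which the base is the low 32 bits of the SCN.

```python
def scn_to_int(scn: str) -> int:
    """Convert an SCN printed as '0xWRAP.BASE' into a single integer.

    Assumption: BASE holds the low 32 bits and WRAP the bits above it.
    """
    wrap_hex, base_hex = scn.split(".")
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


# Values copied from the test-environment header dump.
frozen_checkpoint = scn_to_int("0x0105.0106deef")  # "Checkpointed at scn"
backup_checkpoint = scn_to_int("0x0105.0106df14")  # "Backup Checkpointed at scn"

print("frozen checkpoint SCN:", frozen_checkpoint)
print("backup checkpoint SCN:", backup_checkpoint)
print("SCNs advanced past the frozen checkpoint:",
      backup_checkpoint - frozen_checkpoint)
```

In the test dump the two values are only 30 seconds apart, so the gap is small; on a file left in backup mode for weeks the same subtraction would show a much larger distance.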

Given the above, I tried to force the database open, which failed with ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1].


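I have handled many recovery cases of this kind before; adjusting the database SCN resolves it:
Summary of kcbzib_kcrsds_1 errors
Handling ORA-600 kcbzib_kcrsds_1 on a 12c database
Storage failure: handling ORA-600 kcbzib_kcrsds_1 after forcing the database open
One-step recovery of ORA-600 kcbzib_kcrsds_1 with the Patch SCN tool
There are too many cases like this to list them all. Once this error was resolved, the database opened successfully, and a logical migration was then scheduled.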

Posted in Oracle Backup and Recovery

First Oracle 21c recovery inquiry

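This post records a recovery request for an Oracle 21c failure, the first 21c recovery inquiry I have received. It shows that some customers really are running 21c in production (this one is an overseas customer; I have not yet come across one in China).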


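The failure started with inconsistent datafiles that prevented the database from opening. After a series of attempts, resetlogs was eventually executed while some datafiles were offline, which left those files with a resetlogs SCN that does not match the rest of the database.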

Posted in Oracle

ORA-15411: Failure groups in disk group DATA have different number of disks.

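The customer's disk group was originally planned with normal redundancy, but for some reason one of the storage arrays went offline, leaving the disks in the following state: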

SQL> select group_number,name,path,failgroup,state from v$asm_disk;

GROUP_NUMBER NAME                           PATH                           FAILGROUP                      STATE
------------ ------------------------------ ------------------------------ ------------------------------ --------
           0                                /dev/asmocr1                                                  NORMAL
           0                                /dev/asmocr3                                                  NORMAL
           0                                /dev/asmhdisk15                                               NORMAL
           0                                /dev/asmocr2                                                  NORMAL
           1 DATA_0011                                                     FAL1                           NORMAL
           1 DATA_0010                                                     FAL1                           NORMAL
           1 DATA_0013                                                     FAL1                           NORMAL
           1 DATA_0012                                                     FAL1                           NORMAL
           1 DATA_0009                                                     FAL1                           NORMAL
           1 DATA_0008                                                     FAL1                           NORMAL
           1 DATA_0007                                                     FAL1                           NORMAL
           1 DATA_0006                                                     FAL1                           NORMAL
           1 DATA_0005                                                     FAL1                           NORMAL
           1 DATA_0004                                                     FAL1                           NORMAL
           1 DATA_0003                                                     FAL1                           NORMAL
           1 DATA_0002                                                     FAL1                           NORMAL
           1 DATA_0001                                                     FAL1                           NORMAL
           1 DATA_0000                                                     FAL1                           NORMAL
           1 DATA_0023                      /dev/asmhdisk5                 FAL2                           NORMAL
           1 DATA_0024                      /dev/asmhdisk6                 FAL2                           NORMAL
           1 DATA_0022                      /dev/asmhdisk4                 FAL2                           NORMAL
           1 DATA_0020                      /dev/asmhdisk2                 FAL2                           NORMAL
           1 DATA_0014                      /dev/asmhdisk1                 FAL2                           NORMAL
           1 DATA_0021                      /dev/asmhdisk3                 FAL2                           NORMAL
           1 DATA_0018                      /dev/asmhdisk13                FAL2                           NORMAL
           1 DATA_0019                      /dev/asmhdisk14                FAL2                           NORMAL
           1 DATA_0017                      /dev/asmhdisk12                FAL2                           NORMAL
           1 DATA_0016                      /dev/asmhdisk11                FAL2                           NORMAL
           1 DATA_0027                      /dev/asmhdisk9                 FAL2                           NORMAL
           1 DATA_0015                      /dev/asmhdisk10                FAL2                           NORMAL
           1 DATA_0025                      /dev/asmhdisk7                 FAL2                           NORMAL
           1 DATA_0026                      /dev/asmhdisk8                 FAL2                           NORMAL
           2 OCRVOTE2                       AFD:OCRVOTE2                   OCRVOTE2                       NORMAL
           2 OCRVOTE1                       AFD:OCRVOTE1                   OCRVOTE1                       NORMAL
           2 OCRVOTE3                       AFD:OCRVOTE3                   OCRVOTE3                       NORMAL

35 rows selected.

The disk group still had plenty of free space:

ASMCMD> lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512             512   4096  4194304  29360128  23110032          2097152        10506440             14             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  4194304     92160     91724                0           91724              0             Y  OCRVOTE/
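
The Usable_file_MB figures can be cross-checked by hand. For a normal-redundancy disk group Oracle documents the value as (Free_MB - Req_mir_free_MB) / 2, while an external-redundancy group has no mirror divisor. A short Python sketch using the numbers from the lsdg output (the function name is illustrative only):

```python
def usable_file_mb(free_mb: int, req_mir_free_mb: int, redundancy: str) -> int:
    """Derive Usable_file_MB from Free_MB and Req_mir_free_MB.

    Only the two redundancy types that appear in the lsdg output are handled.
    """
    copies = {"EXTERN": 1, "NORMAL": 2}[redundancy]
    return (free_mb - req_mir_free_mb) // copies


# Figures taken from the lsdg output for DATA and OCRVOTE.
data_usable = usable_file_mb(23110032, 2097152, "NORMAL")
ocrvote_usable = usable_file_mb(91724, 0, "EXTERN")

print("DATA    usable MB:", data_usable)
print("OCRVOTE usable MB:", ocrvote_usable)
```

Both results agree with the Usable_file_MB column reported by lsdg.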

To free up some space, the customer tried to drop several disks from the DATA disk group, but the statement failed with ORA-15411: Failure groups in disk group DATA have different number of disks.

SQL> alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10;
alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15411: Failure groups in disk group DATA have different number of disks.
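
The v$asm_disk listing shows 14 disks in each failure group (DATA_0000 to DATA_0013 in FAL1, DATA_0014 to DATA_0027 in FAL2), and all four disks named in the drop statement belong to FAL2. The Python sketch below only counts disks per failure group before and after the requested drop. It illustrates the imbalance that the error message describes and is not a reproduction of Oracle's internal check.

```python
from collections import Counter

# Failure group membership as shown in the v$asm_disk output.
failgroup_of = {
    f"DATA_{n:04d}": ("FAL1" if n <= 13 else "FAL2") for n in range(28)
}

# Disks named in the failing ALTER DISKGROUP ... DROP DISK statement.
to_drop = ["DATA_0027", "DATA_0026", "DATA_0025", "DATA_0024"]

before = Counter(failgroup_of.values())
after = Counter(fg for disk, fg in failgroup_of.items() if disk not in to_drop)

print("before drop:", dict(before))
print("after drop :", dict(after))
print("failure groups equal in size after drop:",
      len(set(after.values())) == 1)
```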

After setting the hidden parameters _asm_disable_failgroup_size_checking and _asm_disable_dangerous_failgroup_checking to true, the disks were dropped successfully:

SQL> alter system set "_asm_disable_failgroup_size_checking"=true scope=memory sid='*';

System altered.

SQL> alter system set "_asm_disable_dangerous_failgroup_checking"=true scope=memory sid='*';

System altered.

SQL> alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10;

Diskgroup altered.

Posted in Oracle ASM