年归档:2022

Oracle Recovery Tools实战批量坏块修复

有客户数据库无法正常启动ORA-600 6711错误

SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898],
[0], [], [], [], [], [], [], []
Process ID: 22708
Session ID: 978 Serial number: 56675

alert日志报错

2022-06-26T12:34:41.855326+08:00
alter database open
2022-06-26T12:34:41.984974+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
Undo initialization finished serial:0 start:313418906 end:313418906 diff:0 ms (0.0 seconds)
Database Characterset is ZHS16GBK
No Resource Manager plan active
2022-06-26T12:34:43.302315+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38629):
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38629\ora19c_ora_22708_i38629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-06-26T12:34:44.232115+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38630):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38630\ora19c_ora_22708_i38630.trc
2022-06-26T12:34:45.266678+08:00
opiodr aborting process unknown ospid (22708) as a result of ORA-603
2022-06-26T12:34:45.274688+08:00
ORA-603 : opitsk aborting process
License high water mark = 1
USER (ospid: (prelim)): terminating the instance due to ORA error 

通过分析trace文件进行分析,确认是由于histgrm$表异常导致,通过一些特殊处理,绕过该表相关sql,open数据库,并且尝试导出数据

SQL> startup mount pfile='d:/pfile.txt';
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open;

Database altered.

使用expdp导出数据报ORA-01578错
expdp-ora-1578


通过分析是由于system有坏块导致,dbv检查文件
dbv-huikuai

通过Oracle Recovery Tools工具批量坏块修复功能修复
20220626123245
20220626123343

通过工具修复大量主要坏块被修复,还有一些内部逻辑错误(后续工具继续完善),再次尝试逻辑导出数据,无任何报错,数据比较完美恢复
20220626160209

发表在 小工具, 非常规恢复 | 标签为 , , , , | 评论关闭

ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

客户反馈三个磁盘组无法正常mount,报错类似ORA-15032 ORA-15017 ORA-15063

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

初步判断是asm disk异常导致(比如asm disk不能被扫描到,或者丢失,或者磁盘头损坏等),分析客户的asm disk的udev文件配置

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     

从udev的配置中可以看出来,客户以前是对3个磁盘进行分析,然后使用udev映射别名给asm使用的.通过对其中一个磁盘进行分析
20220621220634
20220621220728


通过上述winhex查看,可以确认该分区的磁盘头信息异常[该信息属于磁盘刚分区的时候信息,而不是asm disk的信息],和kfed看到的结果一致[磁盘头位置肯定损坏,其他位置目前未知]

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

分析其他位置的block情况,初步看基本上ok[运气还不错]

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过检索备份出来的部分磁盘文件,找出来ORCLDISK信息部分(asm disk header)
20220621221843


然后利用这个部分对损坏的磁盘头进行修复,并且dd回生产环境中,并尝试mount磁盘组,数据库open成功
20220621181430
20220621222356


至此这个数据库运气不错,没有过多损坏,算完美恢复,可以进行了逻辑导出和rman备份,全部正常.为了后续安全,建议对其进行迁移

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , | 评论关闭

修改oracle scn小工具(patch scn)

在一些情况下(特别是一些数据库非常规恢复场景中),需要修改oracle scn绕过一些错误,让数据库open成功,在以前的版本中我们可以通过event,隐含参数,oradebug等方法进行修改,在一些较新的版本中这些方法都被oracle屏蔽,无法实现oracle scn进行调整,针对这种情况,开发了一个Patch_SCN小程序,实现对oracle数据库的scn进行调整

SQL> select dbms_flashback.get_system_change_number a from dual;

               A
----------------
    107367806959

通过工具查询scn信息,由于oracle的scn是动态的,因此和get_system_change_number 查询值有细微出入
20220614130358


win版本修改scn
20220614130543

通过查询确认scn修改成功
20220614130553

增加一键通过修改控制文件scn实现改变数据库scn的功能—202210增加
20221006220814

linux版本修改SCN
查询当前数据库SCN

SQL> startup mount
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.

SQL> alter database open;

Database altered.

SQL>  select dbms_flashback.get_system_change_number a from dual;

         A
----------
 248118193

关闭数据库,启动到mount,为修改SCN做准备(为了模拟真实环境,只让程序在mount情况下修改scn,open情况下可以修改但是无实际意义)

SQL> startup mount;
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.
SQL> select spid from v$process where addr = 
  2     (select paddr from v$session where sid=
  3      (select sid from v$mystat where rownum=1));

SPID
------------------------
21019

进行SCN修改

[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN 21019(会话进程号) 300000000(期望修改SCN值)
Machine Code:W0UY-SV09-71CY-IEWA

Please input Key:42FB4ADAB72BB4AD   <----需要联系软件作者惜分飞获取
Confirm modification, please input [Y]...
Y
Modify the Oracle SCN value to:11E1A300:300000000

启动数据库,查询scn

SQL> ALTER DATABASE OPEN;

Database altered.

SQL> select dbms_flashback.get_system_change_number a from dual;

         A
----------
 300000244   ---由于数据库启动之后,scn稍微增加,属于正常情况

通过上述测试,证明Patch_SCN可以完美实现linux平台Oracle 数据库的SCN调整工作

该功能的通用版已经包含到oracle racovery tools工具中(注册版可用)
oracle-recovery-tools


Patch_SCN下载:Patch_SCN下载
Patch_SCN使用说明:Patch_SCN使用说明
OraRecovery下载:OraRecovery下载
OraRecovery使用说明:OraRecovery使用说明

发表在 Oracle, 小工具 | 标签为 , , , , | 评论关闭