作者归档:惜分飞

ORA-600 2032故障处理

有客户数据库,异常断电之后,数据库运行不稳定(经常性的重启),通过分析发现

Wed Jun 29 01:04:39 2022
Completed: alter database open
Wed Jun 29 01:04:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_3284.trc:
ORA-12012: error on auto execute of job 1
ORA-01578: ORACLE data block corrupted (file # 2, block # 552)
ORA-01110: data file 2: 'E:\ORACLE\PRODUCT\10.2.0\ORADATA_2\ORCL\UNDOTBS01.DBF'
…………
Wed Jun 29 01:13:28 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:15:34 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5488.trc:
ORA-00600: internal error code, arguments: [6002], [6], [48], [5], [0], [], [], []
Wed Jun 29 01:20:54 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:55 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]

Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:57 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
………………
Wed Jun 29 01:21:41 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_q001_2124.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_3376.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_reco_2412.trc:
ORA-00474: SMON process terminated with error

Instance terminated by PMON, pid = 7160

对ora-600 2032进行分析

*** 2022-06-29 01:13:26.907
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, 
tim_scn_map) values (0, :1, :2, :3, :4, :5, :6, :7)
check trace file e:\oracle\product\10.2.0\db_2\rdbms\trace\orcl_ora_0.trc for preloading .sym file messages
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+663           CALL???  ksedst+55            003C878B8 000000000 00FF57178
                                                   000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 00ED07680 010453DA8
                                                   003CACC80
kgeriv+184           CALL???  ksfdmp+19            7FF2378C000 7FF5E2F81D8
                                                   7FF5EBB9300 7FF5EBB92F0
kgesiv+102           CALL???  kgeriv+184           000000002 00ED07040 000000002
                                                   00FF59110
ksesic7+125          CALL???  kgesiv+102           7FFFFFFF00800228 100000002
                                                   200000001 00FF59328
kcopcv+1014          CALL???  ksesic7+125          0000007F0 000000000 000800228
                                                   000000000
kcbchg1_main+3115    CALL???  kcopcv+1014          00001E778 7FF00000001
                                                   00001E980 00001E650
kcbchg1+238          CALL???  kcbchg1_main+3115    00ED07040 00ED07040 000040013
                                                   000000000
ktuchg+1331          CALL???  kcbchg1+238          7FF00000000 000000003
                                                   00FF597C8 00FF59780
ktbchg2+341          CALL???  ktuchg+1331          000000000 000000401 000000046
                                                   000BB5D21
kdtchg+916           CALL???  ktbchg2+341          000000001 000000000 000000240
                                                   000000000
kdtwrp+2582          CALL???  kdtchg+916           01077A268 01044404C 010444054
                                                   010441710
kdtInsRow+705        CALL???  kdtwrp+2582          01077A268 7FF5E2F8510
                                                   003CB3358 0009C20C4
insrowFastPath+125   CALL???  kdtInsRow+705        00ED07040 7FF55A86E30
                                                   7FF58E87918 00001F318
insdrvFastPath+478   CALL???  insrowFastPath+125   000000067 000000000 000000000
                                                   000000000
inscovexe+434        CALL???  insdrvFastPath+478   01077A268 000000000 010440070
                                                   00521E0C9
insExecStmtExecIniE  CALL???  inscovexe+434        7FF55A88FD8 7FF55A86E30
ngine+99                                           00FF5BAD8 0009FA3EC
insexe+453           CALL???  insExecStmtExecIniE  8B2D554D9623 000000006
                              ngine+99             000000006 0000000C0
opiexe+4991          CALL???  insexe+453           7FF55A885A0 00FF5BAD8
                                                   000000102 000000000
opiall0+1931         CALL???  opiexe+4991          7FF00000049 000000003
                                                   00FF5C160 000000020
opikpr+660           CALL???  opiall0+1931         000000065 000000022 00FF5C638
                                                   000000000
opiodr+1136          CALL???  opikpr+660           000000065 000000017 01044B798
                                                   07EEF1FCF
rpidrus+230          CALL???  opiodr+1136          000000065 000000017 01044B798
                                                   80005900000000
rpidru+112           CALL???  rpidrus+230          00FF5D3F0 000000003 01078D4A0
                                                   7FF5B6F1A80
rpiswu2+517          CALL???  rpidru+112           7FF5DA52BB8 000000000
                                                   000000000 000000008
kprball+1446         CALL???  rpiswu2+517          7FF5E408820 000000000
                                                   00FF5DAD0 000000002
ktf_scn_time+4951    CALL???  kprball+1446         01044B798 8B2D00000140
                                                   000000000 000000005
ktmmon+4107          CALL???  ktf_scn_time+4951    000000000 000000001 07FFFFFFF
                                                   00521A3C2
ktmSmonMain+26       CALL???  ktmmon+4107          0049CBAC0 000000004
                                                   8B2D554D9623 00001E768
ksbrdp+903           CALL???  ktmSmonMain+26       003CC2A18 0049CBADC 000000008
                                                   000000004
opirip+700           CALL???  ksbrdp+903           726F77740000001E 003C8B000
                                                   00FF5FA30 000000000
opidrv+860           CALL???  opirip+700           000000032 000000004 00FF5FD50
                                                   000000000
sou2o+52             CALL???  opidrv+860           000000032 000000004 00FF5FD50
                                                   000000003
opimai_real+272      CALL???  sou2o+52             000000000 000000000 000000000
                                                   000000000
opimai+96            CALL???  opimai_real+272      000000000 000000000 000000000
                                                   000000000
BackgroundThreadSta  CALL???  opimai+96            00FF5FEA8 000000001 000000000
rt+633                                             000000000
000000007738F56D     CALL???  BackgroundThreadSta  0069E4590 000000000 000000000
                              rt+633               000000000
0000000077703021     CALL???  000000007738F56D     000000000 000000000 000000000
                                                   000000000
 
--------------------- Binary Stack Dump ---------------------

通过该trace和alert日志信息可以确认是由于smon_scn_time的操作需要使用到undo,但是对应的undo block异常,从而使得该操作失败,进而引起数据库smon进程异常从而引起ORA-00474,数据库自动crash.处理问题比较简单:
1. 对异常undo进行处理,创建新undo,删除老undo
2. 对于smon_scn_time异常数据进行处理

发表在 Oracle备份恢复 | 标签为 , , | 留下评论

Oracle Recovery Tools实战批量坏块修复

有客户数据库无法正常启动ORA-600 6711错误

SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898],
[0], [], [], [], [], [], [], []
Process ID: 22708
Session ID: 978 Serial number: 56675

alert日志报错

2022-06-26T12:34:41.855326+08:00
alter database open
2022-06-26T12:34:41.984974+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
Undo initialization finished serial:0 start:313418906 end:313418906 diff:0 ms (0.0 seconds)
Database Characterset is ZHS16GBK
No Resource Manager plan active
2022-06-26T12:34:43.302315+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38629):
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38629\ora19c_ora_22708_i38629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-06-26T12:34:44.232115+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38630):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38630\ora19c_ora_22708_i38630.trc
2022-06-26T12:34:45.266678+08:00
opiodr aborting process unknown ospid (22708) as a result of ORA-603
2022-06-26T12:34:45.274688+08:00
ORA-603 : opitsk aborting process
License high water mark = 1
USER (ospid: (prelim)): terminating the instance due to ORA error 

通过分析trace文件进行分析,确认是由于histgrm$表异常导致,通过一些特殊处理,绕过该表相关sql,open数据库,并且尝试导出数据

SQL> startup mount pfile='d:/pfile.txt';
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open;

Database altered.

使用expdp导出数据报ORA-01578错
expdp-ora-1578


通过分析是由于system有坏块导致,dbv检查文件
dbv-huikuai

通过Oracle Recovery Tools工具批量坏块修复功能修复
20220626123245
20220626123343

通过工具修复大量主要坏块被修复,还有一些内部逻辑错误(后续工具继续完善),再次尝试逻辑导出数据,无任何报错,数据比较完美恢复
20220626160209

发表在 小工具, 非常规恢复 | 标签为 , , , , | 留下评论

ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

客户反馈三个磁盘组无法正常mount,报错类似ORA-15032 ORA-15017 ORA-15063

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

初步判断是asm disk异常导致(比如asm disk不能被扫描到,或者丢失,或者磁盘头损坏等),分析客户的asm disk的udev文件配置

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     

从udev的配置中可以看出来,客户以前是对3个磁盘进行分析,然后使用udev映射别名给asm使用的.通过对其中一个磁盘进行分析
20220621220634
20220621220728


通过上述winhex查看,可以确认该分区的磁盘头信息异常[该信息属于磁盘刚分区的时候信息,而不是asm disk的信息],和kfed看到的结果一致[磁盘头位置肯定损坏,其他位置目前未知]

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

分析其他位置的block情况,初步看基本上ok[运气还不错]

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过检索备份出来的部分磁盘文件,找出来ORCLDISK信息部分(asm disk header)
20220621221843


然后利用这个部分对损坏的磁盘头进行修复,并且dd回生产环境中,并尝试mount磁盘组,数据库open成功
20220621181430
20220621222356


至此这个数据库运气不错,没有过多损坏,算完美恢复,可以进行了逻辑导出和rman备份,全部正常.为了后续安全,建议对其进行迁移

发表在 Oracle ASM, Oracle备份恢复 | 标签为 , | 留下评论