ORA-600 kddummy_blkchk导致数据库无法open

数据库启动报ORA-600 kddummy_blkchk

Sun Sep  9 21:45:28 2018
SMON: enabling tx recovery
Sun Sep  9 21:45:28 2018
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 24
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=35, OS id=53346452
Sun Sep  9 21:45:28 2018
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_smon_39714888.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [609], [6], [18018], [], [], [], []
Sun Sep  9 21:45:28 2018
Completed: ALTER DATABASE OPEN
Sun Sep  9 21:45:28 2018
db_recovery_file_dest_size of 89900 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Sun Sep  9 21:45:29 2018
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_smon_39714888.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [609], [6], [18018], [], [], [], []
Sun Sep  9 21:45:32 2018
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_smon_39714888.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [609], [6], [18018], [], [], [], []
Sun Sep  9 21:46:20 2018
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_pmon_11403488.trc:
ORA-00474: SMON process terminated with error
Sun Sep  9 21:46:20 2018
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 11403488

kddummy_blkchk错误mos记录

  Format: ORA-600 [kddummy_blkchk] [a] [b] 1


VERSIONS:
  versions 9.2 to 10.2

DESCRIPTION:

  --what condition caused the error to be reported

ARGUMENTS:
  Arg [a] Absolute file number
  Arg [b] Bock number
  Arg 1 Internal error code returned from kcbchk() which indicates the problem encountered.

trace文件信息

Block after image:
buffer tsn: 14 rdba: 0x98400006 (609/6)
scn: 0x0007.9e675f6d seq: 0x01 flg: 0x04 tail: 0x5f6d1e01
frmt: 0x02 chkval: 0x61ae type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of corrupt header 3 = CHKVAL
Dump of memory from 0x070000043F034000 to 0x070000043F034014

*** 2018-09-09 21:45:28.531
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kddummy_blkchk], [609], [6], [18018], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              90000000032FBDC ? 000000000 ?
ksedmp+0290          bl       ksedst               104A2CDD0 ?
ksfdmp+0018          bl       03F26B04             
kgerinv+00dc         bl       _ptrgl               
kseinpre+0040        bl       kgerinv              110123BE0 ? 000000001 ?
                                                   104BD0420 ? 07FFFFFFF ?
                                                   000000000 ?
ksesin+0048          bl       kseinpre             104BD0420 ? 07FFFFFFF ?
                                                   000000000 ?
kco_blkchk+0798      bl       ksesin               104BD0888 ? 300000003 ?
                                                   000000000 ? 000000261 ?
                                                   000000000 ? 000000006 ?
                                                   000000000 ? 000004662 ?
kcoapl+0d24          bl       kco_blkchk           000000000 ? 70000046D293500 ?
                                                   FFFFFFFFFFF9BA0 ? 1101954D0 ?
                                                   FFFFFFFFFFF9AC0 ?
kcbapl+0174          bl       kcoapl               FFFFFFFFFFFAFD0 ?
                                                   70000043F034000 ? 1104C2884 ?
                                                   EDEADBEEF ? 200000000000 ?
                                                   110000790 ? 000000000 ?
kcrfw_redo_gen+2988  bl       kcbapl               FFFFFFFFFFFA040 ?
                                                   70000046D293500 ? 100FAF764 ?
                                                   700000010008000 ? 000000000 ?
kcbchg1_main+25d4    bl       kcrfw_redo_gen       000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kcbchg1+038c         bl       kcbchg1_main         000000000 ? 000000000 ?
                                                   000000018 ? 000000000 ?
                                                   000000000 ? 000000228 ?
ktbchgro+0380        bl       kcbchg1              062886020 ? 1601C0F50 ?
                                                   FFFFFFFFFFFAE80 ?
                                                   FFFFFFFFFFFAEB8 ? 000000000 ?
                                                   000000000 ?
ktfbbset+01e0        bl       ktbchgro             000000000 ? 100000001 ?
                                                   FFFFFFFFFFFB110 ?
                                                   FFFFFFFFFFFAFD0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ?
ktfbfset+039c        bl       ktfbbset             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   700000010018078 ? 000000000 ?
                                                   110000790 ?
ktfbfundo+0208       bl       ktfbfset             00000000B ? 00000000B ?
ktfbhget+0760        bl       ktfbfundo            FFFFFFFFFFFBFF8 ? 200000000 ?
ktfbffpre+00d0       bl       ktfbhget             FFFFFFFFFFFBFF8 ? 200000002 ?
                                                   1100000000 ? 2033FBE8014 ?
kteopdelete+12dc     bl       ktfbffpre            FFFFFFFFFFFC678 ? 204BCDD80 ?
                                                   1CE04BCEEE8 ? 0FFFFEFFF ?
ktsxfastdele+0124    bl       kteopdelete          700000010018078 ?
                                                   700000010008000 ? 1104772A8 ?
                                                   700000464E35278 ?
                                                   700000464E35278 ? 000000000 ?
                                                   FFFFFFFFFFFBDA0 ?
kteopshrink+0308     bl       _ptrgl               
ktssdrbm_segment+0a  bl       kteopshrink          E0000000E ? FFFFFFFFFFFC608 ?
b0                                                 000000001 ? 000000001 ?
                                                   110477768 ? 000000002 ?
                                                   1100EA4E0 ?
ktssdro_segment+06d  bl       ktssdrbm_segment     FFFFFFFFFFFCFC8 ?
c                                                  FFFFFFFFFFFD090 ? 100000010 ?
                                                   000000000 ?
ktssdt_segs+03f4     bl       ktssdro_segment      70000046D28EA18 ? 6FFFFD3A0 ?
                                                   0FFFFD320 ?
ktmmon+135c          bl       ktssdt_segs          000000000 ?
                                                   7FFFFFFF7FFFFFFF ?
                                                   7FFFFFFF7FFFFFFF ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
                                                   7FFFFFFC7FFFFFFC ?
                                                   05B9523F8 ?
ktmSmonMain+0030     bl       ktmmon               110000790 ?
ksbrdp+04b4          bl       _ptrgl               
opirip+03fc          bl       03F26C94             
opidrv+0458          bl       opirip               11029F970 ? 4102A12B0 ?
                                                   FFFFFFFFFFFF6B0 ?
sou2o+0090           bl       opidrv               3202D99A1C ? 4A004A628 ?
                                                   FFFFFFFFFFFF6B0 ?
opimai_real+0150     bl       01F93134             
main+0098            bl       opimai_real          4500000000000000 ?
                                                   800200140000400 ?
__start+0070         bl       main                 000000000 ? 000000000 ?
 
--------------------- Binary Stack Dump ---------------------

错误比较明显由于file 609 block 6异常,导致smon无法进行回滚,从使得数据库open之后,立马crash.这个问题可以通过使用bbed修复坏块解决,也可以通过设置合适的参数和事件,禁止smon的一些操作来规避
bbed修复block之后dbv检查

oracle@p740a:/tmp]$ dbv file=/u01/app/oracle/oradata/orcl/xifenfei38.dbf

DBVERIFY: Release 10.2.0.4.0 - Production on Sun Sep 16 11:01:54 2018

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/xifenfei38.dbf



DBVERIFY - Verification complete

Total Pages Examined         : 1856000
Total Pages Processed (Data) : 1142647
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 710956
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 2277
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 120
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 2658994127 (7.2658994127)

ORA 600 kddummy_blkchk相关bug列表
bug


发表在 ORA-xxxxx | 标签为 , | 留下评论

私网直连后遗症:一节点无法启动导致另外节点haip无法启动

该案例为两节点rac(11.2.0.4),private 网络使用直连方式,其中一个节点主机异常无法启动,另外一个节点集群启动发现haip无法正常启动

# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
     1        ONLINE  ONLINE       xifenfei1                  Started                     
ora.cluster_interconnect.haip                                                      >>>>  OFFLINE
     1        ONLINE  OFFLINE
ora.crf
     1        ONLINE  ONLINE       xifenfei1
ora.crsd
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.cssd
     1        ONLINE  ONLINE       xifenfei1
ora.cssdmonitor
     1        ONLINE  ONLINE       xifenfei1
ora.ctssd
     1        ONLINE  ONLINE       xifenfei1                  OBSERVER
ora.diskmon
     1        OFFLINE OFFLINE
ora.drivers.acfs
     1        ONLINE  ONLINE       xifenfei1
ora.evmd
     1        ONLINE  INTERMEDIATE xifenfei1
ora.gipcd
     1        ONLINE  ONLINE       xifenfei1
ora.gpnpd
     1        ONLINE  ONLINE       xifenfei1
ora.mdnsd
     1        ONLINE  ONLINE       xifenfei1

alerthostname日志

2018-09-02 10:38:56.767: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771: 
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806: 
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.

orarootagent_root日志

2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[   CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces

gipcd.log日志

2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0

ohasd.log日志

2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [   CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [   CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: 
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

检查私网状态,发现eth2网络链路状态为down,由于网络直连,而另外一台机器无法启动

[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no   ====>网卡链路状态异常


[root@xifenfei1 rules.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36  
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1     --------->注意
          RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16926236 (16.1 MiB)  TX bytes:24269882 (23.1 MiB)
          Memory:91160000-91180000 

eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37  
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1      --------->注意少了RUNNING
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:91140000-91160000 

关于网卡链路异常导致haip无法启动的mos描述请参考:CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1).该案例是11.2集群私网使用直连引起的直接后遗症(非常不建议集群私网使用直连方式)

发表在 Oracle RAC | 标签为 | 留下评论

由于bootstrap$异常导致数据库启动报ORA-03113 ORA-07445 lmebucp

数据库无法正常启动,报ORA-03113

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 5016387584 bytes
Fixed Size                  2011136 bytes
Variable Size             905969664 bytes
Database Buffers         4093640704 bytes
Redo Buffers               14766080 bytes
数据库装载完毕。
ORA-03113: 通信通道的文件结束

alert日志报错ORA-07445 lmebucp

Mon Aug 27 15:31:37 2018
Thread 1 advanced to log sequence 21691
Thread 1 opened at log sequence 21691
  Current log# 2 seq# 21691 mem# 0: /data/oracle/orcl/redo02.log
Successful open of redo thread 1
Mon Aug 27 15:31:37 2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 27 15:31:37 2018
SMON: enabling cache recovery
Mon Aug 27 15:31:37 2018
Errors in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_5827.trc:
ORA-07445: exception encountered: core dump [lmebucp()+24] [SIGSEGV] 
[Address not mapped to object] [0x000000000] [] []

跟踪启动10046 trace

WAIT #1: nam='instance state change' ela= 822 layer=2 value=1 waited=1 obj#=-1 tim=1499370211971345
WAIT #1: nam='db file sequential read' ela= 29 file#=1 block#=257 blocks=1 obj#=-1 tim=1499370211971896
=====================
PARSING IN CURSOR #2 len=188 dep=1 uid=0 oct=1 lid=0 tim=1499370211972625 hv=2809067040 ad='b5fe2d00'
create table bootstrap$ ( line#         number not null,   obj#           
number not null,   sql_text   varchar2(4000) not null)   
storage (initial 50K objno 41 extents (file 1 block 257))
END OF STMT
PARSE #2:c=0,e=598,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211972621
BINDS #2:
EXEC #2:c=1000,e=195,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1499370211972873
=====================
PARSING IN CURSOR #2 len=55 dep=1 uid=0 oct=3 lid=0 tim=1499370211973429 hv=2111436465 ad='b7bd0530'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #2:c=0,e=472,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211973426
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2b8c5d50a4d0  bln=22  avl=02  flg=05
  value=41
EXEC #2:c=1000,e=838,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211974375
WAIT #2: nam='db file sequential read' ela= 27 file#=1 block#=257 blocks=1 obj#=-1 tim=1499370211974522
WAIT #2: nam='db file sequential read' ela= 21 file#=1 block#=258 blocks=1 obj#=-1 tim=1499370211974855
FETCH #2:c=1000,e=479,p=2,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1499370211974908
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object),
 addr: 0x0, PC: [0x348772c, lmebucp()+24]
*** 2018-08-27 15:31:37.074
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [lmebucp()+24] [SIGSEGV] 
[Address not mapped to object] [0x000000000] [] []
Current SQL statement for this session:
alter database open
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
Cannot find symbol
Cannot find symbol
Cannot find symbol
ksedst()+31          call     ksedst1()            000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksedmp()+610         call     ksedst()             000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ssexhd()+630         call     ksedmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
<0x336800eca0>       call     ssexhd()             00000000B ? 2B8C5D238D70 ?
                                                   2B8C5D238C40 ? 000000000 ?
                                                   000000000 ? 000000001 ?
 
--------------------- Binary Stack Dump ---------------------

通过这里发现,数据库启动执行select line#, sql_text from bootstrap$ where obj# != :1然后报ORA-07445 lmebucp错误。这样的错误比较诡异,一般可能是由于bootstrap异常导致,但是这里再往上跟踪发现 create bootstrap$表指定的记录为file 1 block 257,根据经验知道数据库的bootstrap$表记录一般是377 或者520比较常见.通过工具对于file 1进行分析

DUL> dump datafile 1 block 257
Block Header:
block type=0x10 (data segment header block (unlimited extents))
block format=0xa2 (oracle 10)
block rdba=0x00400101 (file#=1, block#=257)
scn=0x0000.0000007e, seq=1, tail=0x007e1001
block checksum value=0xe75c=59228, flag=4
Data Segment Header:
  Extent Control Header
  -------------------------------------------------------------
  Extent Header:: extents: 1  blocks: 7
                  last map: 0x00000000  #maps: 0  offset: 4128
      Highwater:: 0x00400103  (rfile#=1,block#=259)
                  ext#: 0  blk#: 1   ext size:7
      #blocks in seg. hdr's freelists: 0
      #blocks below: 1
      mapblk: 0x00000000   offset: 0
      Map Header:: next: 0x00000000   #extents: 1  obj#: 41  flag: 0x40000000
  Extent Control Header
  -------------------------------------------------------------
   0x00400102  length: 7

  nfl = 1, nfb = 1, typ = 2, nxf = 0, ccnt = 0
  SEG LST:: flg:UNUSED lhd: 0x00000000 ltl: 0x00000000

发现异常比较明显,block 257为data_object_id=41,也就是
41|41|CREATE UNIQUE INDEX I_FILE1 ON FILE$(FILE#) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE ( INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 41 EXTENTS (FILE 1 BLOCK 257))
这里看数据库的引导异常或者bootstrap$表中记录异常.通过修复bootstrap相关内容,数据库完美启动

发表在 非常规恢复 | 标签为 , , | 留下评论

又一例asm格式化文件系统恢复

又一个客户把win rac中的asm disk给格式化为ntfs了(data磁盘组由三个500G的磁盘组成,被格式化掉前面两个还剩下一个),而且格式化之后,还进行了一系列恢复(比如修复磁盘头,又进行分区等一些磁盘操作),导致恢复难度增加,也增加了一些数据覆盖
asm alert日志报错

Thu Aug 23 11:20:14 2018
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Process state recorded in trace file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_ora_2260.trc
Thu Aug 23 11:20:28 2018
Errors in file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_lgwr_3820.trc:
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:26 disk_offset(bytes):27566080 io_size:4096 operation:Write type:synchronous
	 result:I/O error process_id:3820
NOTE: unable to write any mirror side for diskgroup DATA
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process 3268:3820 initiating offline of disk 1.4042301899 (DATA_0001) with mask 0x7e in group 2
WARNING: Disk DATA_0001 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 2, dsk = 1/0xf0f0a1cb, mode = 0x15
kfdp_updateDsk(): 22 
Thu Aug 23 11:20:28 2018
kfdp_updateDskBg(): 22 
ERROR: too many offline disks in PST (grp 2)
WARNING: Disk DATA_0001 in mode 0x7f offline aborted

数据库alert日志报错

WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:11992
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in 
group [2.1859146063] from disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260 
in group 2 on disk 1 allocation unit 422 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Thu Aug 23 11:20:13 2018
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:841 disk_offset(bytes):882532352 io_size:131072 operation:Write type:asynchronous
	 result:I/O error process_id:3224
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 240 logical extent 0 of file 259 in group 2 on disk 1 
allocation unit 841 KCF: read, write or open error, block=0x7853 online=1
        file=4 '+DATA/orcl/datafile/users.259.944422883'
        error=15081 txt: ''
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:3224
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group [2.1859146063] from 
disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group 2 on disk 1 
allocation unit 422 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00204: 读取控制文件时出错 (块 41, # 块 1)
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
DBW1 (ospid: 3224): terminating the instance due to error 204

由于客户进行了一系列恢复恢复操作导致查看磁盘都不全

D:\>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
---这里还有一个磁盘没有正常显示
ORCLDISKDATA10                   \Device\Harddisk4\Partition1           511997M--客户尝试修复的磁盘
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

通过主机层面激活卷,删除分区等一系列操作,然后通过kfed构造磁盘头,让这些磁盘在os层面可以正常显示

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
------需要处理的磁盘------
ORCLDISKDATA0                    \Device\Harddisk3\Partition1           511997M
ORCLDISKDATA1                    \Device\Harddisk4\Partition1           511997M
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
-----------------------
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

由于asm磁盘组内部目录au被彻底损坏,导致无法通过asm直接拷贝出来数据,通过底层扫描,按照au恢复出来相关数据,由于格式化ntfs和后续的误操作导致部分数据au被覆盖.其余数据均恢复,抢救了绝大部分数据.
数据文件恢复参考:asm disk header 彻底损坏恢复
另外有一次win平台类似恢复经历:asm disk格式化为ntfs恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

发表在 Oracle ASM, 非常规恢复 | 标签为 , , , , | 留下评论

asm disk 大小限制

这个问题在12C之前争议很小,基本共识非XD环境不能超过2T,但是到了后面的版本中,发生了一些改变,主要是COMPATIBLE.ASM and COMPATIBLE.RDBMS disk group attributes are set to 12.1 or greater的时候asm disk 大小限制依赖au size,
1M ausize asm disk limit为4 PB
2M ausize asm disk limit为8 PB
4M ausize asm disk limit为16 PB
8M ausize asm disk limit为32 PB

asm-limit-1
asm-limit-2


参见:Oracle ASM Storage Limits
18C中COMPATIBLE.ASM和COMPATIBLE.RDBMS默认值(COMPATIBLE.RDBMS为10.1,也就是说默认情况下非XD情况还是只能支持不超过2T的asm disk)
18c-asm

发表在 Oracle ASM | 标签为 , | 留下评论