标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-00742 ORA-01110 ORA-01555 ORA-01578 ORA-01595 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (103)
- 数据库 (1,768)
- DB2 (22)
- MySQL (77)
- Oracle (1,609)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (24)
- ORA-xxxxx (166)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (15)
- ORACLE 21C (3)
- Oracle 23ai (8)
- Oracle ASM (69)
- Oracle Bug (8)
- Oracle RAC (54)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (29)
- Oracle备份恢复 (591)
- Oracle安装升级 (97)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (86)
- PostgreSQL (30)
- pdu工具 (6)
- PostgreSQL恢复 (9)
- SQL Server (32)
- SQL Server恢复 (13)
- TimesTen (7)
- 达梦数据库 (3)
- 达梦恢复 (1)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (39)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (22)
-
最近发表
- ORA-600 kokiasg1故障分析(obj$中核心字典序列全部被恶意删除)
- ORA-00756 ORA-10567故障数据0丢失恢复
- 数据库文件变成32k故障恢复
- tcp连接过多导致监听TNS-12532 TNS-12560 TNS-00502错误
- 文件系统格式化MySQL数据库恢复
- .sstop勒索加密数据库恢复
- 解决一次硬件恢复之后数据文件0kb的故障恢复case
- Error in invoking target ‘libasmclntsh19.ohso libasmperl19.ohso client_sharedlib’问题处理
- ORA-01171: datafile N going offline due to error advancing checkpoint
- linux环境oracle数据库被文件系统勒索加密为.babyk扩展名溯源
- ORA-600 ksvworkmsgalloc: bad reaper
- ORA-600 krccfl_chunk故障处理
- Oracle Recovery Tools恢复案例总结—202505
- ORA-600 kddummy_blkchk 数据库循环重启
- 记录一次asm disk加入到vg通过恢复直接open库的案例
- CHECKDB 发现了 N 个分配错误和 M 个一致性错误
- 达梦数据库dm.ctl文件异常恢复
- Oracle Recovery Tools修复ORA-00742、ORA-600 ktbair2: illegal inheritance故障
- 可能是 tempdb 空间用尽或某个系统表不一致故障处理
- 11.2.0.4库中遇到ORA-600 kcratr_nab_less_than_odr报错
分类目录归档:ORA-xxxxx
模拟极端ORA-08103并解决
ORA-08103错误在数据库日程运维和异常恢复中都可能遇到,出现该错误的原因很多,有很多情况下(模拟普通ORA-08103并解决),直接通过修改块使其在数据库查询的时候表标志为坏块,然后使用event或者dbms_repair包来标志该块,然后跳过就可以解决该问题,但是有些时候,遇到极端情况,该方法会失效,需要借助极端工具来处理该极端问题.
分析表相关EXTENT
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> create table t_xifenfei as 2 select * from dba_objects; Table created. SQL> select count(*) from t_xifenfei; COUNT(*) ---------- 74504 SQL> select data_object_id,object_id from dba_objects where object_name='T_XIFENFEI'; DATA_OBJECT_ID OBJECT_ID -------------- ---------- 75592 75592 SQL> select file_id,block_id,block_id+blocks-1 2 from dba_extents 3 where segment_name ='T_XIFENFEI' AND owner='CHF'; FILE_ID BLOCK_ID BLOCK_ID+BLOCKS-1 ---------- ---------- ----------------- 4 680 687 4 688 695 4 696 703 4 704 711 <---注意 4 712 719 4 720 727 4 728 735 ………… 24 rows selected. SQL> Select segment_name,header_file,header_blocK 2 from dba_segments where 3 segment_name in ('T_XIFENFEI') and owner='CHF' 4 ; SEGMENT_NAME HEADER_FILE HEADER_BLOCK ------------------------------ ----------- ------------ T_XIFENFEI 4 682 SQL> alter system checkpoint; System altered. SQL> alter system dump datafile 4 block 682; System altered. SQL> select value from v$diag_info where name='Default Trace File'; VALUE -------------------------------------------------------------------------------- /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_28933.trc --trace文件 Dump of memory from 0xB6CFD600 to 0xB6CFF600 B6CFD600 0000A223 010002AA 000DB4EA 04010000 [#...............] B6CFD610 00008ECF 00000000 00000000 00000000 [................] B6CFD620 00000000 00000018 00000480 00000A9C [................] B6CFD630 00000017 0000003F 00000080 010007BF [....?...........] B6CFD640 00000000 00000017 00000000 0000043F [............?...] B6CFD650 00000000 00000000 00000000 00000017 [................] B6CFD660 0000003F 00000080 010007BF 00000000 [?...............] B6CFD670 00000017 00000000 0000043F 01000780 [........?.......] B6CFD680 01000780 00000000 00000000 00000000 [................] B6CFD690 00000000 00000000 00000000 00000000 [................] Repeat 3 times B6CFD6D0 00000001 00002000 00000000 00001434 [..... ......4...] B6CFD6E0 00000000 010002A9 00000001 01000781 [................] B6CFD6F0 010002A9 00000000 00000000 00000000 [................] B6CFD700 00000000 00000000 00000018 00000000 [................] B6CFD710 00012748 10000000 010002A8 00000008 [H'..............] B6CFD720 010002B0 00000008 010002B8 00000008 [................] B6CFD730 010002C0 00000008 010002C8 00000008 [................] <----dump中找到下面值 …… Extent Map ----------------------------------------------------------------- 0x010002a8 length: 8 0x010002b0 length: 8 0x010002b8 length: 8 0x010002c0 length: 8 <-----选择第四个exent 0x010002c8 length: 8 ………… Auxillary Map -------------------------------------------------------- Extent 0 : L1 dba: 0x010002a8 Data dba: 0x010002ab Extent 1 : L1 dba: 0x010002a8 Data dba: 0x010002b0 Extent 2 : L1 dba: 0x010002b8 Data dba: 0x010002b9 Extent 3 : L1 dba: 0x010002b8 Data dba: 0x010002c0 <---同上 Extent 4 : L1 dba: 0x010002c8 Data dba: 0x010002c9 Extent 5 : L1 dba: 0x010002c8 Data dba: 0x010002d0 ………… -------------------------------------------------------- --确定Extent 3的记录在file 4 block 682 的偏移量为304 ------省略了相同部分B6CFD SQL> SELECT TO_NUMBER('730','XXXXX') FROM DUAL; TO_NUMBER('730','XXXXX') ------------------------ 1840 SQL> SELECT TO_NUMBER('600','XXX') FROM DUAL; TO_NUMBER('600','XXX') ---------------------- 1536 SQL> SELECT 1840-1536 FROM DUAL; 1840-1536 ---------- 304 SQL> SHUTDOWN IMMEDIATE; Database closed. Database dismounted. ORACLE instance shut down.
这里我们假设修改SEGMENT HEADER中关于EXTENT 3(从0开始计数)的映射地址,从而来使得该表在查询的时候出现ORA-08103错误
bbed修改相关值
[oracle@xifenfei ~]$ bbed parfile=bbed.par BBED> SET MODE EDIT MODE Edit BBED> INFO File# Name Size(blks) ----- ---- ---------- 4 /u01/oracle/oradata/ora11g/users01.dbf 0 BBED> SET FILE 4 BLOCK 682 FILE# 4 BLOCK# 682 BBED> D File: /u01/oracle/oradata/ora11g/users01.dbf (4) Block: 682 Offsets: 0 to 511 Dba:0x010002aa ------------------------------------------------------------------------ 23a20000 aa020001 eab40d00 00000104 cf8e0000 ………… <32 bytes per line> BBED> m /x 1100 File: /u01/oracle/oradata/ora11g/users01.dbf (4) Block: 682 Offsets: 0 to 511 Dba:0x010002aa ------------------------------------------------------------------------ 11000000 aa020001 eab40d00 00000104 cf8e0000 ………… <32 bytes per line> BBED> d File: /u01/oracle/oradata/ora11g/users01.dbf (4) Block: 682 Offsets: 0 to 511 Dba:0x010002aa ------------------------------------------------------------------------ 11000000 aa020001 eab40d00 00000104 cf8e0000 ………… <32 bytes per line> BBED> sum apply Check value for File 4, Block 682: current = 0x2cfd, required = 0x2cfd
重现ORA-08103
SQL> startup ORACLE instance started. Total System Global Area 313860096 bytes Fixed Size 1344652 bytes Variable Size 239078260 bytes Database Buffers 67108864 bytes Redo Buffers 6328320 bytes Database mounted. Database opened. SQL> conn chf/xifenfei Connected. SQL> select count(*) from t_xifenfei; select count(*) from t_xifenfei * ERROR at line 1: ORA-08103: object no longer exists
定位坏块位置
SQL> alter session set max_dump_file_size=unlimited; Session altered. SQL> alter session set db_file_multiblock_read_count=1; Session altered. SQL> alter session set events 'immediate trace name trace_buffer_on level 1048576'; Session altered. SQL> alter session set events '10200 trace name context forever, level 1'; Session altered. SQL> alter session set events '8103 trace name errorstack level 3'; Session altered. SQL> alter session set events '10236 trace name context forever, level 1'; Session altered. SQL> alter session set tracefile_identifier='ORA8103'; Session altered. SQL> select * from chf.t_xifenfei; select * from chf.t_xifenfei * ERROR at line 1: ORA-08103: object no longer exists --trace文件关键内容 block_row_dump: tab 0, row 0, @0x1f70 tl: 48 fb: --H-FL-- lb: 0x0 cc: 3 col 0: [ 9] 44 49 43 54 2e 42 41 53 45 col 1: [ 1] 32 col 2: [32] ………… end_of_block_dump The buffer with tsn: 0 rdba: 0x00400321 has already been dumped The buffer with tsn: 4 rdba: 0x010002aa was pinned, but could not be dumped SQL> Select to_number('010002aa','xxxxxxxxxxxxxxxxxx') from dual; TO_NUMBER('010002AA','XXXXXXXXXXXXXXXXXX') ------------------------------------------ 16777898 SQL> select 2 dbms_utility.data_block_address_file(16777898) FILE_NO, 3 dbms_utility.data_block_address_block(16777898) BLOCK_NO 4 from dual; FILE_NO BLOCK_NO ---------- ---------- 4 682
检查坏块
[oracle@xifenfei ~]$ dbv file=/u01/oracle/oradata/ora11g/users01.dbf DBVERIFY: Release 11.2.0.3.0 - Production on Fri Jan 13 18:03:13 2012 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/ora11g/users01.dbf Block Checking: DBA = 16777898, Block Type = Unlimited data segment header with flg blks Incorrect total map count: 24 Page 682 failed with check code 17006 DBVERIFY - Verification complete Total Pages Examined : 2240 Total Pages Processed (Data) : 1421 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 199 Total Pages Failing (Index): 0 Total Pages Processed (Other): 229 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 391 Total Pages Marked Corrupt : 1 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 898278 (0.898278) --对应块 SQL> select 2 dbms_utility.data_block_address_file(16777898) FILE_NO, 3 dbms_utility.data_block_address_block(16777898) BLOCK_NO 4 from dual; FILE_NO BLOCK_NO ---------- ---------- 4 682
通过这里dbv检查发现,该数据库已经是坏块了,也就是说,网上流传的人工标志为坏块的方法在此处行不通,这里我们考虑使用dul等类此工具挖取数据,来拯救数据
dul和odu常规操作
DUL> unload table chf.t_xifenfei; . unloading table T_XIFENFEI DUL: Warning: Using data objno from segment header( 16779199) instead of expected id (75592) from dictionary DUL: Error: No entry in control file for block: ts# = 4 rfile# = 0 block# = 1087 DUL: Error: While processing unknown file block# 1087 DUL: Error: Could not read/parse data block ODU> unload table chf.t_xifenfei Unloading table: T_XIFENFEI,object ID: 75592 Unloading segment,storage(Obj#=75592 DataObj#=75592 TS#=4 File#=4 Block#=682 Cluster=0) corrupted block 0 rows unloaded
通过上面的测试证明,在该中情况下(SEGMENT HEADER)出现异常时,dul/odu均不能使用最常规的方法挖的数据
扫描数据文件方式挖
因为odu在这个方面的操作人性化于dul,所以只用odu进行相关测试,dul肯定能过实现相同功能
ODU> SCAN EXTENT TABLESPACE 4 PARALLEL 2 scan extent start: 2012-9-23 2:47:51 scanning extent... scanning extent finished. scan extent completed: 2012-9-23 2:47:51 ODU> unload table chf.t_xifenfei object 75592 Unloading table: T_XIFENFEI,object ID: 75592 Unloading segment,storage(Obj#=75592 DataObj#=75592 TS#=4 File#=4 Block#=682 Cluster=0) 74504 rows unloaded
试验证明通过odu扫描表空间/数据文件找回来所有的数据,然后truncate掉问题表,导入该数据库,问题可以得意顺利解决
ORA-30013导致RAC 节点down掉
今天一朋友让我帮忙分析他们的9.2.0.2 rac 节点2异常down掉原因,相关信息如下:
前提信息
OS:HP-UX B.11.31 DB:9.2.0.2.0 RAC
节点2alert日志信息
Fri Sep 7 13:13:49 2012 ARC0: Completed archiving log 11 thread 2 sequence 11651 Fri Sep 7 13:31:56 2012 Errors in file /oracle/admin/agent/udump/agent2_ora_797.trc: ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], [] ORA-00028: your session has been killed Fri Sep 7 13:31:58 2012 Errors in file /oracle/admin/agent/bdump/agent2_pmon_5938.trc: ORA-30013: undo tablespace 'UNDOTBS1' is currently in use Fri Sep 7 13:31:58 2012 PMON: terminating instance due to error 30013 Fri Sep 7 13:31:58 2012 Errors in file /oracle/admin/agent/bdump/agent2_lms7_6033.trc: ORA-30013: undo tablespace '' is currently in use Fri Sep 7 13:31:58 2012 ………… Errors in file /oracle/admin/agent/bdump/agent2_lms0_6027.trc: ORA-30013: undo tablespace '' is currently in use Fri Sep 7 13:31:58 2012 System state dump is made for local instance Fri Sep 7 13:32:03 2012 Instance terminated by PMON, pid = 5938 Fri Sep 7 14:34:35 2012
这里可以看到因为ORA-30013的错误使得pmon进程异常,从而使得该rac的节点2 down掉.同时这里还发现了ORA-00600[kgavsd_3]错误,是否是因为该ORA-600导致了数据库异常down还是一个偶然机会,我们继续分析
查看ORA-600[kgavsd_3]相关trace文件
*********START PLSQL RUNTIME DUMP************ ***Got ORA-28 while running PLSQL*** ***********END PLSQL RUNTIME DUMP************ *** 2012-09-07 13:31:56.740 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], [] ORA-00028: your session has been killed Current SQL statement for this session: --用户补档 DECLARE OUT_ERR_CODE NUMBER; OUT_ERR_MSG VARCHAR2(1000); V_COUNT NUMBER; BEGIN WHILE TRUE LOOP SELECT COUNT(*) INTO V_COUNT FROM amc_stat_log where proc_name in('pRunOdsChannelWareData') and run_param=201208 AND STATE='A'; IF V_COUNT>0 THEN dbms_output.put_line('exit loop '|| sysdate); EXIT; END IF; sys.Dbms_Lock.sleep(600); dbms_output.put_line('wake up '|| sysdate); END LOOP; PKG_AME_ODS_DATA.P_Add_TO_AgentServ(201208,OUT_ERR_CODE,OUT_ERR_MSG); PKG_AME_ODS_DATA.P_Update_Serv_Ware_ID(201208, OUT_ERR_CODE, OUT_ERR_MSG); PKG_AMS_SETTLE.P_COMMISION_51PRE_FLAG(201208,OUT_ERR_CODE, OUT_ERR_MSG); END; ----- PL/SQL Call Stack ----- object line object handle number name c000000e84ec59d0 14 anonymous block ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- Cannot find symbol in . Cannot find symbol in . Cannot find symbol in . ksedmp()+512 call 9fffffffffff3940 000000000 ? C000000000000A17 ? 40000000025B6540 ? ksfdmp()+64 call 9fffffffffff3940 000000003 ? kgerinv()+352 call 9fffffffffff3940 60000000000466B0 ? 000000003 ? C000000000000714 ? 4000000004EBA6A0 ? 00001821B ? 6000000000468EA8 ? kgesinv()+48 call 9fffffffffff3940 60000000000466B0 ? 600000000059BD98 ? 600000000046B070 ? 60000000000179C0 ? 6000000000017950 ? kgesin()+112 call 9fffffffffff3940 60000000000466B0 ? 600000000059BD98 ? 4000000000B44C10 ? 000000001 ? 9FFFFFFFFFFF4310 ? $cold_kgavsd_stackl call 9fffffffffff3940 60000000000466B0 ? et_done()+1184 600000000059BD98 ? 4000000000B44C10 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? 00001FE19 ? pbesd_stacklet_done call 9fffffffffff3940 60000000000466B0 ? ()+96 000000002 ? 000000000 ? 9FFFFFFFBEC6AE68 ? pfrrun()+3328 call 9fffffffffff3940 9FFFFFFFBEC6AE68 ? C000000000001D42 ? 4000000001ABBDA0 ? 9FFFFFFFBEC6B06C ? 9FFFFFFFFFFF64F0 ? 6000000000468EA8 ? 000000000 ? 000000000 ? peicnt()+480 call 0000000000000000 9FFFFFFFBEC6AEE8 ? C000000000000996 ? 40000000019F3680 ? kkxexe()+816 call 0000000000000000 9FFFFFFFFFFF64F0 ? 9FFFFFFFBEC6AE68 ? C00000000000099B ? 4000000001DD90F0 ? 00000FE4F ? 9FFFFFFFFFFF5F00 ? 60000000000467F0 ? 4000000000B603F0 ? opiexe()+11168 call 0000000000000000 000000000 ? C000000000002C60 ? 4000000001BEF980 ? 00000800F ? 9FFFFFFFFFFF6470 ? 9FFFFFFFBEC6AEB2 ? 6000000000040900 ? 9FFFFFFFBEC6B534 ? opiall0()+3184 call _etext_f()+23058430 000000004 ? 0000000C4 ? 09110686928 9FFFFFFFFFFF7B40 ? C000000000002BDF ? 4000000001B26CD0 ? 000000000 ? 00000C893 ? 9FFFFFFFFFFF6690 ? Cannot find symbol in . kpoal8()+2064 call 9fffffffffff7ad0 000000001 ? 9FFFFFFFFFFF8304 ? FFFFFFFFBFFFFFFF ? 9FFFFFFFFFFF83E4 ? FFFFFFFFFFE7FBDF ? 9FFFFFFFFFFF7B88 ? 000000000 ? 6000000000474528 ? opiodr()+3584 call 9fffffffffff81fc 6000000000040950 ? 000000000 ? 000000000 ? C000000000002C60 ? 4000000001C09FE0 ? 00000C50B ? 9FFFFFFFFFFF81F0 ? 9FFFFFFFFFFF81D8 ? ttcpip()+3776 call _etext_f()+23058430 00000005E ? 000000014 ? 09114957288 9FFFFFFFFFFFA5F0 ? 6000000000040918 ? C000000000001ABD ? 4000000001AB3BA0 ? 000000000 ? 00000C59B ? opitsk()+1872 call 9fffffffffffa200 6000000000049FC0 ? 000000001 ? 9FFFFFFFFFFFA5F0 ? 000000001 ? 9FFFFFFFFFFFA740 ? 9FFFFFFFFFFFA564 ? 9FFFFFFFBF780058 ? 000000000 ? opiino()+3184 call 000000000000057b 000000000 ? 000000000 ? C00000000000132B ? 4000000001F78730 ? 000008001 ? opiodr()+3584 call 0000000000000000 6000000000548A38 ? 4000000000B606F0 ? 6000000000548A38 ? C000000000002C60 ? 4000000001C09FE0 ? 00000A201 ? 9FFFFFFFFFFFBC90 ? 4000000000B606F0 ? opidrv()+976 call _etext_f()+23058430 00000003C ? 000000004 ? 09114957288 9FFFFFFFFFFFEFB0 ? 6000000000040918 ? sou2o()+80 call _etext_f()+23058430 000000004 ? 000000004 ? 09114957288 9FFFFFFFFFFFEFB0 ? main()+352 call _etext_f()+23058430 9FFFFFFFFFFFEFD0 ? 09114957288 9FFFFFFFFFFFEFD4 ? 60000000004744F0 ? 9FFFFFFFFFFFEFB0 ? main_opd_entry()+80 call _etext_f()+23058430 000000000 ? 09114957288 9FFFFFFFFFFFF498 ? C000000000000004 ? C00000000002BE30 ? --------------------- Binary Stack Dump --------------------- Process global information: process: c000000d6428c0c0, call: c000000e46e772a8, xact: 0000000000000000, curses: c000000d6437d020, usrses: c000000d6437d020 ---------------------------------------- SO: c000000d6428c0c0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00 (process) Oracle pid=282, calls cur/top: c000000e46e772a8/c000000e46e772a8, flag: (0) - int error: 28, call error: 0, sess error: 0, txn error 0 (post info) last post received: 0 0 0 last post received-location: No post last process to post me: c000000d64234f18 1 6 last post sent: 0 0 104 last post sent-location: kglpsl: in loop last process posted by me: c000000d6428e900 23 0 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: c000000d62234ee0 O/S info: user: oracle, term: UNKNOWN, ospid: 797 OSD pid info: Unix process pid: 797, image: oracle@gzagent2 (TNS V1-V3) ---------------------------------------- SO: c000000d6437d020, type: 4, owner: c000000d6428c0c0, flag: INIT/-/-/0x00 (session) trans: 0000000000000000, creator: c000000d6428c0c0, flag: (41) USR/- BSY/-/-/-/KIL/- DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000 txn branch: 0000000000000000 oct: 0, prv: 0, sql: c000000e76b03f10, psql: 0000000000000000, user: 31/CUSTOM O/S info: user: huangqianhai_lc, term: SVCTAG-D1MLV2X, ospid: 11124:11796, machine: WORKGROUP\SVCTAG-D1MLV2X program: plsqldev.exe application name: PL/SQL Developer, hash value=1190136663 action name: 测试窗口 - 新建, hash value=3604520210 last wait for 'null event' blocking sess=0x0 seq=142 wait_time=567341620 =ea60, =0, =0 temporary object counter: 0 ----------------------------------------
通过这里可以看出来是因为pl/sql dev进行一个plsql的操作导致该错误发生,查询MOS[ID 403575.1]发现
Applies to: Oracle Server - Enterprise Edition - Version: 9.2.0.7 and later [Release: 9.2 and later ] Information in this document applies to any platform. ***Checked for relevance on 27-Oct-2010*** Symptoms The following errors appears in the alert log file : Probe:read_pipe: receive failed, status 3 Probe:S:debug_loop: timeout. Action 1 *********START PLSQL RUNTIME DUMP************ ***Got ORA-604 while running PLSQL*** ***********END PLSQL RUNTIME DUMP************ ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], [] ORA-00604: error occurred at recursive SQL level 2 Current SQL statement for this session: begin :id := sys.dbms_transaction.local_transaction_id; end; . Cause The ora-600 kgavsd_3 appears when calling kgavsd_stacklet_done function which is related to PLSQL DEBUG. From traces, dbms_debug package is being used during trace generation. The return code of 3 further indicates that the dbms_pipe message was interrupted. Probably user cancelled a plsql program, so the appeared while trying to dump the stack Solution There is no data corruption over here. The error appears to be due to abnormal termination of aPL/SQL Developer application while executing a PL/SQL block. Changing the PL/SQL and/or the procedure code could help in avoiding this error message. Hence, this error can be safely ignored.
查找trace文件确实发现有name=SYS.DBMS_DEBUG,进一步表明该错误是由于plsql dev工具使用debug模式运行上面的plsql而引起该错误的发生,但是因为mos中记录和错误不是完全的一致,所以不能十分确定是该错误导致数据库down掉
继续分析ORA-30013
Error: ORA-30013 (ORA-30013) Text: undo tablespace '%s' is currently in use --------------------------------------------------------------------------- Cause: the specified undo tablespace is currently used by another instance. Action: Wait for the undo tablespace to become available or change to another name and reissue the statement.
这个说明是没有疑问的:因为2节点配置的当前undo是UNDOTBS2,而UNDOTBS1是1节点使用的,证明这里的undo确实发生了错误,继续查询mos发现Bug 3368552
Hdr: 3368552 9.2.0.3 RDBMS 9.2.0.3 RAC PRODID-5 PORTID-23 Abstract: RAC: ORA-30013 WHEN INSTANCE 2 ATTEMPTS TO ACCESS UNDO TABLESPACE OF INSTANCE 1 *** 01/12/04 06:21 am *** TAR: ---- 3554549.995 PROBLEM: -------- The RAC database has been stable, but experienced an instance termination due to ORA-30031 error in the alert log (instance 2): ... Tue Dec 23 03:01:46 2003 ARC1: Evaluating archive log 4 thread 2 sequence 1116 ARC1: Beginning to archive log 4 thread 2 sequence 1116 Creating archive destination LOG_ARCHIVE_DEST_1: '/oracle/oradata/VLDB/logs/archives/VLDBN2/VLDB_0000001116_0002.arc' ARC1: Completed archiving log 4 thread 2 sequence 1116 Tue Dec 23 08:14:09 2003 Errors in file /oracle/admin/VLDB/bdump/vldbn2_pmon_22860.trc: ORA-30013: undo tablespace 'UNDOTBS1' is currently in use Tue Dec 23 08:14:09 2003 PMON: terminating instance due to error 30013 Tue Dec 23 08:14:10 2003 System state dump is made for local instance Tue Dec 23 08:14:12 2003 Trace dumping is performing id=[cdmp_20031223081410] Tue Dec 23 08:14:14 2003 Instance terminated by PMON, pid = 22860 <eof> Instance 1 alert log shows only the reconfiguration and the cdump info: .. Tue Dec 23 03:54:13 2003 Errors in file /oracle/admin/VLDB/udump/vldbn1_ora_13564.trc: ORA-27037: unable to obtain file status SVR4 Error: 2: No such file or directory Additional information: 4 Tue Dec 23 08:14:10 2003 Trace dumping is performing id=[cdmp_20031223081410] Tue Dec 23 08:14:12 2003 Reconfiguration started List of nodes: 0, Global Resource Directory frozen one node partition Communication channels reestablished ...
因为在9.2.0.3的RAC中有着该bug,那么我们可以大胆猜测在9.2.0.2中应该存在该bug,那么结合上面的ORA-00600[kgavsd_3]错误,我们大概还原该事故的全部:
1.节点1 dml操作了程序中报错的plsql中要范围的部分表对象,但是未提交(或者正在执行)
2.节点2 有用户使用pl/sql dev去执行程序中的plsql,因为是debug模式执行,需要UNDOTBS1的块来构建cr,从而使得节点2去访问UNDOTBS1,引发了Bug 3368552 从而使得数据库直接kill掉该plsql dev会话,进而出现ORA-00600[kgavsd_3]错误和pmon进程异常使得节点2 down掉
dbca创建数据库报ORA-00443
今天早上一个朋友和我说他们RAC dbca创建库不成功提示ORA-00443错误,让我帮他们分析下是什么原因导致
提示错误如图
环境状况
OS:LINUX REDHAT x86_64 5.7 kernel:2.6.18-194.el5 memory:100G CPU:ntel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz * 64 ORACLE:10.2.0.4
查看alert日志错误
Wed Sep 5 01:32:33 2012 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 PMON started with pid=2, OS id=17859 DIAG started with pid=7, OS id=17861 PSP0 started with pid=12, OS id=17863 LMON started with pid=17, OS id=17865 LMD0 started with pid=22, OS id=17867 MMAN started with pid=27, OS id=17869 DBW0 started with pid=32, OS id=17871 Wed Sep 5 01:32:33 2012 Errors in file /u01/app/oracle/admin/dtjcdb/bdump/dtjcdb1_ora_17873.trc: ORA-00600: internal error code, arguments: [ksbmoveme4], [], [], [], [], [], [], [] ORA-27300: OS system dependent operation:run on node failed with status: 2 ORA-27301: OS failure message: No such file or directory ORA-27302: failure occurred at: skgpmoveme:1 Wed Sep 5 01:32:34 2012 Trace dumping is performing id=[cdmp_20120905013234] Wed Sep 5 01:32:34 2012 Process DBW1 died, see its trace file USER: terminating instance due to error 443 Instance terminated by USER, pid = 17857
trace文件中内容
*** 2012-09-05 01:32:33.996 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [ksbmoveme4], [], [], [], [], [], [], [] ORA-27300: OS system dependent operation:run on node failed with status: 2 ORA-27301: OS failure message: No such file or directory ORA-27302: failure occurred at: skgpmoveme:1 Current SQL information unavailable - no session. ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst()+31 call ksedst1() 000000000 ? 000000001 ? 7FFFC4ABA4C0 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? ksedmp()+610 call ksedst() 000000000 ? 000000001 ? 7FFFC4ABA4C0 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ? 7FFFC4ABA4C0 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? kgerinv()+161 call ksfdmp() 000000003 ? 000000001 ? 7FFFC4ABA4C0 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? kgesinv()+33 call kgerinv() 0068966E0 ? 000000000 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? 000000000 ? ksesin()+211 call kgesinv() 0068966E0 ? 000000000 ? 7FFFC4ABA520 ? 7FFFC4ABA460 ? 000000000 ? 000000000 ? ksbmoveme()+350 call ksesin() 00533D5C8 ? 000000000 ? 006896FA3 ? 000000001 ? 000000001 ? 000000000 ? ksosp_set_current() call ksbmoveme() 000000001 ? 000000000 ? +117 006896FA3 ? 000000001 ? 000000001 ? 000000000 ? kso_init()+161 call ksosp_set_current() 151056D10 ? 000000000 ? 006896FA3 ? 000000001 ? 000000001 ? 000000000 ? opirip()+523 call kso_init() 151056D10 ? 000000000 ? 006896FA3 ? 000000001 ? 000000001 ? 000000000 ? opidrv()+582 call opirip() 000000032 ? 000000004 ? 7FFFC4ABC128 ? 000000001 ? 000000001 ? 000000000 ? sou2o()+114 call opidrv() 000000032 ? 000000004 ? 7FFFC4ABC128 ? 000000001 ? 000000001 ? 000000000 ? opimai_real()+317 call sou2o() 7FFFC4ABC100 ? 000000032 ? 000000004 ? 7FFFC4ABC128 ? 000000001 ? 000000000 ? main()+116 call opimai_real() 000000003 ? 7FFFC4ABC190 ? 000000004 ? 7FFFC4ABC128 ? 000000001 ? 000000000 ? __libc_start_main() call main() 000000003 ? 7FFFC4ABC190 ? +244 000000004 ? 7FFFC4ABC128 ? 000000001 ? 000000000 ? _start()+41 call __libc_start_main() 000723088 ? 000000001 ? 7FFFC4ABC2E8 ? 000000000 ? 000000001 ? 000000003 ? --------------------- Binary Stack Dump ---------------------
通过查询MOS发现[ID 422908.1]有类此的错误提示,但是该提示说是因为系统重新增加了过多CPU导致数据库crashed掉并且出现 ORA-27300 ORA-27301 ORA-27302 错误.在该案例中,起始就是64c,根据经验在win的10.2.0.4中如果cpu超过32c也是在dbcd创建数据库2%的地方hang住,所以怀疑该错误也是由于cpu太多导致.
处理方法
To solve the problem: 1) apply patch:6471079 - or - 2) apply the 10.2.0.5 (when available) - or - 3) upgrade to 11g
朋友打上patch:6471079,dbca正常建库