标签云
asm恢复 bbed bootstrap$ dul In Memory kcbzib_kcrsds_1 kccpb_sanity_check_2 MySQL恢复 ORA-00312 ORA-00607 ORA-00704 ORA-00742 ORA-01110 ORA-01555 ORA-01578 ORA-01595 ORA-08103 ORA-600 2131 ORA-600 2662 ORA-600 3020 ORA-600 4000 ORA-600 4137 ORA-600 4193 ORA-600 4194 ORA-600 16703 ORA-600 kcbzib_kcrsds_1 ORA-600 KCLCHKBLK_4 ORA-15042 ORA-15196 ORACLE 12C oracle dul ORACLE PATCH Oracle Recovery Tools oracle加密恢复 oracle勒索 oracle勒索恢复 oracle异常恢复 Oracle 恢复 ORACLE恢复 ORACLE数据库恢复 oracle 比特币 OSD-04016 YOUR FILES ARE ENCRYPTED 勒索恢复 比特币加密文章分类
- Others (2)
- 中间件 (2)
- WebLogic (2)
- 操作系统 (103)
- 数据库 (1,769)
- DB2 (22)
- MySQL (77)
- Oracle (1,610)
- Data Guard (52)
- EXADATA (8)
- GoldenGate (24)
- ORA-xxxxx (166)
- ORACLE 12C (72)
- ORACLE 18C (6)
- ORACLE 19C (15)
- ORACLE 21C (3)
- Oracle 23ai (8)
- Oracle ASM (69)
- Oracle Bug (8)
- Oracle RAC (54)
- Oracle 安全 (6)
- Oracle 开发 (28)
- Oracle 监听 (29)
- Oracle备份恢复 (592)
- Oracle安装升级 (97)
- Oracle性能优化 (62)
- 专题索引 (5)
- 勒索恢复 (86)
- PostgreSQL (30)
- pdu工具 (6)
- PostgreSQL恢复 (9)
- SQL Server (32)
- SQL Server恢复 (13)
- TimesTen (7)
- 达梦数据库 (3)
- 达梦恢复 (1)
- 生活娱乐 (2)
- 至理名言 (11)
- 虚拟化 (2)
- VMware (2)
- 软件开发 (39)
- Asp.Net (9)
- JavaScript (12)
- PHP (2)
- 小工具 (22)
-
最近发表
- 2025年的Oracle 8.0.5数据库恢复
- ORA-600 kokiasg1故障分析(obj$中核心字典序列全部被恶意删除)
- ORA-00756 ORA-10567故障数据0丢失恢复
- 数据库文件变成32k故障恢复
- tcp连接过多导致监听TNS-12532 TNS-12560 TNS-00502错误
- 文件系统格式化MySQL数据库恢复
- .sstop勒索加密数据库恢复
- 解决一次硬件恢复之后数据文件0kb的故障恢复case
- Error in invoking target ‘libasmclntsh19.ohso libasmperl19.ohso client_sharedlib’问题处理
- ORA-01171: datafile N going offline due to error advancing checkpoint
- linux环境oracle数据库被文件系统勒索加密为.babyk扩展名溯源
- ORA-600 ksvworkmsgalloc: bad reaper
- ORA-600 krccfl_chunk故障处理
- Oracle Recovery Tools恢复案例总结—202505
- ORA-600 kddummy_blkchk 数据库循环重启
- 记录一次asm disk加入到vg通过恢复直接open库的案例
- CHECKDB 发现了 N 个分配错误和 M 个一致性错误
- 达梦数据库dm.ctl文件异常恢复
- Oracle Recovery Tools修复ORA-00742、ORA-600 ktbair2: illegal inheritance故障
- 可能是 tempdb 空间用尽或某个系统表不一致故障处理
分类目录归档:Oracle
11g DirectPath Reads 噩梦案例
DirectPath Reads 说明
在oracle 11g以前的版本中,如果对大表进行全表扫描,wait event是:db file scattered read;在11g中,如果对大表进行全表扫描,wait event是:direct path read。在11g中,大表全表扫描时数据块不经过sga而直接进pga,这样会造成每次进行大表全表扫描,物理读都是很大,而在10g中,由于全表扫描的数据块在sga中已经存在,所以执行全表扫描时,它的物理读为0。但是这里主要是oracle在优化策略上的进步,即假定大表频繁全表扫描这种现象,在生产库上不会太多,通过把数据直接读入pga,进而减少了cache buffer的繁忙交换程度,提高了cache buffer的使用效率.
DirectPath Reads 优势
1. 减少了对栓的使用,避免可能的栓争用
2. 物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。
DirectPath Reads 噩梦
这一切听起来都很美好,但是在大并发的OLTP系统中,这东西简直是一个噩梦.通过一个awr来说明该问题:这个是一个系统的awr报告,朋友反馈说系统有段运行缓慢,请求帮忙找出原因
分析总体信息
系统这段时间会话临时大幅度增加(从102增加到223),系统出现异常繁忙(60.62*16=969.92<2,454.52)
分析Load Profile信息
通过这个截图发现系统的业务不是很大,但是Physical reads参数异常
1.物理读大小:25071.1*8192/1024/1024=195.86796875M/S
2.物理读将近逻辑读一半,这个在一般系统中很难得到这个比例,进一步说明物理读过高
分析Top 5信息
这里可以发现direct path read等待很多
分析Host CPU
可以发现iowait很大占40.5%,io等待异常高(195M/S能不高吗?)
补充说明:在这里我们看到的%Idle=1-%System-%User不包括%WIO
处理建议
通过上面的评估,可以确定大部分是由于 导致了数据库的物理读过高,从而使得系统反应变慢,处理方法就是关闭掉11g该新特性
alter system set event= ’10949 trace name context forever, level 1′ scope=spfile;
重启数据库
发表在 Oracle性能优化
评论关闭
undo segment header坏块异常恢复
alert日志报ORA-00600[4137]与ORA-00600 [4198]错误
数据库报如下错误,运行一段时间数据库自动down掉
Fri Jul 6 18:00:40 2012 SMON: ignoring slave err,downgrading to serial rollback Fri Jul 6 18:00:41 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc: ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], [] ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17). Fri Jul 6 18:00:41 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc: ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], [] Fri Jul 6 18:05:53 2012 SMON: Restarting fast_start parallel rollback Fri Jul 6 18:05:54 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc: ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], [] ………… Wed Jul 6 18:50:38 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc: ORA-00474: SMON process terminated with error Wed Jul 6 18:50:38 2012 PMON: terminating instance due to error 474
从三个地方得出3号回滚段异常
1.trace文件
SMON: about to recover undo segment 3 Parallel Transaction recovery caught exception 12801 Parallel Transaction recovery caught error 30317 *** 2012-07-06 17:55:19.042 SMON: Restarting fast_start parallel rollback SMON: about to recover undo segment 3 SMON: mark undo segment 3 as available SMON: about to recover undo segment 3 SMON: mark undo segment 3 as available Parallel Transaction recovery caught exception 12801 Parallel Transaction recovery caught error 607 *** 2012-07-06 17:55:19.761 SMON: ignoring slave err,downgrading to serial rollback SMON: about to recover undo segment 3 XID passed in =xid: 0x0003.011.00003c2b XID from Undo block =xid: 0x0004.020.00002b35
2.alert中提示while recovering transaction (3, 17)
3.查询dba_rollback_segs发现_SYSSMU3$是NEED RECOVERY状态
尝试删除_SYSSMU3$
使用隐含参数_offline_rollback_segments= _SYSSMU3$
Fri Jul 6 18:16:19 2012 Completed: ALTER DATABASE OPEN Fri Jul 6 18:16:56 2012 drop rollback segment "_SYSSMU3$" Fri Jul 6 18:16:57 2012 Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], [] Fri Jul 6 18:16:57 2012 Doing block recovery for file 2 block 41 Block recovery from logseq 209591, block 183 to scn 7788878085 Fri Jul 6 18:16:57 2012 Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0 Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log Block recovery completed at rba 209591.225.16, scn 1.3493910790 ORA-607 signalled during: drop rollback segment "_SYSSMU3$"... Fri Jul 6 18:16:57 2012 Corrupt Block Found TSN = 1, TSNAME = UNDOTBS1 RFN = 2, BLK = 41, RDBA = 8388649 OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT = SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type Fri Jul 6 18:16:57 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], [] Doing block recovery for file 2 block 41 Block recovery from logseq 209591, block 183 to scn 7788878085 Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc: ORA-00474: SMON process terminated with error Fri Jul 6 18:17:46 2012 PMON: terminating instance due to error 474 Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc: ORA-00474: SMON process terminated with error Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc: ORA-00474: SMON process terminated with error Instance terminated by PMON, pid = 17355
这里可以看出在使用隐含参数删除异常回滚段的时候,因为该回滚段有坏块出现ORA-00600[kddummy_blkchk]使得数据库donw掉,重启过几次该库都因为这个错误直接down.
查看trace文件发现
SMON: about to recover undo segment 3 SMON: mark undo segment 3 as needs recovery *** 2012-07-06 18:16:57.734 Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block ERROR: SMU Segment Header Corrupted. Error Code = 38508 ktu4smck: starting extent(0x77) of txn slot #0x11 is invalid. valid value (0 - 0x76) TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000 mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe) uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86 Version: 0x01
因为该库是因为undo的3号回滚段的header出现坏块,即使使用了隐含参数屏蔽该回滚段恢复,smon进程依然会去读回滚段header,从而出现该错误导致直接down掉.
处理方案
1.使用隐含参数屏蔽异常回滚段_offline_rollback_segments= _SYSSMU3$
2.修改undo_tablespace=SYSTEM/undo_management=MANUAL
3.启动数据库,快速删除包含_SYSSMU3$ undo表空间
4.新建undo表空间
5.修改undo_tablespace=new_undo/undo_management=AUTO,除掉隐含参数
6.使用新参数文件重启数据库
7.建议:使用逻辑导出导入重建数据库
分析一例 TX Enqueue contention案例
应用反馈某个业务比较慢,需要紧急处理
查询等待事件
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production SQL> select A.INST_ID,count(*), event 2 from Gv$session_wait a 3 where event not in ('SQL*Net more data to client', 4 'rdbms ipc message', 5 'smon timer', 6 'pmon timer', 7 'SQL*Net message from client', 8 'lock manager wait for remote message', 9 'ges remote message', 10 'gcs remote message', 11 'gcs for action', 12 'client message', 13 'pipe get', 14 'null event', 15 'PX Idle Wait', 16 'single-task message', 17 'PX Deq: Execution Msg', 18 'KXFQ: kxfqdeq - normal deqeue', 19 'listen endpoint status', 20 'slave wait', 21 'wakeup time manager','jobq slave wait') 22 group by INST_ID,event 23 order by 1 desc,2 desc; INST_ID COUNT(*) EVENT ---------- ---------- -------------------------------------- 2 8 enqueue 2 1 async disk IO 2 1 db file sequential read 2 1 SQL*Net message to client 2 1 PX Deq: reap credit 1 2 global cache cr request 1 1 async disk IO 1 1 PX Deq: reap credit 1 1 PX Deq: Execute Reply 9 rows selected.
发现enqueue等待有些多,怀疑是TX enquenue,查询阻塞者
SQL> set linesize 100 SQL> set pagesize 66 SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 2 from gv$lock l,gv$session s 3 where l.type like 'TX' and l.REQUEST =6 4 and l.inst_id=s.inst_id and l.sid=s.sid 5 order by id1 6 / INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ---------- ---------- --------------- -- ---------- ---------- ---------- ---------- 2 295 rtStopMain@zwq_ TX 1441805 2391806 0 6 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ TX 1441805 2391806 0 6 bill_2 (TNS V1- V3) 2 1238 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1298 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1684 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 75 rtStopMain@zwq_ TX 12451856 199146 0 6 bill_2 (TNS V1- V3) 2 1125 rtStopMain@zwq_ TX 14352404 63837 0 6 bill_2 (TNS V1- V3)
查询持有者
SQL> set linesize 100 SQL> set pagesize 66 SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 2 from gv$lock l,gv$session s 3 where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in 4 (select id1,id2 from gv$lock where type like 'TX' and REQUEST =6) 5 and l.inst_id=s.inst_id and l.sid=s.sid 6 order by id1 7 / INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ---------- ---------- --------------- -- ---------- ---------- ---------- ---------- 2 75 rtStopMain@zwq_ TX 1441805 2391806 6 0 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ TX 6946827 2546365 6 0 bill_2 (TNS V1- V3) 2 295 rtStopMain@zwq_ TX 12451856 199146 6 0 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ TX 14352404 63837 6 0 bill_2 (TNS V1- V3)
通过持有者和阻塞者可以得出:
1.持有者和阻塞者都是在2号实例上
2.持有者75阻塞了295/992的会话
3.持有者992阻塞了1238/1298/1684/1553的会话
4.持有者295阻塞了75的会话
5.持有者1553阻塞了1125的会话
6.同时分析发现,所有的持有者sid也在阻塞者中,也就是持有者阻塞了某个sid,而自身又被其他sid给阻塞,形成了多级阻塞或者环.如:75阻塞了295,而295有阻塞了75;992阻塞了1553,而1553阻塞了1125
查询阻塞和持有者对象
SQL> set linesize 110 SQL> col c0 for 999 SQL> col c0 heading "INS" SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no, 2 ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no 3 from gv$session 4 where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318') 5 / INS SID Program Name OBJECT_NO RFILE_NO BLOCK_NO ROW_NO ---- ---------- --------------- ---------- ---------- ---------- ---------- 2 75 rtStopMain@zwq_ 1323132 13 122601 111 bill_2 (TNS V1- V3) 2 295 rtStopMain@zwq_ 1323132 13 122601 100 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ 1323132 13 122601 101 bill_2 (TNS V1- V3) 2 1125 rtStopMain@zwq_ 1323132 84 38445 70 bill_2 (TNS V1- V3) 2 1238 rtStopMain@zwq_ 1323132 15 255066 41 bill_2 (TNS V1- V3) 2 1298 rtStopMain@zwq_ 1323132 14 118411 8 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ 1323132 15 255066 19 bill_2 (TNS V1- V3) 2 1684 rtStopMain@zwq_ 1323132 14 118411 21 bill_2 (TNS V1- V3) 8 rows selected. SQL> set linesize 100 SQL> set pagesize 100 SQL> col owner for a10 SQL> col object_name for a20 SQL> col object_type for a10 SQL> select owner,object_name,object_id,object_type 2 from dba_objects 3 where 4 object_id in (select ROW_WAIT_OBJ# from gv$session 5 where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')) 6 / OWNER OBJECT_NAME OBJECT_ID OBJECT_TYP ---------- -------------------- ---------- ---------- DBACCADM DCUSTCREDITBALANCE 1323132 TABLE
通过查询的出来,所有操作的聚焦点都是在DBACCADM.DCUSTCREDITBALANCE表上面
查询相关sql语句
SQL> SQL> set linesize 120 SQL> set pagesize 66 SQL> col c0 for 999 SQL> col c0 heading "INS" SQL> col c1 for a9 SQL> col c1 heading "OS User" SQL> col c2 for a9 SQL> col c2 heading "Oracle User" SQL> col c3 for a15 SQL> col c3 heading "Program Name" SQL> col b1 for a9 SQL> col b1 heading "Unix PID" SQL> col b2 for 9999 justify left SQL> col b2 heading "ORA SID" SQL> col b3 for 999999 justify left SQL> col b3 heading "SERIAL#" SQL> col sql_text for a45 SQL> set space 1 SQL> break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2 SQL> select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text 2 from gv$sql a, gv$session b, gv$process c 3 where 4 a.address = b.sql_address 5 and b.paddr = c.addr 6 and a.hash_value = b.sql_hash_value 7 and a.inst_id=b.inst_id and a.inst_id=c.inst_id 8 and a.inst_id like '&inst_id' and b.sid like '&sid' 9 order by c.spid,a.hash_value 10 / Enter value for inst_id: 2 Enter value for sid: 75 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '75' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 75 1167392 rtStopMain@zwq_ DBCUSTOPR 42815 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4 SQL> / Enter value for inst_id: 2 Enter value for sid: 992 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '992' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 992 2760870 rtStopMain@zwq_ DBCUSTOPR 56282 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4 SQL> / Enter value for inst_id: 2 Enter value for sid: 295 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '295' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 295 1639008 rtStopMain@zwq_ DBCUSTOPR 35740 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4
其他阻塞者和持有者执行sql语句均和该语句相同,省略其他查询.通过这些查询可以确定是因为对dcustcreditbalance表的更新操作导致了这样的现象发生.
处理方案
1.临时处理方案:kill掉持有者
2.永久处理方案:修改这部分程序业务逻辑