分类目录归档:Oracle

11g DirectPath Reads 噩梦案例

DirectPath Reads 说明
在oracle 11g以前的版本中,如果对大表进行全表扫描,wait event是:db file scattered read;在11g中,如果对大表进行全表扫描,wait event是:direct path read。在11g中,大表全表扫描时数据块不经过sga而直接进pga,这样会造成每次进行大表全表扫描,物理读都是很大,而在10g中,由于全表扫描的数据块在sga中已经存在,所以执行全表扫描时,它的物理读为0。但是这里主要是oracle在优化策略上的进步,即假定大表频繁全表扫描这种现象,在生产库上不会太多,通过把数据直接读入pga,进而减少了cache buffer的繁忙交换程度,提高了cache buffer的使用效率.

DirectPath Reads 优势
1. 减少了对栓的使用,避免可能的栓争用
2. 物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。

DirectPath Reads 噩梦
这一切听起来都很美好,但是在大并发的OLTP系统中,这东西简直是一个噩梦.通过一个awr来说明该问题:这个是一个系统的awr报告,朋友反馈说系统有段运行缓慢,请求帮忙找出原因
分析总体信息

系统这段时间会话临时大幅度增加(从102增加到223),系统出现异常繁忙(60.62*16=969.92<2,454.52)
分析Load Profile信息

通过这个截图发现系统的业务不是很大,但是Physical reads参数异常
1.物理读大小:25071.1*8192/1024/1024=195.86796875M/S
2.物理读将近逻辑读一半,这个在一般系统中很难得到这个比例,进一步说明物理读过高

分析Top 5信息

这里可以发现direct path read等待很多

分析Host CPU

可以发现iowait很大占40.5%,io等待异常高(195M/S能不高吗?)
补充说明:在这里我们看到的%Idle=1-%System-%User不包括%WIO

处理建议
通过上面的评估,可以确定大部分是由于 导致了数据库的物理读过高,从而使得系统反应变慢,处理方法就是关闭掉11g该新特性
alter system set event= ’10949 trace name context forever, level 1′ scope=spfile;
重启数据库

发表在 Oracle性能优化 | 评论关闭

undo segment header坏块异常恢复

alert日志报ORA-00600[4137]与ORA-00600 [4198]错误
数据库报如下错误,运行一段时间数据库自动down掉

Fri Jul  6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17).
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri Jul  6 18:05:53 2012
SMON: Restarting fast_start parallel rollback
Fri Jul  6 18:05:54 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
…………
Wed Jul  6 18:50:38 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc:
ORA-00474: SMON process terminated with error
Wed Jul  6 18:50:38 2012
PMON: terminating instance due to error 474

从三个地方得出3号回滚段异常
1.trace文件

SMON: about to recover undo segment 3
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317 
*** 2012-07-06 17:55:19.042
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 607 
*** 2012-07-06 17:55:19.761
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 3
XID passed in =xid: 0x0003.011.00003c2b
XID from Undo block =xid: 0x0004.020.00002b35

2.alert中提示while recovering transaction (3, 17)
3.查询dba_rollback_segs发现_SYSSMU3$是NEED RECOVERY状态

尝试删除_SYSSMU3$
使用隐含参数_offline_rollback_segments= _SYSSMU3$

Fri Jul  6 18:16:19 2012
Completed: ALTER DATABASE OPEN
Fri Jul  6 18:16:56 2012
drop rollback segment "_SYSSMU3$"
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Fri Jul  6 18:16:57 2012
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:16:57 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0
  Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log
Block recovery completed at rba 209591.225.16, scn 1.3493910790
ORA-607 signalled during: drop rollback segment "_SYSSMU3$"...
Fri Jul  6 18:16:57 2012
Corrupt Block Found
         TSN = 1, TSNAME = UNDOTBS1
         RFN = 2, BLK = 41, RDBA = 8388649
         OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
PMON: terminating instance due to error 474
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc:
ORA-00474: SMON process terminated with error
Instance terminated by PMON, pid = 17355

这里可以看出在使用隐含参数删除异常回滚段的时候,因为该回滚段有坏块出现ORA-00600[kddummy_blkchk]使得数据库donw掉,重启过几次该库都因为这个错误直接down.
查看trace文件发现

SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
*** 2012-07-06 18:16:57.734
Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted.  Error Code = 38508
ktu4smck: starting extent(0x77) of txn slot #0x11 is  invalid.
  valid value (0 - 0x76)
  TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000
            mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86
            Version: 0x01

因为该库是因为undo的3号回滚段的header出现坏块,即使使用了隐含参数屏蔽该回滚段恢复,smon进程依然会去读回滚段header,从而出现该错误导致直接down掉.

处理方案
1.使用隐含参数屏蔽异常回滚段_offline_rollback_segments= _SYSSMU3$
2.修改undo_tablespace=SYSTEM/undo_management=MANUAL
3.启动数据库,快速删除包含_SYSSMU3$ undo表空间
4.新建undo表空间
5.修改undo_tablespace=new_undo/undo_management=AUTO,除掉隐含参数
6.使用新参数文件重启数据库
7.建议:使用逻辑导出导入重建数据库

发表在 Oracle备份恢复 | 标签为 | 2 条评论

分析一例 TX Enqueue contention案例

应用反馈某个业务比较慢,需要紧急处理
查询等待事件

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL> select A.INST_ID,count(*), event
  2    from Gv$session_wait a
  3  where event not in ('SQL*Net more data to client',
  4  'rdbms ipc message',
  5               'smon timer',
  6               'pmon timer',
  7               'SQL*Net message from client',
  8               'lock manager wait for remote message',
  9               'ges remote message',
 10               'gcs remote message',
 11               'gcs for action',
 12               'client message',
 13               'pipe get',
 14               'null event',
 15               'PX Idle Wait',
 16               'single-task message',
 17               'PX Deq: Execution Msg',
 18               'KXFQ: kxfqdeq - normal deqeue',
 19               'listen endpoint status',
 20               'slave wait',
 21               'wakeup time manager','jobq slave wait')
 22  group by INST_ID,event
 23  order by 1 desc,2 desc;

   INST_ID   COUNT(*) EVENT
---------- ---------- --------------------------------------
         2          8 enqueue
         2          1 async disk IO
         2          1 db file sequential read
         2          1 SQL*Net message to client
         2          1 PX Deq: reap credit
         1          2 global cache cr request
         1          1 async disk IO
         1          1 PX Deq: reap credit
         1          1 PX Deq: Execute Reply

9 rows selected.

发现enqueue等待有些多,怀疑是TX enquenue,查询阻塞者

SQL> set linesize 100
SQL> set pagesize 66
SQL> col c1 for a15
SQL> col c1 heading "Program Name "
SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST
  2  from gv$lock l,gv$session s
  3  where l.type like 'TX' and l.REQUEST =6
  4  and l.inst_id=s.inst_id and l.sid=s.sid
  5  order by id1
  6  / 

   INST_ID        SID Program Name    TY        ID1        ID2      LMODE    REQUEST
---------- ---------- --------------- -- ---------- ---------- ---------- ----------
         2        295 rtStopMain@zwq_ TX    1441805    2391806          0          6
                      bill_2 (TNS V1-
                      V3)

         2        992 rtStopMain@zwq_ TX    1441805    2391806          0          6
                      bill_2 (TNS V1-
                      V3)

         2       1238 rtStopMain@zwq_ TX    6946827    2546365          0          6
                      bill_2 (TNS V1-
                      V3)

         2       1298 rtStopMain@zwq_ TX    6946827    2546365          0          6
                      bill_2 (TNS V1-
                      V3)

         2       1684 rtStopMain@zwq_ TX    6946827    2546365          0          6
                      bill_2 (TNS V1-
                      V3)

         2       1553 rtStopMain@zwq_ TX    6946827    2546365          0          6
                      bill_2 (TNS V1-
                      V3)

         2         75 rtStopMain@zwq_ TX   12451856     199146          0          6
                      bill_2 (TNS V1-
                      V3)

         2       1125 rtStopMain@zwq_ TX   14352404      63837          0          6
                      bill_2 (TNS V1-
                      V3)

查询持有者

SQL> set linesize 100
SQL> set pagesize 66
SQL> col c1 for a15
SQL> col c1 heading "Program Name "
SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST
  2  from gv$lock l,gv$session s
  3  where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in
  4  (select id1,id2 from gv$lock where type like 'TX' and REQUEST =6)
  5  and l.inst_id=s.inst_id and l.sid=s.sid
  6  order by id1
  7  / 

   INST_ID        SID Program Name    TY        ID1        ID2      LMODE    REQUEST
---------- ---------- --------------- -- ---------- ---------- ---------- ----------
         2         75 rtStopMain@zwq_ TX    1441805    2391806          6          0
                      bill_2 (TNS V1-
                      V3)

         2        992 rtStopMain@zwq_ TX    6946827    2546365          6          0
                      bill_2 (TNS V1-
                      V3)

         2        295 rtStopMain@zwq_ TX   12451856     199146          6          0
                      bill_2 (TNS V1-
                      V3)

         2       1553 rtStopMain@zwq_ TX   14352404      63837          6          0
                      bill_2 (TNS V1-
                      V3)

通过持有者和阻塞者可以得出:
1.持有者和阻塞者都是在2号实例上
2.持有者75阻塞了295/992的会话
3.持有者992阻塞了1238/1298/1684/1553的会话
4.持有者295阻塞了75的会话
5.持有者1553阻塞了1125的会话
6.同时分析发现,所有的持有者sid也在阻塞者中,也就是持有者阻塞了某个sid,而自身又被其他sid给阻塞,形成了多级阻塞或者环.如:75阻塞了295,而295有阻塞了75;992阻塞了1553,而1553阻塞了1125

查询阻塞和持有者对象

SQL> set linesize 110
SQL> col c0 for 999
SQL> col c0 heading "INS"
SQL> col c1 for a15
SQL> col c1 heading "Program Name "
SQL> select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no,
  2  ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no
  3  from gv$session
  4  where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')
  5  / 

 INS        SID Program Name     OBJECT_NO   RFILE_NO   BLOCK_NO     ROW_NO
---- ---------- --------------- ---------- ---------- ---------- ----------
   2         75 rtStopMain@zwq_    1323132         13     122601        111
                bill_2 (TNS V1-
                V3)

   2        295 rtStopMain@zwq_    1323132         13     122601        100
                bill_2 (TNS V1-
                V3)

   2        992 rtStopMain@zwq_    1323132         13     122601        101
                bill_2 (TNS V1-
                V3)

   2       1125 rtStopMain@zwq_    1323132         84      38445         70
                bill_2 (TNS V1-
                V3)

   2       1238 rtStopMain@zwq_    1323132         15     255066         41
                bill_2 (TNS V1-
                V3)

   2       1298 rtStopMain@zwq_    1323132         14     118411          8
                bill_2 (TNS V1-
                V3)

   2       1553 rtStopMain@zwq_    1323132         15     255066         19
                bill_2 (TNS V1-
                V3)

   2       1684 rtStopMain@zwq_    1323132         14     118411         21
                bill_2 (TNS V1-
                V3)

8 rows selected.

SQL> set linesize 100
SQL> set pagesize 100
SQL> col owner for a10
SQL> col object_name for a20
SQL> col object_type for a10
SQL> select owner,object_name,object_id,object_type
  2  from dba_objects
  3  where
  4  object_id in (select ROW_WAIT_OBJ# from gv$session
  5  where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318'))
  6  / 
 

OWNER      OBJECT_NAME           OBJECT_ID OBJECT_TYP
---------- -------------------- ---------- ----------
DBACCADM   DCUSTCREDITBALANCE      1323132 TABLE

通过查询的出来,所有操作的聚焦点都是在DBACCADM.DCUSTCREDITBALANCE表上面

查询相关sql语句

SQL> SQL> set linesize 120
SQL> set pagesize 66
SQL> col c0 for 999
SQL> col c0 heading "INS"
SQL> col c1 for a9
SQL> col c1 heading "OS User"
SQL> col c2 for a9
SQL> col c2 heading "Oracle User"
SQL> col c3 for a15
SQL> col c3 heading "Program Name"
SQL> col b1 for a9
SQL> col b1 heading "Unix PID"
SQL> col b2 for 9999 justify left
SQL> col b2 heading "ORA SID"
SQL> col b3 for 999999 justify left
SQL> col b3 heading "SERIAL#"
SQL> col sql_text for a45
SQL> set space 1
SQL> break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2
SQL> select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text
  2    from gv$sql a, gv$session b, gv$process c
  3   where
  4     a.address = b.sql_address
  5     and b.paddr = c.addr
  6     and a.hash_value = b.sql_hash_value
  7     and a.inst_id=b.inst_id and a.inst_id=c.inst_id
  8     and a.inst_id like '&inst_id' and b.sid like '&sid'
  9   order by c.spid,a.hash_value
 10  / 
Enter value for inst_id: 2
Enter value for sid: 75
old   8:    and a.inst_id like '&inst_id' and b.sid like '&sid'
new   8:    and a.inst_id like '2' and b.sid like '75'

 INS ORA SID Unix PID  Program Name    Oracle Us SERIAL# SQL_TEXT
---- ------- --------- --------------- --------- ------- ---------------------------------------------
   2      75 1167392   rtStopMain@zwq_ DBCUSTOPR   42815 update dcustcreditbalance  set limit_owe=:b0,
                       bill_2 (TNS V1-                   unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_
                       V3)                               time=sysdate where id_no=:b4




SQL> /
Enter value for inst_id: 2
Enter value for sid: 992
old   8:    and a.inst_id like '&inst_id' and b.sid like '&sid'
new   8:    and a.inst_id like '2' and b.sid like '992'

 INS ORA SID Unix PID  Program Name    Oracle Us SERIAL# SQL_TEXT
---- ------- --------- --------------- --------- ------- ---------------------------------------------
   2     992 2760870   rtStopMain@zwq_ DBCUSTOPR   56282 update dcustcreditbalance  set limit_owe=:b0,
                       bill_2 (TNS V1-                   unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_
                       V3)                               time=sysdate where id_no=:b4




SQL> /
Enter value for inst_id: 2
Enter value for sid: 295
old   8:    and a.inst_id like '&inst_id' and b.sid like '&sid'
new   8:    and a.inst_id like '2' and b.sid like '295'

 INS ORA SID Unix PID  Program Name    Oracle Us SERIAL# SQL_TEXT
---- ------- --------- --------------- --------- ------- ---------------------------------------------
   2     295 1639008   rtStopMain@zwq_ DBCUSTOPR   35740 update dcustcreditbalance  set limit_owe=:b0,
                       bill_2 (TNS V1-                   unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_
                       V3)                               time=sysdate where id_no=:b4

其他阻塞者和持有者执行sql语句均和该语句相同,省略其他查询.通过这些查询可以确定是因为对dcustcreditbalance表的更新操作导致了这样的现象发生.

处理方案
1.临时处理方案:kill掉持有者
2.永久处理方案:修改这部分程序业务逻辑

发表在 Oracle性能优化 | 标签为 | 评论关闭