惜分飞

library cache pin等待事件

发表于 2012 年 5 月 16 日由惜分飞

library cache pin说明
library cache pin 事件是用来管理library cache的并发访问的, pin一个object会引起相应的heap被载入内存中,如果客户端需要修改或检测这个object它就必须在锁住后取得一个pin.library cache pin的等待时间为3秒钟,其中有1秒钟用于PMON后台进程,即在取得pin之前最多等待3秒钟,否则就超时.library cache pin通常是发生在编译或重新编译PL/SQL,VIEW,TYPES等object时.编译通常都是显性的,如安装应用程序,升级,安装补丁程序等,但object的重新编译也可能发生在object变得无效时.library cache pin的参数如下,有用的主要是P1和P2:
P1 – KGL Handle address.
P2 – Pin address
P3 – 10*Mode + Namespace
其中,P1,P2可与x$kglpn和x$kglob表相关.x$kglpn和x$kglob是ORACLE数据库的内部数据字典.
x$kglpn library cache pin信息
x$kglob library cache object信息

查询方法一

--通过查询V$SESSION_WAIT找出正在等待”library cache pin”的session
SELECT sid,
       SUBSTR (event, 1, 30),
       TO_CHAR(p1, 'xxxxxxxx') p1_16,
       --P1RAW P1_16,
       p2,
       p3
  FROM v$session_wait
 WHERE wait_time = 0 AND event LIKE 'library cache pin%';
--P1 列是Library Cache Handle Address
--P2 列是Library Cache Pin Address. 

--找到相关session pin状态
SELECT ADDR,
       INDX,
       KGLPNADR,-- Library Cache Pin Address
       KGLPNUSE,
       KGLPNSES,--识别锁住此pin 的session
       KGLPNHDL,--Library Cache Handle Address
       kGLPNLCK,
       KGLPNMOD,-- Pin 锁 
       KGLPNREQ-- Pin 请求
  FROM x$kglpn
 WHERE KGLPNHDL LIKE '%EB3EB8%';--p1_16
 
 --询X$KGLOB (Library Cache Object),可找到相关的object
SELECT KGLNAOBJ-- 相关object的名字(取前面80个字符)
  FROM X$KGLOB
 WHERE KGLHDADR LIKE '%EB3EB8%';--p1_16
 
 --查出占着pin锁的session目前正在做什么
SELECT a.sid, a.username, a.program
  FROM v$session a, x$kglpn b
 WHERE a.saddr = b.kglpnuse AND b.kglpnhdl LIKE '%EB3EB8%'--p1_16
  AND b.kgnmod <> 0;
  
   --查出阻塞者正执行的SQL语句
 SELECT sid, sql_text
  FROM v$session, v$sqlarea
 WHERE v$session.sql_address = v$sqlarea.address AND sid =&sid;

查询方法二

--通过查询DBA_LOCK_INTERNAL和V$SESSION_WAIT,可得到与”library cache pin” 等待相关的object的名字
SELECT TO_CHAR (SESSION_ID, '999') sid,
       SUBSTR (LOCK_TYPE, 1, 30) TYPE,
       SUBSTR (lock_id1, 1, 23) Object_Name,
       SUBSTR (mode_held, 1, 4) HELD,
       SUBSTR (mode_requested, 1, 4) REQ,
       lock_id2 Lock_addr
  FROM dba_lock_internal
 WHERE mode_requested <> 'None' AND mode_requested <> mode_held
       AND session_id IN
              (SELECT sid
                 FROM v$session_wait
                WHERE wait_time = 0 AND event LIKE 'library cache pin%');
                
 --查出”library cache pin”占有者(即阻塞者)的session id
 SELECT sid Holder,
       KGLPNUSE Sesion,
       KGLPNMOD Held,
       KGLPNREQ Req
  FROM sys.x$kglpn, v$session
 WHERE KGLPNHDL IN (SELECT p1raw
                      FROM v$session_wait
                     WHERE wait_time = 0 AND event LIKE 'library cache pin%')
       AND KGLPNMOD <> 0
       AND v$session.saddr = x$kglpn.kglpnuse;
       
 --查出”library cache pin”占有者(阻塞者)正在等什么
 SELECT sid, SUBSTR (event, 1, 30), wait_time
  FROM v$session_wait
 WHERE sid IN
          (SELECT sid
             FROM x$kglpn, v$session
            WHERE KGLPNHDL IN
                     (SELECT p1raw
                        FROM v$session_wait
                       WHERE wait_time = 0
                             AND event LIKE 'library cache pin%')
                  AND KGLPNMOD <> 0
                  AND v$session.saddr = x$kglpn.kglpnuse);
                  
 --查出阻塞者正执行的SQL语句
 SELECT sid, sql_text
  FROM v$session, v$sqlarea
 WHERE v$session.sql_address = v$sqlarea.address AND sid =&sid;

发表在 Oracle性能优化 | 标签为 library cache pin | 评论关闭

cursor: pin S wait on X 等待事件

发表于 2012 年 5 月 16 日由惜分飞

cursor: pin S整体描述

cursor: pin S A session waits on this event when it wants to update a shared mutex pin and another session 
is currently in the process of updating a shared mutex pin for the same cursor object. This wait event should
rarely be seen because a shared mutex pin update is very fast.(Wait Time: Microseconds)

--Parameter说明
P1 Hash value of cursor

P2 Mutex value 
64 bit platforms
8 bytes are used.
Top 4 bytes hold the session id (if the mutex is held X)
Bottom 4 bytes hold the ref count (if the mutex is held S).

32 bit platforms 
4 bytes are used.
Top 2 bytes hold the session id (if the mutex is held X) 
Bottom 2 bytes hold the ref count (if the mutex is held S).

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

--查询sql
SELECT a.*, s.sql_text
  FROM v$sql s,
       (SELECT sid,
               event,
               wait_class,
               p1 cursor_hash_value,
               p2raw Mutex_value,
               TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid
          FROM v$session_wait
         WHERE event LIKE 'cursor%') a
 WHERE s.HASH_VALUE = a.cursor_hash_value

cursor: pin S wait on X描述

- In previous versions of Oracle, library cache pin is protected by “library cache pin latch”.
- But in recent versions of Oracle(I believe it’s 10.2.0.2), 
  library cache pin for the cursor LCO is protected by mutext.
- Mutex is allocated per LCO, so it enables fine-grained access control.

“cursor: pin S wait on X” wait event is mostly related to mutex and hard parse.
- When a process hard parses the SQL statement, it should acquire exclusive 
  library cache pin for the corresponding LCO.
- This means that the process acquires the mutex in exclusive mode.
- Another process which also executes the same query needs to acquire the mutex 
  but it’s being blocked by preceding process. The wait event is “cursor: pin S wait on X”.

--发生cursor: pin S wait on X原因
Frequent Hard Parses
If the frequency of Hard Parsing is extremely high, then contention can occur on this pin.

High Version Counts
When Version counts become excessive, a long chain of versions needs to 
be examined and this can lead to contention on this event

Known bugs
Bug 5907779 - Self deadlock hang on "cursor: pin S wait on X" (typically from DBMS_STATS) [ID 5907779.8]
Bug 7568642: BLOCKING_SESSION EMPTY FOR "CURSOR: PIN S WAIT ON X"

发表在 Oracle性能优化 | 标签为 cursor: pin S wait on X | 评论关闭

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

发表于 2012 年 5 月 15 日由惜分飞

open数据库alert日志报ORA-00600[kcratr1_lostwrt]错误

Mon May 14 14:57:28 2012
ALTER DATABASE OPEN
Mon May 14 14:57:29 2012
Beginning crash recovery of 1 threads
Mon May 14 14:57:29 2012
Started redo scan
Mon May 14 14:57:29 2012
Errors in file d:\oracle\admin\cqgasold\udump\cqgasold_ora_504.trc:
ORA-00600: 内部错误代码，参数: [kcratr1_lostwrt], [], [], [], [], [], [], []

ORA-600 signalled during: alter database open...

查询相关SCN
同一个查询中SCN相同,省略

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;

未选定行

SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;

     FILE# SCN              STOP_SCN                                            
---------- ---------------- ----------------                                    
         1       7842987188                                                     
         2       7842987188                                                     
         3       7842987188                                                     

SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;

     FILE# SCN               RESETLOGS SCN                                      
---------- ----------------- -----------------                                  
         1        7842991811                 1                                  
         2        7842991811                 1                                  
         3        7842991811                 1

这里看到奇怪现象datafile scn小于datafile_header scn,数据库异常断电一般来说也不会出现这样的情况,个人猜测是错误的恢复或者使用历史控制文件导致,对于这样的现状,我先尝试着使用using backup controlfile方式恢复,结果失败.估计控制文件有异常,本着先拉起库原则,重建控制文件.

进行完全恢复

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码，参数: [3020], [8388617], [1], [23403], [25], [112],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'

尝试跳过坏块继续恢复

SQL> recover database allow 1 corruption;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码，参数: [3020], [8388610], [1], [23403], [2264], [16],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 2)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTFB Bitmapped File Space Header'

使用dbv检查坏块数量

C:\>dbv file='d:\oracle\oradata\cqgasold\undo_1.dbf' blocksize=8192

DBVERIFY: Release 9.2.0.5.0 - Production on 星期二 5月 15 19:43:42 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - 验证正在开始 : FILE = d:\oracle\oradata\cqgasold\undo_1.dbf

DBV-00200: 块, dba 8388617, 已经标记为崩溃
汇入的页369 - 可能是介质损坏
***
Corrupt block relative dba: 0x00800171 (file 2, block 369)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x00800171
 last change scn: 0x0001.d37c57db seq: 0x1 flg: 0x04
 consistency value in tail: 0x4e930260
 check value in block header: 0x8202, computed block checksum: 0x4e5f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***

汇入的页417 - 可能是介质损坏
***
Corrupt block relative dba: 0x008001a1 (file 2, block 417)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x008001a1
 last change scn: 0x0001.d37c53d4 seq: 0x2 flg: 0x04
 consistency value in tail: 0x4b6b0201
 check value in block header: 0x6ae7, computed block checksum: 0x5abc
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
…………
--类此记录很多,我放弃了跳过坏块修复的方法

恢复过程中提示坏块数据库文件离线恢复

SQL> alter database datafile 'd:\oracle\oradata\cqgasold\undo_1.dbf' offline;

数据库已更改。

SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00376: 此时无法读取文件 2
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'

到了这一步,根据经验,数据库被open的可能性很多了,很有可能是open以后因为smon回滚导致数据库down

查看日志,屏蔽回滚段,完成恢复

Tue May 15 19:59:52 2012
alter database open
Tue May 15 19:59:52 2012
Beginning crash recovery of 1 threads
Tue May 15 19:59:52 2012
Started redo scan
Tue May 15 19:59:52 2012
Completed redo scan
 323 redo blocks read, 82 data blocks need recovery
Tue May 15 19:59:52 2012
Started recovery at
 Thread 1: logseq 23404, block 3, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 23404 Reading mem 0
  Mem# 0 errs 0: F:\ORACLE\ORADATA\LOGCQGASOLD4.ORA
Tue May 15 19:59:52 2012
Completed redo application
Tue May 15 19:59:52 2012
Ended recovery at
 Thread 1: logseq 23404, block 326, scn 1.3548264979
 82 data blocks read, 82 data blocks written, 323 redo blocks read
Crash recovery completed successfully
Tue May 15 19:59:53 2012
Thread 1 advanced to log sequence 23405
Thread 1 opened at log sequence 23405
  Current log# 2 seq# 23405 mem# 0: D:\ORACLE\ORADATA\CQGASOLD\REDO02.LOG
Successful open of redo thread 1
Tue May 15 19:59:53 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 15 19:59:54 2012
Database Characterset is ZHS16GBK
Tue May 15 19:59:55 2012
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-604 signalled during: alter database open...
Tue May 15 19:59:56 2012
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 8
SMON: mark undo segment 8 as needs recovery
SMON: about to recover undo segment 9
SMON: mark undo segment 9 as needs recovery
SMON: about to recover undo segment 10
SMON: mark undo segment 10 as needs recovery
Tue May 15 20:00:37 2012
Shutting down instance (abort)

看到这里,可以大概确定是因为undo文件离线,导致回滚段异常.
这个问题,基本上可以确定通过隐含参数屏蔽回滚段,然后open数据库,重建undo删除异常undo,数据库恢复完成。

发表在 Oracle备份恢复 | 标签为 kcratr1_lostwrt, ORA-00600 kcratr1_lostwrt, ORA-600 3020, ORA-600[3020] | 4 条评论

library cache pin等待事件

cursor: pin S wait on X 等待事件

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

文章分类

最近发表

library cache pin等待事件

cursor: pin S wait on X 等待事件

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

标签云

文章分类

最近发表