oracle open hang 等待cursor: pin S wait on X

客户19.3数据库无法在open过程hang住
20221018151321


分析alert日志

2022-10-18T15:04:57.374918+08:00
db_recovery_file_dest_size of 102400 MB is 9.58% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
2022-10-18T15:09:55.535116+08:00
ORCLPDB(4):>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=40
ORCLPDB(4):System State dumped to trace file /data/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p000_31225.trc
2022-10-18T15:19:33.374783+08:00
ORCLPDB(4):Undo initialization recovery: err:1013 start: 1911760 end: 2790176 diff: 878416 ms (878.4 seconds)
Pdb ORCLPDB hit error 1013 during open read write (1) and will be closed.

这里比较明显,cdb本身open正常,但是其中的ORCLPDB这个pdb无法open,从而显示hang的情况.查询数据库会话情况

SQL> select event,sql_id from v$session where wait_class#<>6;

EVENT								 SQL_ID
---------------------------------------------------------------- -------------
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh

EVENT								 SQL_ID
---------------------------------------------------------------- -------------
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
row cache lock							 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
SQL*Net message to client					 1dhc13tspcmys
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh

EVENT								 SQL_ID
---------------------------------------------------------------- -------------
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh
cursor: pin S wait on X 					 8vyjutx6hg3wh

33 rows selected.

SQL> select sql_text from v$sql where sql_id='8vyjutx6hg3wh';

SQL_TEXT
--------------------------------------------------------------------------------
update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undo
sqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1

SQL> col machine for a30
SQL> /

   INST_ID	  SID PADDR	       SQL_ID	     EVENT			    MACHINE	 PROGRAM
---------- ---------- ---------------- ------------- ------------------------------ ------------ ------------------------
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)

   INST_ID	  SID PADDR	       SQL_ID	     EVENT			    MACHINE	 PROGRAM
---------- ---------- ---------------- ------------- ------------------------------ ------------ -----------------------
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 4517 00000001907FEC50 8vyjutx6hg3wh row cache lock		    xifenfei	 oracle@xifenfei (P000)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)

   INST_ID	  SID PADDR	       SQL_ID	     EVENT			    MACHINE	 PROGRAM
---------- ---------- ---------------- ------------- ------------------------------ ------------ -----------------------
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)
	 1	 5645 000000018095AF40 8vyjutx6hg3wh cursor: pin S wait on X	    xifenfei	 oracle@xifenfei (P002)

31 rows selected.

SQL> l
  1  select b.INST_ID,b.sid,b.paddr,b.sql_id,b.event,b.MACHINE,b.PROGRAM from gv$session a,gv$session b
  2* where a.event='cursor: pin S wait on X'  and a.FINAL_BLOCKING_INSTANCE=b.INST_ID and
  3* a.FINAL_BLOCKING_SESSION=b.sid
SQL> 

通过上述分析,可以确认是在open pdb的过程中被cursor: pin S wait on X等待事件阻塞,而被阻塞的sql是update /*+ rule */ undo$ set …………,基于这样的情况.大概率可以确认是由于bug导致.通过查询mos,确认和Bug 30931981 – Open Reset Logs Hangs With ‘row cache lock’ and ‘cursor: pin s wait for x’ Waits (Doc ID 30931981.8)类似.
20221018200418


不过由于客户的版本是19.3,没有对应的小patch发布.通过对相关恢复事务和恢复方式进行处理,在没有对数据库版本进行任何调整的情况下,顺利打开数据库以最快的速度恢复业务
20221018200716

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭

19c adg kill进程报ORA-600 ktuGetTemprsp:no tso

今日一客户19.5的adg环境由于负载过高kill进程之后,导致dg库直接实例crash,报ORA-600 ktuGetTemprsp:no tso错误

2022-10-13T09:54:58.769194+08:00
Recovery of Online Redo Log: Thread 1 Group 12 Seq 84142 Reading mem 0
  Mem# 0: /oracle/oradata/xff1/std_redo12.log
2022-10-13T09:56:51.312075+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl08_145506.trc  (incident=1294241):
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [396], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:51.324562+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl02_145494.trc  (incident=1292913):
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [3447], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:51.324621+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl05_145500.trc  (incident=1294129):
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [204], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:51.336288+08:00
……………………
2022-10-13T09:56:51.708427+08:00
PMON (ospid: 546859): terminating the instance due to ORA error 12752
2022-10-13T09:56:51.764031+08:00
Cause - 'Instance is being terminated due to fatal process death (pid: 1169, ospid: 145502, CL06)'
2022-10-13T09:56:51.841454+08:00
System state dump requested by (instance=1, osid=546859 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_diag_546895.trc
2022-10-13T09:56:52.030755+08:00
opiodr aborting process unknown ospid (61300) as a result of ORA-1092
2022-10-13T09:56:54.073403+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl08_145506.trc:
ORA-12752: An instance-critical process has terminated.
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [396], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:54.093199+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl02_145494.trc:
ORA-12752: An instance-critical process has terminated.
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [3447], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:54.219531+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff1dg/xff1/trace/xff1_cl07_145504.trc:
ORA-12752: An instance-critical process has terminated.
ORA-00600: internal error code, arguments: [ktuGetTemprsp:no tso], [370], [], [], [], [], [], [], [], [], [], []
2022-10-13T09:56:54.281477+08:00
opidrv aborting process CL08 ospid (145506) as a result of ORA-12752
2022-10-13T09:56:54.288978+08:00
opidrv aborting process CL07 ospid (145504) as a result of ORA-12752
2022-10-13T09:56:54.380794+08:00
opidrv aborting process CL02 ospid (145494) as a result of ORA-12752
2022-10-13T09:57:13.447674+08:00
Instance terminated by PMON, pid = 546859

通过mos分析确认是由于Bug 31443748 – ADG Instance Crashes With ORA-600[ktugettemprsp:no tso] (Doc ID 31443748.8)导致
20221013220330


可以打上patch 31956113进行修复该问题或者直接升级到19.11.0.0.DBRU:210420 (APR 2021) DB Release Update(DB RU)也修复该问题

发表在 Data Guard | 标签为 | 评论关闭

ORA-00257: archiver error的另外一种原因

今天遇到一个相对特殊的案例,拿出来和大家分享,数据库报错为不能归档ORA-00257

SQL> conn system/xxxxxx
ERROR:
ORA-00257: archiver error. Connect internal only, until freed.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /oracle/PRD/oraarch
Oldest online log sequence     11479
Current log sequence           11481

alert日志报错

sapprddb1:oraprd 73> tail -f alert_prd.log
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Wed Oct 12 09:27:41 2022
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Wed Oct 12 09:27:41 2022
Master archival failure: 19502

该错误的含义

sapprddb1:oraprd 74>oerr ora 19502
19502, 00000, "write error on file \"%s\", block number %s (block size=%s)"
// *Cause:  write error on output file
// *Action: check the file

从报错初步看是由于归档目录空间满了导致,查看发现归档空间剩余很多

sapprddb1:/oracle/PRD/sapdata1 # df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/sda1                            99G   25G   70G  26% /
udev                                253G  240K  253G   1% /dev
tmpfs                               426G   72K  426G   1% /dev/shm
/dev/sda3                           388G   12G  357G   4% /backup
/dev/mapper/SAPVG-oraclelv          197G   22G  165G  12% /oracle
/dev/mapper/SAPVG-mirrlogA           20G  773M   18G   5% /oracle/PRD/mirrlogA
/dev/mapper/SAPVG-mirrlogBlv         20G  773M   18G   5% /oracle/PRD/mirrlogB
/dev/mapper/SAPVG-oraarchlv         180G  2.2G  169G   2% /oracle/PRD/oraarch    <----剩余很多
/dev/mapper/SAPVG-origlogAlv         20G  894M   18G   5% /oracle/PRD/origlogA
/dev/mapper/SAPVG-origlogBlv         20G  894M   18G   5% /oracle/PRD/origlogB
/dev/mapper/SAPVG-sapdata1lv        591G  561G     0 100% /oracle/PRD/sapdata1
/dev/mapper/SAPVG-sapdata2lv        1.4T  1.4T     0 100% /oracle/PRD/sapdata2
/dev/mapper/SAPVG-sapdata3lv        788G  748G     0 100% /oracle/PRD/sapdata3
/dev/mapper/SAPVG-sapdata4lv        788G  748G     0 100% /oracle/PRD/sapdata4
sapprd:/sapmnt/PRD                   30G   12G   17G  42% /sapmnt/PRD
sapprd:/usr/sap/trans               105G 1005M   99G   1% /usr/sap/trans
/dev/mapper/VGSAP2-lvsapdata5       1.5T  340G  1.1T  25% /oracle/PRD/sapdata5
/dev/mapper/VGSAP2-lvsapdata6       1.5T  706G  696G  51% /oracle/PRD/sapdata6

尝试人工归档

SQL> alter system archive log current
  2  /
alter system archive log current
*
ERROR at line 1:
ORA-19502: write error on file "/oracle/PRD/sapdata1/cntrl/cntrlPRD.dbf", block
number 4837 (block size=16384)
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device
Additional information: -1
Additional information: 442368

到这里基本上明确了,由于控制文件所在的分区磁盘使用100%,导致归档的时候无法写如记录到控制文件从而导致数据库报ORA-00257错误,对/oracle/PRD/sapdata1中的某个文件进行稍微收缩,数据库恢复正常

发表在 ORA-xxxxx | 标签为 , | 评论关闭