OSD-04016: 异步 I/O 请求排队时出错

有某客户由于硬件故障,导致数据库无法启动,让我们介入处理
数据库启动报错

Mon Feb 26 17:28:24 2018
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 2054 KB redo, 509 data blocks need recovery
Started redo application at
 Thread 1: logseq 41341, block 54
Recovery of Online Redo Log: Thread 1 Group 1 Seq 41341 Reading mem 0
  Mem# 0: E:\ORADATA\ORCL\REDO01.LOG
Completed redo application of 1.77MB
KCF: read, write or open error, block=0x16439 online=1
        file=1 'E:\ORADATA\ORCL\SYSTEM01.DBF'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_dbw0_4928.trc:
ORA-01243: system tablespace file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 91193)
ORA-01110: data file 1: 'E:\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
DBW0 (ospid: 4928): terminating the instance due to error 1243
Mon Feb 26 17:28:29 2018
Instance terminated by DBW0, pid = 4928

20180227223540


这里错误比较明显,由于io错误,在数据库实例恢复之时,写回block正好在该损坏位置从而使得数据库无法正常实例恢复,进而无法open.

dbv验证文件
通过专业工具对system文件进行了重构system文件,然后dbv检查结果如下


DBVERIFY: Release 11.2.0.1.0 - Production on 星期二 2月 27 17:20:14 2018

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - 开始验证: FILE = D:\OK\SYSTEM01.DBF
页 91156 标记为损坏
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during dbv: 
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689


DBVERIFY - 验证完成

检查的页总数: 215040
处理的页总数 (数据): 178112
失败的页总数 (数据): 0
处理的页总数 (索引): 19070
失败的页总数 (索引): 0
处理的页总数 (其他): 3118
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 14739
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 1638554501 (0.1638554501)

这里比较明显重构出来的system文件只有block 91156坏块,这里注意和没有处理之前的坏块不一样

通过dump分析坏块所属对象

Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91156 maxblk 91156
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0xe1d9e3e7 (903/1696743)
scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c tail: 0x71f50602
frmt: 0x06 chkval: 0x8195 type: 0xe5=unknown
Hex dump of corrupt header 4 = CORRUPT

Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91155 maxblk 91155
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416413 (1/91155)
scn: 0x0000.613ad8d6 seq: 0x01 flg: 0x06 tail: 0xd8d60601
frmt: 0x02 chkval: 0xcc5f type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

Block header dump:  0x00416413
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.613ad8ce  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.000.0016fda0  0x00c0130e.6f38.01  CB--    0  scn 0x0000.3e6ed294
0x02   0x0009.008.0023e862  0x00c002cf.a217.13  --U-    1  fsc 0x0000.613ad8d6


Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91157 maxblk 91157
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416415 (1/91157)
scn: 0x0000.6193dc0c seq: 0x01 flg: 0x06 tail: 0xdc0c0601
frmt: 0x02 chkval: 0x8c21 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

Block header dump:  0x00416415
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.6193dc04  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00c.001ee103  0x00c008a5.8dbd.02  C---    0  scn 0x0000.57303e03
0x02   0x000b.00f.00053100  0x00c008af.1563.09  --U-    1  fsc 0x0000.6193dc0c

这里比较明显,可以确定坏块为index,object_id=0×25=37,通过查询其他库,确定为i_obj2(obj$的index)

使用该文件启动数据库

SQL> alter database rename file 'E:\ORADATA\ORCL\SYSTEM01.DBF' to 'd:\orcl\SYSTEM01.DBF';

数据库已更改。

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

检查alert日志
发现smon进程由于坏块的存储,出现大量报错,需要处理,不然数据库一段时间后就会crash.

Tue Feb 27 20:31:23 2018
QMNC started with pid=27, OS id=4652 
Completed: alter database open
Tue Feb 27 20:31:25 2018
Starting background process CJQ0
Tue Feb 27 20:31:25 2018
CJQ0 started with pid=30, OS id=2172 
Tue Feb 27 20:31:25 2018
db_recovery_file_dest_size of 3912 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Hex dump of (file 1, block 91156) in trace file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during multiblock buffer read
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689
Reading datafile 'D:\ORCL\SYSTEM01.DBF' for corruption at rdba: 0x00416414 (file 1, block 91156)
Reread (file 1, block 91156) found same corrupt data
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_smon_3992.trc  (incident=77085):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77085\orcl_smon_3992_i77085.trc
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc  (incident=77221):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77221\orcl_cjq0_2172_i77221.trc

重建i_obj2 index,参考://www.xifenfei.com/?p=5566

发表在 Oracle备份恢复 | 标签为 , , | 评论关闭

18c新特性:alter system cancel sql

根据18c官方描述cancel sql功能是在18c中引起,但是实测发现在oracle 12.2中已经有了cancel sql功能,可以实现终止掉某个sql的当前sql正在执行的sql语句,而不是传统的直接kill某个会话.ALTER SYSTEM CANCEL SQL语句有四个参数分别为:
cancel_sql

--会话1
SQL> set lines 150
SQL> select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0
PL/SQL Release 12.2.0.1.0 - Production                                                    0
CORE    12.2.0.1.0      Production                                                        0
TNS for Linux: Version 12.2.0.1.0 - Production                                            0
NLSRTL Version 12.2.0.1.0 - Production                                                    0


SQL> select sid, serial# from v$session where sid in
  2  (select  sid from v$mystat where rownum=1);

       SID    SERIAL#
---------- ----------
       278       4019

SQL> create table t_xifenfei tablespace users as select * from dba_source;

Table created.

SQL> insert into t_xifenfei select * from t_xifenfei;

274132 rows created.                    <<===没有提交

SQL> select count(*)from t_xifenfei;

  COUNT(*)
----------
    548264

SQL> insert into t_xifenfei select * from t_xifenfei;

548264 rows created.     <<===没有提交

SQL> select count(*)from t_xifenfei;

  COUNT(*)
----------
   1096528

SQL> insert into t_xifenfei select * from t_xifenfei;


--会话2
SQL> select count(*)from t_xifenfei;

  COUNT(*)
----------
    274132

SQL> alter system cancel sql '278,4019';

System altered.

SQL> select count(*)from t_xifenfei;

  COUNT(*)
----------
    274132

--会话1
SQL> insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

SQL> select count(*)from t_xifenfei;

  COUNT(*)
----------
   1096528

这里可以看到会话1的最后一个insert被cancel,但是前面两个没有提交的insert没有被回滚/提交,看到了cancel sql的功能的实现.

发表在 ORACLE 12C, ORACLE 18C | 标签为 , , | 评论关闭

Oracle中的主要字母缩写含义

ACD = Active Change Directory
ACFS = ASM Cluster File System
ADDM = Automatic Database Diagnostic Monitor
ADR = Automatic Diagnostic Repository
ADVM = ASM Dynamic Volume Manager
AIO = Asynchronous I/O
AMDU = ASM Metadata Dump Utility
AMM = Automatic Memory Management
ARC = Archive process
ASH = Active Session History
ASM = Automatic Storage Management
ASMB = ASM Background process
ASMCA = ASM Configuration Assistant
ASMCMD = ASM CoMmanD line utility
ASMLIB = ASM LIBrary tool
ASMM = Automatic Shared Memory Management
ASMSNMP = ASM Simple Network Management Protocol
AT = Allocation Table
ATA = Advanced Technology Attachment
AU = Allocation Unit
AWR = Automatic Workload Repository
BH = Block Header
BMC = Baseboard Management Controller
BS = Block Size
CBO = Cost-Based Optimizer
CF = Control File
CFS = Cluster FileSystem
CHM = Cluster Health Monitor
CIO = Concurrent I/O
CKPT = ChecKPoinT process
CLUVFY = CLUster VeriFy utility
COD = Continuing Operation Directory
CPU = Central Processing Unit or Critical Patch Update
CRM = Customer Relationship Management
CRS = Cluster Ready Services
CSS = Cluster Synchronization Services
CSV = Comma-Separated Values
CVM = Cluster Volume Manager
DB = DataBase
DBCA = DataBase Configuration Assistant
DBFS = DataBase File System
DBM = DataBase Machine
DBMS = DataBase Management Systems
DBPERF = DataBase PERFormance
DBV = DataBase Verification tool
DBW = DataBase Writer process
DD = Disk Directory or Data Description tool
DES = Database Excelleration Systems Inc.
DG = DiskGroup
DH = Disk Header
DIO = Direct I/O
DISM = Dynamic Intimate Shared Memory
DM = Device Mapper
DNFS = Direct Network File System
DRAM = Dynamic Random Access Memory
DSS = Decision Support System
DUL = Data UnLoader
DW = Data Warehouse
EIDE = Enhanced Integrated Drive Electronics
ERP = Enterprise Resource Planning
ETA = Estimated Time of Arrival
ETL = Extract Transform Load
FD = File Directory
FG = FailGroup
FRA = Flash Recovery Area or Fast Recovery Area
FS = FileSystem
FST = Free Segments Table
FTS = Full Table Scan
GC = Grid Control
GI = Grid Infrastructure
GUI = Graphical User Interface
HA = High Availability
HARD = Hardware Assisted Resilient Data
HB = Heart Beat
HBA = Host Bus Adapter
IDE = Integrated Drive Electronics
IIS = Internet Information Services
INST = INSTance
IO = Input/Output
IOPS = IO Per Second
IOT = Index Organized Table
IP = Internet Protocol
IPMI = Intelligent Platform Management Interface
JET = Joint Escalation Team
JFS = Journaled FileSystem
KB = KiloByte
KFED = Kernel Files metadata EDitor
KFOD = Kernel Files Osm disk[group] Discovery
LGWR = redo LoG WRiter process
LLT = Veritas Low Latency Transport protocol
LSNR = LiSteNeR
LUN = Logical Unit Number
LVM = Logical Volume Manager
MAA = Maximum Availability Architecture
MB = MegaByte
NAS = Network Attached Storage
NetApp = Network Appliance
NETCA = NETwork Configuration Assistant
NFS = Network FileSystem
NIC = Network Interface Controller
OCFS = Oracle Cluster FileSystem
OCR = Oracle Cluster Registry
ODM = Oracle Disk Manager
ODS = Operational Data Store
OEL = Oracle Enterprise Linux
OEM = Oracle Enterprise Manager
OID = Oracle Internet Directory
OLAP = OnLine Analytical Processing
OLTP = OnLine Transaction Processing
OMS = Oracle Management Service
OPATCH = Oracle PATCHing utility
OS = Operating System
OSCP = Oracle Storage Compatibility Program
OSW = OS Watcher
OSWFW = OS Watcher For Windows
OTN = Oracle Technology Network
OUI = Oracle Universal Installer
PB = PetaByte
PFILE = Parameter FILE
PGA = Program Global Area
PID = Proces ID
PL/SQL = Procedural Language/Structured Query Language
POC = Proof Of Concept
PROCWATCHER = PROCess WATCHER
PST = Partnership Status Table
PSU = Patch Set Update
PX = Parrallel eXecution
RAC = Real Application Cluster
RAID = Redundant Array of Independent Disks
RAM = Random Access Memory
RBAL = ReBALance process
RCA = Root Cause Analysis
RDA = Remote Diagnostic Agent
RDBMS = Relational DataBase Management System
RHEL = RedHat Enterprise Linux
RM = Resource Manager
RMAN = Recovery MANager
RPM = Resource Package Manager
SAN = Storage Area Network
SAS = Serial Attached SCSI
SATA = Serial Advanced Technology Attachment
SCAN = Single Client Access Name
SCSI = Small Computer System Interface
SGA = System Global Area
SLA = Service Level Agreement
SMF = Service Management Facility
SPFILE = Server Parameter FILE
SQL = Structured Query Language
SRDF = EMC Symmetrix Remote Data Facility
SSD = Solid State Disk
SVCTM = average SerViCe TiMe
SVM = Solaris Volume Manager
TB = TeraByte
TCP = Transmission Control Protocol
TDE = Transparent Data Encryption
TKPROF = Transient Kernel PROFile
TNS = Transparent Network Substrate
UDEV = Unix DEVice manager
UDP = User Datagram Protocol
UFG = Umbilicus ForeGround process
UFS = User FileSystem
VBG = Volume BackGround process
VCS = Veritas Cluster Server
VD = Voting Disk
VDBG = Volume Driver BackGround process
VIP = Virtual Internet Portocol
VLDB = Very Large DataBase
VM = Virtual Machine
VMB = Volume Membership BackGround process
VxFS = Veritas File System
VxVM = Veritas Volume Manager
XDB = XML DataBase

发表在 Oracle | 评论关闭