联系:手机/微信(+86 17813235971) QQ(107644445)
标题:ORA-01171: datafile N going offline due to error advancing checkpoint
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
最近接到一个客户有一个数据文件offline的恢复咨询,通过分析日志,当时是由于在启动的时候数据文件被占用导致后续数据库open之后,该文件被强制offline掉
Fri May 16 20:01:05 2025 Database mounted in Exclusive Mode Completed: ALTER DATABASE MOUNT Fri May 16 20:01:05 2025 ALTER DATABASE OPEN Fri May 16 20:01:06 2025 LGWR: STARTING ARCH PROCESSES ARC0 started with pid=70, OS id=4628 Fri May 16 20:01:06 2025 ARC0: Archival started ARC1 started with pid=74, OS id=4840 Fri May 16 20:01:06 2025 ARC1: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE Fri May 16 20:01:06 2025 Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_4080.trc: ORA-01110: data file 14: 'D:\ORADATA\XIFENFEI105_DAT_1.DBF' ORA-01114: IO error writing block to file 14 (block # 1) ORA-27041: unable to open file OSD-04002: 无法打开文件 O/S-Error: (OS 32) 另一个程序正在使用此文件,进程无法访问。 Thread 1 opened at log sequence 172421 Current log# 1 seq# 172421 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG Fri May 16 20:01:06 2025 ARC1: STARTING ARCH PROCESSES Fri May 16 20:01:06 2025 Successful open of redo thread 1 Fri May 16 20:01:06 2025 ARC0: Becoming the 'no FAL' ARCH ARC0: Becoming the 'no SRL' ARCH Fri May 16 20:01:06 2025 ARC2: Archival started ARC1: STARTING ARCH PROCESSES COMPLETE ARC2 started with pid=78, OS id=4056 Fri May 16 20:01:06 2025 ARC1: Becoming the heartbeat ARCH Fri May 16 20:01:06 2025 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri May 16 20:01:06 2025 SMON: enabling cache recovery Fri May 16 20:01:07 2025 Successfully onlined Undo Tablespace 1. Fri May 16 20:01:07 2025 SMON: enabling tx recovery Fri May 16 20:01:08 2025 Database Characterset is ZHS16GBK replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC QMNC started with pid=86, OS id=4492 Fri May 16 20:01:12 2025 db_recovery_file_dest_size of 51200 MB is 1.97% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. Fri May 16 20:01:13 2025 Completed: ALTER DATABASE OPEN Fri May 16 20:06:44 2025 Restarting dead background process MMON MMON started with pid=98, OS id=4232 Fri May 16 20:07:06 2025 Shutting down archive processes Fri May 16 20:07:11 2025 ARCH shutting down ARC2: Archival stopped Fri May 16 20:10:32 2025 Thread 1 advanced to log sequence 172422 Current log# 2 seq# 172422 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG Fri May 16 20:15:33 2025 Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_ckpt_2496.trc: ORA-01171: datafile 14 going offline due to error advancing checkpoint ORA-01122: database file 14 failed verification check ORA-01110: data file 14: 'D:\ORADATA\XIFENFEI105_DAT_1.DBF' ORA-01208: data file is an old version - not accessing current version Fri May 16 20:23:09 2025 Starting background process EMN0 EMN0 started with pid=82, OS id=2660
本身这个故障相对比较简单,只要归档存在直接recover datafile,然后online即可,但是由于备份软件定时工作,导致对应的归档被备份走
Fri May 16 21:55:10 2025 Control autobackup written to SBT_TAPE device comment 'API Version 2.0,MMS Version 10.0.0.116', media 'V_6746190_6959024' handle 'c-1300253653-20250516-00' Fri May 16 21:56:03 2025 Thread 1 cannot allocate new log, sequence 172423 Private strand flush not complete Current log# 2 seq# 172422 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
而且被异常的数据文件不是核心业务文件,导致客户没有及时发现,等到发现之时尝试recover datafile,提示缺少归档
Wed May 28 17:26:01 2025 alter database recover datafile list clear Wed May 28 17:26:01 2025 Completed: alter database recover datafile list clear Wed May 28 17:26:01 2025 alter database recover if needed datafile 14 Media Recovery Start parallel recovery started with 16 processes ORA-279 signalled during: alter database recover if needed datafile 14 ... Wed May 28 17:26:11 2025 alter database recover cancel Wed May 28 17:26:13 2025 Media Recovery Canceled Completed: alter database recover cancel Wed May 28 17:38:58 2025 ALTER DATABASE RECOVER datafile 'D:\ORADATA\XIFENFEI105_DAT_1.DBF' Wed May 28 17:38:58 2025 Media Recovery Start parallel recovery started with 16 processes ORA-279 signalled during: ALTER DATABASE RECOVER datafile 'D:\ORADATA\XIFENFEI105_DAT_1.DBF' ... Wed May 28 18:26:37 2025 ALTER DATABASE RECOVER CONTINUE DEFAULT Wed May 28 18:26:38 2025 Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC Errors with log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC ORA-308 signalled during: ALTER DATABASE RECOVER CONTINUE DEFAULT ... Wed May 28 18:26:38 2025 ALTER DATABASE RECOVER CONTINUE DEFAULT Wed May 28 18:26:38 2025 Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC Errors with log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2025_05_28\O1_MF_1_172421_%U_.ARC ORA-308 signalled during: ALTER DATABASE RECOVER CONTINUE DEFAULT ... Wed May 28 18:26:38 2025 ALTER DATABASE RECOVER CANCEL Wed May 28 18:26:40 2025 Media Recovery Canceled Completed: ALTER DATABASE RECOVER CANCEL
这个客户运气还不错,带库中的需要恢复的归档日志都还在,通过指定带库通道,直接recover datafile成功
RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE 'sbt_tape' PARMS="BLKSIZE=262144,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch1)"; ALLOCATE CHANNEL ch2 DEVICE TYPE 'sbt_tape' PARMS="BLKSIZE=262144,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch2)"; recover datafile 14; }
至此完美解决该问题,通过这个case,的出来的经验有:
1. 数据库重启之后,要检查数据库日志和查询数据库数据文件状态(主要防止一些不太常用的文件异常,不能及时发现)
2. 需要需要数据库的基本情况,比如备份,容灾,asm磁盘组冗余,存储冗余,网络冗余等情况,这样出现问题好排查解决