sysaux数据文件异常恢复

联系:手机(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:sysaux数据文件异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

案例背景:在我接手这个库之前,因为某种原因sysaux表空间的数据文件离线,该库非归档模式,无备份
一、sysaux数据文件离线原因

Mon Jun  7 03:03:22 2010
KCF: write/open error block=0x67009 online=1
     file=3 /opt/app/oracle/oradata/BAS/sysaux01.dbf
     error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 421897
Additional information: -1'
Automatic datafile offline due to write error on
file 3: /opt/app/oracle/oradata/BAS/sysaux01.dbf

因为该数据库是非归档模式,估计以前的dba也是一段时间后发现sysaux被离线,因为不是归档模式,无法恢复,就一直放置着,让库处于这样的状态中。

二、sysaux数据文件online
1、使用bbed修改sysaux数据文件的scn,见:bbed 修改datafile header

2、尝试online过程日志如下

Sat Dec 17 19:33:36 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:33:36 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:33:37 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:39:47 2011
ALTER DATABASE RECOVER  datafile 3  
Sat Dec 17 19:39:47 2011
Media Recovery Start
 parallel recovery started with 7 processes
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 6 Seq 13545 Reading mem 0
  Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0602.log
  Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0601.log
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 7 Seq 13546 Reading mem 0
  Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0702.log
  Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0701.log
Sat Dec 17 19:39:47 2011
Media Recovery Complete (BAS)
Completed: ALTER DATABASE RECOVER  datafile 3  
Sat Dec 17 19:39:58 2011
alter database datafile 3 online
Sat Dec 17 19:39:58 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:39:58 2011
Completed: alter database datafile 3 online

这个过程虽然在sqlplus中提示online成功,但是alert中的错误警告,以及smon进程占用100%的cup资源,最终导致数据库hang住。

2、分析alert日志和trace文件

alert日志中
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.

bas_smon_27197.trc中
[oracle@bas bdump]$ more /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
/opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      bas
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: BAS
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 27197, image: oracle@bas (SMON)

*** SERVICE NAME:() 2011-12-17 19:23:33.179
*** SESSION ID:(5490.1) 2011-12-17 19:23:33.179
SMON: about to recover undo segment 70
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
*** 2011-12-17 19:23:33.188
ksedmp: internal or fatal error
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'

通过这些证明smon在利用undo segment 70在回滚sysaux中的内容,但是因为某种原因该回滚段异常,不能进行回滚,是的smon一直尝试回滚,但是始终不成功,最后数据库hang住,需要解决sysaux的问题,首先需要解决这个回滚段问题(删除异常回滚段)

3、删除异常回滚段,online datafile 3

强制kill掉smon进程,重启数据库
[oracle@bas bdump]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Dec 17 19:52:14 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 8589934592 bytes
Fixed Size                  2034520 bytes
Variable Size            1124074664 bytes
Database Buffers         7398752256 bytes
Redo Buffers               65073152 bytes
Database mounted.

--为了数据库不hang掉,先offline datafile 3
SQL> alter database datafile 3 offline;

Database altered.

SQL> select segment_name,status from dba_rollback_segs;
select segment_name,status from dba_rollback_segs
                                *
ERROR at line 1:
ORA-01219: database not open: queries allowed on fixed tables/views only


SQL> alter database open;

Database altered.

SQL> select segment_name,status from dba_rollback_segs where status='NEEDS RECOVERY';

SEGMENT_NAME                   STATUS
------------------------------ ----------------
_SYSSMU70$                     NEEDS RECOVERY

SQL> create pfile='/tmp/pfile' from spfile;

File created.

关闭数据库,在pfile中增加
*._corrupted_rollback_segments=(_SYSSMU70$)

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 8589934592 bytes
Fixed Size                  2034520 bytes
Variable Size            1124074664 bytes
Database Buffers         7398752256 bytes
Redo Buffers               65073152 bytes
Database mounted.
Database opened.
SQL> drop  rollback segment "_SYSSMU70$";

Rollback segment dropped.

SQL> alter database datafile 3 online;
alter database datafile 3 online
*
ERROR at line 1:
ORA-01113: file 3 needs media recovery
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'


SQL> recover  datafile 3 ;
Media recovery complete.
SQL> alter database datafile 3 online;

Database altered.

三、解决坏块问题
1、alert日志中坏块记录

Sat Dec 17 20:33:31 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 20:33:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28027.trc:
Sat Dec 17 20:43:32 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'

2、查询坏块对象

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
 WHERE FILE_ID = &FILE_ID
   AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;  3    4  
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 4571
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 4571 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
                                           
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SMON_SCN_TIME_SCN_IDX
INDEX              SYSTEM

SQL> alter index SMON_SCN_TIME_SCN_IDX rebulid online;
alter index SMON_SCN_TIME_SCN_IDX rebulid online
                                  *
ERROR at line 1:
ORA-02243: invalid ALTER INDEX or ALTER MATERIALIZED VIEW option

SQL> select table_name from dba_indexes where index_name='SMON_SCN_TIME_SCN_IDX';

TABLE_NAME
------------------------------
SMON_SCN_TIME

3、解决坏块问题

SQL> truncate table SMON_SCN_TIME;
truncate table SMON_SCN_TIME
               *
ERROR at line 1:
ORA-03292: Table to be truncated is part of a cluster

SQL> truncate  cluster SMON_SCN_TIME;
truncate  cluster SMON_SCN_TIME
                  *
ERROR at line 1:
ORA-00943: cluster does not exist

SQL> SELECT dbms_metadata.get_ddl('TABLE','SMON_SCN_TIME','SYS') FROM dual;
 
DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
--------------------------------------------------------------------------------
 
  CREATE TABLE "SYS"."SMON_SCN_TIME"
   (    "THREAD" NUMBER,
        "TIME_MP" NUMBER,
        "TIME_DP" DATE,
        "SCN_WRP" NUMBER,
        "SCN_BAS" NUMBER,
        "NUM_MAPPINGS" NUMBER,
        "TIM_SCN_MAP" RAW(1200),
        "SCN" NUMBER DEFAULT 0,
        "ORIG_THREAD" NUMBER DEFAULT 0           /* for downgrade */
   ) CLUSTER "SYS"."SMON_SCN_TO_TIME" ("THREAD")

SQL> truncate cluster smon_scn_to_time;

Cluster truncated.

SQL> alter system flush buffer_cache;

System altered.

四、解决AUTO_SPACE_ADVISOR_JOB引起bug

Sat Dec 17 21:00:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:41 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:44 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:47 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:50 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:57 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:00 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:03 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []

查看MOS发现(430223.1,785899.1)Segment advisor带来的buffer坏块,可以禁用AUTO_SPACE_ADVISOR_JOB并清空buffer来解决,最终解决办法,升级数据库

SQL> exec dbms_scheduler.disable('AUTO_SPACE_ADVISOR_JOB'); 

PL/SQL procedure successfully completed.

SQL> alter system flush buffer_cache;

System altered.

至此这次数据库sysaux数据文件异常恢复完全结束。再次提醒各位,数据库一定要做好备份和归档工作。

此条目发表在 Oracle备份恢复 分类目录。将固定链接加入收藏夹。

sysaux数据文件异常恢复》有 2 条评论

  1. 惜分飞 说:

    11.2.0.3中的SMON_SCN_TIME表的cluster表为
    truncate 命令改为
    truncate CLUSTER “SYS”.”SMON_SCN_TO_TIME_AUX”;

    DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
    --------------------------------------------------------------------------------
    
      CREATE TABLE "SYS"."SMON_SCN_TIME"
       (    "THREAD" NUMBER,
            "TIME_MP" NUMBER,
            "TIME_DP" DATE,
            "SCN_WRP" NUMBER,
            "SCN_BAS" NUMBER,
            "NUM_MAPPINGS" NUMBER,
            "TIM_SCN_MAP" RAW(1200),
            "SCN" NUMBER DEFAULT 0,
            "ORIG_THREAD" NUMBER DEFAULT 0           /* for downgrade */
    
    DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
    --------------------------------------------------------------------------------
       ) CLUSTER "SYS"."SMON_SCN_TO_TIME_AUX" ("THREAD")
    

      [引用]  [回复]

  2. 惜分飞 说:

    SMON_SCN_TIME是Oracle数据库的系统表,用以进行辅助恢复等功能。
    SMON_SCN_TIME由smon来维护每5分钟被更新一次,根据数据库的版本不同保留的条数不同(9i中1440条,10g中最多14400条),当这个表中数据出现异常时,直接truncate掉该表对应的cluster,如果是index出现异常可以删除index,然后重建

    drop index smon_scn_time_scn_idx;
    drop index smon_scn_time_tim_idx;
    create unique index smon_scn_time_scn_idx on smon_scn_time(scn);
    create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp);
    analyze table smon_scn_time validate structure cascade;
    

      [引用]  [回复]

发表评论

电子邮件地址不会被公开。 必填项已用 * 标注

您可以使用这些 HTML 标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>