hp平台rdisk中磁盘丢失导致asm启动报ORA-15042恢复

有老朋友找到我,说一个客户的数据库异常,问题是asm无法正常mount,提示是缺少两块磁盘.问我是否可以恢复.因为是内网环境,通过他那边发过来的零零散散的信息,大概分析如下
asm alert日志报错
ERROR: diskgroup DGROUP1 was not mounted

Fri Aug 12 16:03:12 EAT 2016
SQL> alter diskgroup DGROUP1 mount 
Fri Aug 12 16:03:12 EAT 2016
NOTE: cache registered group DGROUP1 number=1 incarn=0xf6781b5c
Fri Aug 12 16:03:12 EAT 2016
NOTE: Hbeat: instance first (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: start heartbeating (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: cache dismounting group 1/0xF6781B5C (DGROUP1) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGROUP1 was not mounted

前台尝试mount asm 磁盘组报错ORA-15042
ORA-15042


从这里可以明显的看出来asm 磁盘组无法正常mount,是由于缺少asm disk 15,16.如果想恢复asm,最好的方法就是找出来这两个磁盘.通过kfed对现在的磁盘进行分析,最后我们发现asm disk 14对应的磁盘为disk160,,asm disk 17对应的disk163,根据第一感觉很可能是disk161和disk161两块盘异常,让机房检查硬件无任何告警

OS层面分析
省略和本次结论无关的记录

ls -l /dev/rdisk
crw-rw----   1 oracle     dba         13 0x000070 Jan  1  2016 disk160
crw-rw----   1 oracle     dba         13 0x000073 Jan  1  2016 disk163

ls -l /dev/disk
brw-r-----   1 bin        sys          1 0x000070 Jan 13  2015 disk160
brw-r-----   1 bin        sys          1 0x000071 Jan 13  2015 disk161
brw-r-----   1 bin        sys          1 0x000072 Jan 13  2015 disk162
brw-r-----   1 bin        sys          1 0x000073 Jan 13  2015 disk163

这里我们发现在hp unix中/dev/disk下面磁盘都存在,但是/dev/rdisk下面丢失,通过ioscan相关命令继续分析

ioscan -fNnkC disk
disk    160  64000/0xfa00/0x70  esdisk   CLAIMED     DEVICE       HP      OPEN-V
                      /dev/disk/disk160   /dev/rdisk/disk160
disk    161  64000/0xfa00/0x71  esdisk   CLAIMED     DEVICE       HP      OPEN-V
                      /dev/disk/disk161
disk    162  64000/0xfa00/0x72  esdisk   CLAIMED     DEVICE       HP      OPEN-V
                      /dev/disk/disk162
disk    163  64000/0xfa00/0x73  esdisk   CLAIMED     DEVICE       HP      OPEN-V
                      /dev/disk/disk163   /dev/rdisk/disk163

这里我们基本上可以确定是/dev/rdisk下面的盘发生丢失.进一步分析,因为rdisk是聚合后的盘符,那我们分析聚合前的盘符是否正常

ioscan -m dsf
/dev/rdisk/disk160       /dev/rdsk/c29t12d4
                         /dev/rdsk/c28t12d4
/dev/rdisk/disk163       /dev/rdsk/c29t12d7
                         /dev/rdsk/c28t12d7

ls -l /dev/rdsk
crw-r-----   1 bin        sys        188 0x1dc000 Apr 22  2014 c29t12d0
crw-r-----   1 bin        sys        188 0x1dc100 Apr 22  2014 c29t12d1
crw-r-----   1 bin        sys        188 0x1dc300 Jan 13  2015 c29t12d3
crw-r-----   1 bin        sys        188 0x1dc400 Jan 13  2015 c29t12d4
crw-r-----   1 bin        sys        188 0x1dc500 Jan 13  2015 c29t12d5
crw-r-----   1 bin        sys        188 0x1dc600 Jan 13  2015 c29t12d6
crw-r-----   1 bin        sys        188 0x1dc700 Jan 13  2015 c29t12d7

crw-r-----   1 bin        sys        188 0x1cc100 Apr 22  2014 c28t12d1
crw-r-----   1 bin        sys        188 0x1cc300 Jan 13  2015 c28t12d3
crw-r-----   1 bin        sys        188 0x1cc400 Jan 13  2015 c28t12d4
crw-r-----   1 bin        sys        188 0x1cc500 Jan 13  2015 c28t12d5
crw-r-----   1 bin        sys        188 0x1cc600 Jan 13  2015 c28t12d6
crw-r-----   1 bin        sys        188 0x1cc700 Jan 13  2015 c28t12d7

通过这里我们基本上可以大概判断出来/dev/rdsk/c28t12d5,/dev/rdsk/c28t12d6,/dev/rdsk/c29t12d5,/dev/rdsk/c29t12d6就是我们需要找的/dev/rdisk/disk161和disk162的聚合之前的盘符.也就是说,现在我们判断只有/dev/rdisk下面的字符设备有问题,其他均正常.

通过系统命令修复异常

insf -e -H 64000/0xfa00/0x71
insf -e -H 64000/0xfa00/0x72

hp-asm-disk


现在已经可以正常看到/dev/rdisk/disk161和/dev/rdisk/disk162盘符,初步判断,os层面盘符已经恢复正常.修改磁盘权限和所属组

chmod 660 /dev/rdisk/disk161
chmod 660 /dev/rdisk/disk162
chown oracle:dba /dev/rdisk/disk161
chown oracle:dba /dev/rdisk/disk162

正常启动asm,mount磁盘组,open数据库
asm-mount


这次的恢复,主要是从操作系统层面判断解决问题,从而实现数据库完美恢复,数据0丢失.有类似恢复案例:分区无法识别导致asm diskgroup无法mount

发表在 Oracle ASM, Unix | 标签为 , , | 留下评论

ORA-600 kcbz_check_objd_typ_1 处理

客户数据库异常(ORA-600 kcbz_check_objd_typ_1),让我们远程给分析处理
ORA-600 kcbz_check_objd_typ_1异常

Mon Aug  8 12:19:28 2016
Completed: ALTER DATABASE OPEN
Mon Aug  8 12:19:29 2016
db_recovery_file_dest_size of 20480 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Aug  8 12:19:33 2016
Errors in file /home/oracle/admin/RT/bdump/rt_smon_1514.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Mon Aug  8 12:20:21 2016
Shutting down archive processes
Mon Aug  8 12:20:26 2016
ARCH shutting down
ARC3: Archival stopped
Mon Aug  8 13:12:25 2016
Thread 1 advanced to log sequence 13804
  Current log# 3 seq# 13804 mem# 0: /home/oracle/product/10.2.0/oradata/RT/redo03a.log
Mon Aug  8 14:01:37 2016
Thread 1 advanced to log sequence 13805
  Current log# 2 seq# 13805 mem# 0: /home/oracle/product/10.2.0/oradata/RT/redo02a.log
Mon Aug  8 14:20:51 2016
Errors in file /home/oracle/admin/RT/bdump/rt_smon_1514.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Mon Aug  8 15:54:47 2016
Thread 1 advanced to log sequence 13808
  Current log# 2 seq# 13808 mem# 0: /home/oracle/product/10.2.0/oradata/RT/redo02a.log
Mon Aug  8 16:21:48 2016
Errors in file /home/oracle/admin/RT/bdump/rt_smon_1514.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Mon Aug  8 16:22:05 2016
Errors in file /home/oracle/admin/RT/bdump/rt_pmon_1500.trc:
ORA-00474: SMON process terminated with error

这里比较明显,数据库报大量ORA-600 kcbz_check_objd_typ_1错误之后,然后smon进程终止,实例crash.

smon trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /home/oracle/product/10.2.0/db_1
System name:	SunOS
Node name:	st104
Release:	5.10
Version:	Generic_141445-09
Machine:	i86pc
Instance name: RT
Redo thread mounted by this instance: 1
Oracle process number: 12
Unix process pid: 1514, image: oracle@st104 (SMON)

*** 2016-08-08 12:19:26.868
*** SERVICE NAME:() 2016-08-08 12:19:26.868
*** SESSION ID:(383.1) 2016-08-08 12:19:26.868
Dead transaction 0x003d.002.0000f964 recovered by SMON
Dead transaction 0x0041.017.00004d55 recovered by SMON
Dead transaction 0x0047.002.0000180c recovered by SMON
Dead transaction 0x004c.01c.00001b57 recovered by SMON
*** SESSION ID:(383.1) 2016-08-08 12:19:27.470
DATA seg.obj=0, on-disk obj=925949, dsflg=0, dsobj=923715, cls=4
Formatted dump of block:
buffer tsn: 4 rdba: 0x0100336b (4/13163)
scn: 0x09c6.b2c7f7a2 seq: 0x02 flg: 0x04 tail: 0xf7a20602
frmt: 0x02 chkval: 0x649b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

*** SESSION ID:(383.1) 2016-08-08 12:19:34.244
DATA seg.obj=0, on-disk obj=925950, dsflg=0, dsobj=923671, cls=4
Formatted dump of block:
buffer tsn: 4 rdba: 0x01003343 (4/13123)
scn: 0x09c6.b2c7f7dc seq: 0x02 flg: 0x04 tail: 0xf7dc0602
frmt: 0x02 chkval: 0x8013 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

*** SESSION ID:(383.1) 2016-08-08 12:19:35.197
DATA seg.obj=0, on-disk obj=925941, dsflg=0, dsobj=923657, cls=4
Formatted dump of block:
buffer tsn: 7 rdba: 0x01c03d53 (7/15699)
scn: 0x09c6.b2c7f570 seq: 0x02 flg: 0x04 tail: 0xf5700602
frmt: 0x02 chkval: 0xe5c5 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
*** SESSION ID:(383.1) 2016-08-08 12:19:38.965
DATA seg.obj=0, on-disk obj=925948, dsflg=0, dsobj=923656, cls=4
Formatted dump of block:
buffer tsn: 7 rdba: 0x01c03a6b (7/14955)
scn: 0x09c6.b2c7f745 seq: 0x02 flg: 0x04 tail: 0xf7450602
frmt: 0x02 chkval: 0x58c5 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

这里可以看出来有block中的obj和dataobj不匹配.

查询seg$.type=3
type=3为临时对象,由于异常原因导致smon在清理temp对象无法正常完成,从而使得smon终止,实例crash.

SQL> select file#, block#, ts# from seg$ where type# = 3;

     FILE#     BLOCK#        TS#
---------- ---------- ----------
         4      13163          4
         4      13123          4
         7      15699          7
         7      14955          7

ORA-600 kcbz_check_objd_typ_1处理方法

1) Check tablespace bitmap


SQL> oradebug setmypid 
SQL> exec dbms_space_admin.tablespace_verify('&TBSP_NAME') 
SQL> oradebug tracefile_name 

or if the tablespace involved is an ASSM tablespace: 

SQL> oradebug setmypid 
SQL> exec dbms_space_admin.assm_tablespace_verify ('&TBSP_NAME',dbms_space_admin.TS_VERIFY_BITMAPS) 
SQL> oradebug tracefile_name

I am expecting to fail 

2) Corrupt these temp segments 

SQL> exec dbms_space_admin.segment_corrupt('&TBSP_NAME', &FILE#, &BLOCK#) 

3) Drop them 

SQL> exec dbms_space_admin.segment_drop_corrupt('&TBSP_NAME', &FILE#, &BLOCK#) 

4) Rebuild tablespace bitmap


exec DBMS_SPACE_ADMIN.TABLESPACE_REBUILD_BITMAPS('&TBSP_NAME')


5) Verify the tablespace again 

SQL> oradebug setmypid 
SQL> exec dbms_space_admin.tablespace_verify('&TBSP_NAME') 
SQL> oradebug tracefile_name 

or if the tablespace involved is an ASSM tablespace: 

SQL> oradebug setmypid 
SQL> exec dbms_space_admin.assm_tablespace_verify('&TBSP_NAME',dbms_space_admin.TS_VERIFY_BITMAPS) 
SQL> oradebug tracefile_name 
发表在 ORA-xxxxx | 标签为 , | 留下评论

linux 7(redhat,oracle linux,centos)中使用udev

慢慢的linux 7的使用人越来越多了,但是linux 7相对于5和6的版本,变动确实比较大,本文主要描写在linux 7中如何实现udev,实现设备持久化,权限和所属组的修改
linux版本

Oracle Linux Server release 7.1
[root@www.xifenfei.com ~]# uname -a
Linux www.xifenfei.com 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 04:05:24 PST 2015 x86_64 x86_64 x86_64 GNU/Linux

VMware Workstation中显示uuid需要在vmx文件中增加

disk.enableUUID = "TRUE"

查看磁盘分区

[root@www.xifenfei.com ~]# fdisk -l

Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xf60fe217

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     2099199     1048576   83  Linux

Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000bce7c

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     4204543     2101248   8e  Linux LVM
/dev/sda2   *     4204544    79702015    37748736   83  Linux

Disk /dev/sdc: 32.2 GB, 32212254720 bytes, 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/ol-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

查看磁盘uuid

[root@www.xifenfei.com ~]# /usr/lib/udev/scsi_id -g -u /dev/sdb1
36000c29e91831cedbe69afe6cc08daf7
[root@www.xifenfei.com ~]# /usr/lib/udev/scsi_id -g -u /dev/sdc
36000c292495e9d9de6f21640cc7b53b9

udev绑定

[root@www.xifenfei.com ~]# more /etc/udev/rules.d/99-my-asmdevices.rules 
KERNEL=="sd*[!0-9]", ENV{DEVTYPE}=="disk", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode",
 RESULT=="36000c292495e9d9de6f21640cc7b53b9", RUN+="/bin/sh -c 'mknod /dev/xifenfei-sdc b $major $minor; 
chown oracle:dba /dev/xifenfei-sdc; chmod 0660 /dev/xifenfei-sdc'"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$parent", 
RESULT=="36000c29e91831cedbe69afe6cc08daf7", SYMLINK+="xifenfei-sdb1", OWNER="oracle", GROUP="dba", MODE="0660"

绑定结果

[root@www.xifenfei.com ~]# ls -l /dev/xifenfei-*
lrwxrwxrwx. 1 root   root     4 Aug  7 22:49 /dev/xifenfei-sdb1 -> sdb1
brw-rw----. 1 oracle dba  8, 32 Aug  7 22:36 /dev/xifenfei-sdc
[root@www.xifenfei.com ~]# ls -l /dev/sdb1
brw-rw----. 1 oracle dba 8, 17 Aug  7 22:49 /dev/sdb1

udev只修改磁盘权限

[root@www.xifenfei.com ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type:
   p   primary (1 primary, 0 extended, 3 free)
   e   extended
Select (default p): p
Partition number (2-4, default 2): 
First sector (2099200-41943039, default 2099200): 
Using default value 2099200
Last sector, +sectors or +size{K,M,G} (2099200-41943039, default 41943039): +1G
Partition 2 of type Linux and of size 1 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@www.xifenfei.com ~]# more /etc/udev/rules.d/99-my-asmdevices.rules 
KERNEL=="sd?2", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$parent",
 RESULT=="36000c29e91831cedbe69afe6cc08daf7",  OWNER="oracle", GROUP="dba", MODE="0660"
[root@www.xifenfei.com ~]# /sbin/udevadm trigger --type=devices --action=change
[root@www.xifenfei.com ~]# ls -l /dev/sdb2
brw-rw----. 1 oracle dba 8, 18 Aug  7 23:14 /dev/sdb2

这里可以发现在linux 7中使用了两种方法绑定udev,一种是真实生成udev设备,另外一种是通过软连接实现.感谢lunar(Lunar的oracle实验室)在linux 7学习中的帮助

发表在 Linux | 标签为 , | 留下评论

MON_MODS$表ORA-600 13013报错处理

有朋友反馈数据库启动运行一点时间之后,然后就自动crash,让我们帮忙找原因,通过分析是由于smon进程触发ORA-600 13013导致数据库异常
alert日志报错信息

Thu Aug  4 18:39:44 2016
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=33, OS id=22935
Thu Aug  4 18:39:44 2016
Completed: ALTER DATABASE OPEN
Thu Aug  4 18:39:44 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 18:48:41 2016
Thread 1 advanced to log sequence 86746
  Current log# 3 seq# 86746 mem# 0: /opt/ora10/oradata/ora10g/redo03.log
Thu Aug  4 18:58:13 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:58:56 2016
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Thu Aug  4 18:59:06 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:59:08 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_pmon_22413.trc:
ORA-00474: SMON process terminated with error
Thu Aug  4 18:59:08 2016
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 22413

通过trace文件大概可以发现是由于ORA-600 13013错误导致数据库crash,而且这里有类似”SMON was doing flushing of monitored table stats”错误提示,根据经验,很可能是smon把表的dml操作收集信息相关.

ORA-600 [13013] 含义

ORA-600 [13013] [a] [b] {c} [d] [e] [f] 


This format relates to Oracle Server 8.0.3 to 10.1 

Arg [a] Passcount 
Arg [b] Data Object number 
Arg {c} Tablespace Relative DBA of block containing the row to be updated 
Arg [d] Row Slot number 
Arg [e] Relative DBA of block being updated (should be same as 1) 
Arg [f] Code 

根据这个错误信息,以及How to resolve ORA-00600 [13013], [5001] [ID 816784.1]中的描述

ORA-600 13013 对应对象

SQL> select object_name from dba_objects where object_id=482

OBJECT_NAME
--------------------------------------------------------------------------------
MON_MODS$

该对象正是和监控dml变化相关的表,smon会对其进行相关操作,以前写过一篇:MON_MODS$和MON_MODS_ALL$统计DML操作次数的文章
对于MON_MODS$表ORA-600 13013处理

SQL> analyze table mon_mods$ validate structure cascade;

analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

 
SQL> select index_name from dba_indexes where table_name='MON_MODS$';

INDEX_NAME
------------------------------
I_MON_MODS$_OBJ

SQL> ALTER INDEX I_MON_MODS$_OBJ REBUILD;

Index altered.

SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

SQL> CREATE TABLE MON_MODS_BAK AS SELECT * FROM MON_MODS$;

Table created.

SQL> SELECT COUNT(*) FROM MON_MODS$;

  COUNT(*)
----------
      1247

SQL> C/MON_MODS$/MON_MODS_BAK;
  1* SELECT COUNT(*) FROM MON_MODS_BAK
SQL> /

  COUNT(*)
----------
      1247

SQL> TRUNCATE TABLE MON_MODS$;

Table truncated.

SQL> INSERT INTO MON_MODS$ SELECT * fROM MON_MODS_BAK;

1247 rows created.

SQL> COMMIT;

Commit complete.

SQL>  analyze table mon_mods$ validate structure cascade;

Table analyzed.

自此关于MON_MODS$表相关的ORA-600 13013异常处理完全,当然也可以通过重建I_MON_MODS$_OBJ索引来解决,但是不能通过rebuild index解决.数据库也就不会因此而crash了.

发表在 Oracle备份恢复 | 留下评论

ORA-600 4042 故障恢复

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check) 检查结果
wrong_scn
通过上图可以知道file 2未能正常恢复(需要看日志分析原因),file 3以前就被offline,需要历史归档(非归档状态,所以这个先放着,后续再处理)

分析file 2 不成功原因

Wed Aug  3 15:21:11 2016
ALTER DATABASE RECOVER  datafile 2  
Wed Aug  3 15:21:11 2016
Media Recovery Start
 parallel recovery started with 2 processes
Wed Aug  3 15:21:11 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Wed Aug  3 15:21:11 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:13 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:18 2016
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

通过日志可以知道由于ORA-600 3020导致file 2不能正常的恢复.
处理file 2

SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.
Thu Aug  4 01:58:35 2016
ALTER DATABASE RECOVER  datafile 2 allow 1 corruption  
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Thu Aug  4 01:58:35 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:58:35 2016
Media Recovery Complete (oracle)
Completed: ALTER DATABASE RECOVER  datafile 2 allow 1 corruption  

尝试open数据库

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
Thu Aug  4 01:59:20 2016
alter database open 
Thu Aug  4 01:59:21 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Thu Aug  4 01:59:21 2016
Started redo scan
Thu Aug  4 01:59:21 2016
Completed redo scan
 1619 redo blocks read, 0 data blocks need recovery
Thu Aug  4 01:59:21 2016
Started redo application at
 Thread 1: logseq 1916, block 12724
Thu Aug  4 01:59:21 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:59:21 2016
Completed redo application
Thu Aug  4 01:59:21 2016
Completed crash recovery at
 Thread 1: logseq 1916, block 14343, scn 3303614971196
 0 data blocks read, 0 data blocks written, 1619 redo blocks read
Thu Aug  4 01:59:21 2016
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=5542
Thu Aug  4 01:59:21 2016
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=19, OS id=5544
Thu Aug  4 01:59:21 2016
Thread 1 advanced to log sequence 1917
Thread 1 opened at log sequence 1917
  Current log# 2 seq# 1917 mem# 0: /home/oracle/orabase/oradata/ORACLE/redo02.log
Successful open of redo thread 1
Thu Aug  4 01:59:21 2016
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug  4 01:59:21 2016
ARC1: STARTING ARCH PROCESSES
Thu Aug  4 01:59:21 2016
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Aug  4 01:59:21 2016
SMON: enabling cache recovery
Thu Aug  4 01:59:21 2016
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=20, OS id=5546
Thu Aug  4 01:59:21 2016
db_recovery_file_dest_size of 2048 MB is 1.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 01:59:22 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 5505
ORA-1092 signalled during: alter database open ...

由于ORA-600 4042错误导致数据库无法正常open.
分析ORA-600 4042

PARSING IN CURSOR #4 len=142 dep=1 uid=0 oct=3 lid=0 tim=1435788503594313 hv=361892850 ad='a7ab2db8'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #4:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594311
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aae75802218  bln=22  avl=02  flg=05
  value=3
EXEC #4:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594393
FETCH #4:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,tim=1435788503594412
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=8 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=3 us)'
WAIT #1: nam='db file sequential read' ela= 10 file#=2 block#=41 blocks=1 obj#=-1 tim=1435788503594468
Dump of buffer cache at level 4 for tsn=1, rdba=8388649
BH (0x95ff3c58) file#: 2 rdba: 0x00800029 (2/41) class: 21 ba: 0x95ef0000
  set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [a8b77880,a8b77880] lru: [95ff3dd0,a8e70338]
  ckptq: [NULL] fileq: [NULL] objq: [a43da110,a43da110]
  use: [a8e6e658,a8e6e658] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800029 (2/41)
  scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
  frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
Dump of memory from 0x0000000095EF0000 to 0x0000000095EF2000
095EF0000 0000A200 00800029 00000000 01010000  [....)...........]
095EF0010 00000000 00000000 00000000 00000000  [................]
        Repeat 509 times
095EF1FF0 00000000 00000000 00000000 00000001  [................]
Dump of memory from 0x0000000095EF0014 to 0x0000000095EF1FFC
095EF0010          00000000 00000000 00000000      [............]
095EF0020 00000000 00000000 00000000 00000000  [................]

这里可以发现,file 2 block 41的type为unknown,注意观察ORA-600 3020的错误,我们发现当时报的坏块也正好是该block.基本上可以确定由于前面的allow 1 corruption操作导致了后面的ORA-600 4042的错误.官方关于ORA-600[4042]解释
ORA-600-4042


通过修改undo$中的回滚段状态(参考:bbed修改undo$(回滚段)状态)
正常open数据库,修改file 3的scn并online数据文件

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1224736768 bytes
Fixed Size                  2020384 bytes
Variable Size             318770144 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
SQL>   SELECT thread#,              
  2           a.sequence#,
  3           a.group#,
  4           TO_CHAR (first_change#, '9999999999999999') "SCN",
  5           a.status,
  6           MEMBER
  7      FROM v$log a, v$logfile b
  8     WHERE a.group# = B.GROUP#
  9  ORDER BY a.sequence# DESC;

   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1919          1     3303615011212
CURRENT
/home/oracle/orabase/oradata/ORACLE/redo01.log

         1       1918          3     3303614991206
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo03.log

   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------

         1       1917          2     3303614971197
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo02.log


SQL> recover database using backup controlfile;
ORA-00279: change 3303615011452 generated at 08/04/2016 02:06:52 needed for
thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORACLE/archivelog/2016_08_04/o1_mf_1_1919_%u
_.arc
ORA-00280: change 3303615011452 for thread 1 is in sequence #1919


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/orabase/oradata/ORACLE/redo01.log
Log applied.
Media recovery complete.
SQL> alter database datafile 3 online;

Database altered.

SQL> alter database open resetlogs;

Database altered.

SQL> 

至此该数据库基本上恢复完成,强烈建议使用逻辑方式导出导入重建库.

发表在 Oracle备份恢复 | 标签为 , | 留下评论