磁盘空间不足迁移数据文件导致故障恢复

有客户由于磁盘空间不足,在线把oracle数据迁移到其他位置

Tue Jun 01 11:44:32 2021
Thread 1 advanced to log sequence 28754 (LGWR switch)
  Current log# 2 seq# 28754 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Tue Jun 01 11:59:54 2021
Non critical error ORA-48113 caught while writing to trace file
      "/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_mmon_23341.trc"
Error message: 
Writing to the above trace file is disabled for now on...
Tue Jun 01 12:00:00 2021
Non critical error ORA-48181 caught while writing to trace file
       "/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29692.trc"
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29692.trc:
ORA-12012: error on auto execute of job "XIFENFEI"."STATISTICS_1_JOBS"
ORA-06575: Package or function PKG_STAT_1_2018 is in an invalid state
Tue Jun 01 12:12:26 2021

迁移走数据文件之后,数据库报错,并且强制关闭数据库

ORA-01116: error in opening database file 30
ORA-01110: data file 30: '/u02/orcdate/AAAA.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m001_29106.trc:
ORA-01116: error in opening database file 31
ORA-01110: data file 31: '/u02/orcdate/CBD.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Mon Jun 07 10:25:03 2021
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_9817.trc:
ORA-01116: error in opening database file 24
ORA-01110: data file 24: '/u02/orcdate/ABC.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Mon Jun 07 10:25:10 2021
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Read of datafile '/u02/orcdate/XXXXXXX.dbf' (fno 21) header failed with ORA-01208
Rereading datafile 21 header failed with ORA-01208
Mon Jun 07 10:25:36 2021
Adjusting the default value of parameter parallel_max_servers
from 640 to 485 due to the value of parameter processes (500)
Starting ORACLE instance (normal)
Mon Jun 07 10:28:20 2021
Shutting down instance (abort)
License high water mark = 152
USER (ospid: 7987): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Jun 07 10:28:30 2021
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 7987
Mon Jun 07 10:28:31 2021
Instance shutdown complete

然后又把文件迁移回来,并且进行了一系列数据库恢复,最后我们接手是情况是有多个文件被offline,并且有一个文件报WRONG FILE NUMBER,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检查,对其中的v$datafile,v$datafile_header,v$tablespace综合分析
20210612154127
20210612154301
20210612154350


确认是WXD_YPT表空间数据文件直接拷贝为WXD表空间数据文件,经过客户确认,WXD数据不重要,客户先忽略.
通过一系列处理,尝试open数据库,报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [3786], [2612118101], [3786], [2612128448], [12583040]
ORA-00600: internal error code, arguments: [2662], [3786], [2612118100], [3786], [2612128448], [12583040]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [3786], [2612118098], [3786], [2612128448], [12583040]
Process ID: 14888
Session ID: 198 Serial number: 3

修改数据库scn(参考blog相关link:ORA-600 2662)数据库顺利open,并且协助客户导出数据并导入新库,完成数据库恢复.
这次运气比较好,只是丢失了一点数据,没有引起重大事故.再此提醒:不太了解oracle的朋友,操作数据库需谨慎,不要在线直接移动数据文件,另外为了更好的恢复效果,更快的恢复,故障之后,最好尽可能的告知所有操作.

发表在 Oracle备份恢复 | 标签为 | 留下评论

文件系统重新分区oracle恢复

最近处理的一个恢复,算是这几年中的一个奇葩.
1. oracle dg 主备库raid同时损坏,找硬件恢复厂商软件重组raid,恢复厂商判断所有磁盘全部都是好的
2. 主库系统被重装,文件系统重新分区.备库在使用duplicate搭建dg的过程中(通过alert日志分析以前的dg是正常的,直接rm掉了所有文件,然后使用duplicate搭建),只是部分文件拷贝到了备库
3. 备份放在一台单独的存储上,但是当上去看是发现存储上面空空的,没有任何数据(通过对ctl的分析,确认存储上面只有一个月之前的备份记录,估计也被删除或者重新分区了(通过后续分析,判断应该是被重新分区了)
客户没有和我们说任何信息,就是说突然两个raid都损坏了,找硬件厂商进行恢复,硬件厂商开始也觉得这个会比较简单,直接通过raid模拟恢复出来lun,然后通过软件恢复出来一些数据文件(反馈给我的信息是少了redo,需要我们协助恢复),通过深入分析,发现少了大量数据文件,基于现在的恢复基本上没意义.然后通过低主库的raid模拟恢复,拷贝出来数据文件,结果发现恢复出来的文件大小,和文件头记录不匹配
20210607232818


这里显示文件大小应该是30G,但是实际拷贝的文件只有26G大小
20210607232731

通过底层进一步分析,发现任何大于4G的文件,按照4G为单位间隔损坏(4G好,4G损坏,4G好……)
20210605203719
20210605201235

出现这类情况,通过底层分析,判断是客户对磁盘进行了重新分区,引起底层问题导致
20210607214629

基于这样的情况,没有太多好的方法处理,直接使用底层碎片技术进行恢复类似oracle 碎片层面恢复,我们进行了挺多的,类似:
dbca删除库和rm删库恢复
文件系统损坏导致数据文件异常恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
alter database create datafile 导致数据文件丢失恢复
rm -rf 删除数据文件恢复方法—文件系统反删除+oracle碎片重组
运气不错,顺利open数据库
20210607234450

本次恢复走了很多弯路,主要是不清楚客户那边处于什么原因,多次隐秘故障原因,没有如实的告知我们故障情况,一步步尝试,走了很多弯路,耽误了不少时间.如果可能请尽量告诉我们准确情况,便于我们准确做出判断,快速高效的恢复.

发表在 Oracle, 非常规恢复 | 标签为 , , | 留下评论

ext4 lvm在线扩容

文件系统格式

[root@xifenfei~]# uname -a
Linux datacenter 4.1.12-61.1.28.el6uek.x86_64 #2 SMP Thu Feb 23 20:03:53 PST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@xifenfei~]# mount
/dev/mapper/vg_datacenter-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_datacenter-lv_home on /home type ext4 (rw)
/dev/mapper/vg_datacenter-lvu01 on /u01 type ext4 (rw)
/dev/sdb1 on /oracle_data type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[root@datacenter ~]# 

linux扫描新磁盘

[root@xifenfei ~]#  ls /sys/class/scsi_host/
host0  host1  host2
[root@xifenfei ~]# echo '- - -'  > /sys/class/scsi_host/host0/scan
[root@xifenfei ~]# echo '- - -'  > /sys/class/scsi_host/host1/scan
[root@xifenfei ~]# echo '- - -'  > /sys/class/scsi_host/host2/scan

vg扩容

[root@xifenfei ~]# pvcreate /dev/sdc1
  Physical volume "/dev/sdc1" successfully created
[root@xifenfei ~]# vgs
  VG            #PV #LV #SN Attr   VSize   VFree  
  vg_xifenfei   1   4   0 wz--n- 499.51g 584.00m
[root@xifenfei ~]# vgextend vg_xifenfei /dev/sdc1
  Volume group "vg_xifenfei" successfully extended
[root@xifenfei ~]# vgs
  VG            #PV #LV #SN Attr   VSize   VFree  
  vg_xifenfei   2   4   0 wz--n- 999.50g 500.56g

lv进行扩容

[root@xifenfei ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_xifenfei-lv_root
                       50G  6.4G   41G  14% /
tmpfs                  63G     0   63G   0% /dev/shm
/dev/sda1             477M   84M  364M  19% /boot
/dev/mapper/vg_xifenfei-lv_home
                      1.9G   29M  1.8G   2% /home
/dev/mapper/vg_xifenfei-lvu01
                      436G  335G   80G  81% /u01
/dev/sdb1             985G  462G  473G  50% /oracle_data
[root@xifenfei ~]# lvresize -L +500G /dev/mapper/vg_xifenfei-lvu01
  Size of logical volume vg_xifenfei/lvu01 changed from 443.00 GiB (113408 extents) to 943.00 GiB (241408 extents).
  Logical volume lvu01 successfully resized.
[root@xifenfei ~]# resize2fs /dev/mapper/vg_xifenfei-lvu01
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/mapper/vg_xifenfei-lvu01 is mounted on /u01; on-line resizing required
old_desc_blocks = 28, new_desc_blocks = 59
The filesystem on /dev/mapper/vg_xifenfei-lvu01 is now 247201792 blocks long.

[root@xifenfei ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_xifenfei-lv_root
                       50G  6.4G   41G  14% /
tmpfs                  63G     0   63G   0% /dev/shm
/dev/sda1             477M   84M  364M  19% /boot
/dev/mapper/vg_xifenfei-lv_home
                      1.9G   29M  1.8G   2% /home
/dev/mapper/vg_xifenfei-lvu01
                      929G  335G  552G  38% /u01
/dev/sdb1             985G  462G  473G  50% /oracle_data

发表在 Linux | 标签为 | 留下评论