gpk-update-icon进程占用CPU资源100%

在一个客户的机器上发现gpk-update-icon进程长期占用cpu 100%
20230517141726


gpk-update-icon进程在GUI模式下会自动通知rpm软件包更新,是由gnome-packagekit的bug造成的。
gpk-update-icon使用递归主循环,递归循环从dbus回调调用。因此,它处于调度操作的中间,并且在操作完成之前dbus无法进一步调度。
临时解决方法

killall gpk-update-icon

20230517141819


掉掉相关进程之后临时恢复正常,如果要防止后续再发生该问题,可以把系统启动到非图形化界面

[root@HIS_DG ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.8 (Santiago)
[root@HIS_DG ~]# runlevel
N 5
[root@HIS_DG ~]# who -r
         run-level 5  2021-11-13 12:04
[root@HIS_DG ~]# init 3
[root@HIS_DG ~]# who -r
         run-level 3  2023-05-17 14:12                   last=5
[root@HIS_DG ~]# runlevel
5 3

或者卸载相关包

#yum remove gnome-packagekit 或 rpm -e gpk-update-icon
发表在 Linux | 标签为 | 评论关闭

难见的oracle 9i恢复—2023年

时过境迁,以前恢复大量oracle 8/9版本的库,现在一套oracle 9i的库都比较稀奇了.今天恢复客户一套9.2.0.6的aix环境rac库,通过分析确认主要问题:
1. 重建控制文件,resetlogs库遗漏数据文件
missing_dbf


2. 数据库启动主要报错ORA-600 2663和ORA-600 kclchkblk_4

Tue Nov  8 09:10:05 2022
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Tablespace 'TEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #84 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00084' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Tue Nov  8 09:10:05 2022
SMON: enabling tx recovery
Tue Nov  8 09:10:05 2022
Database Characterset is ZHS16GBK
Tue Nov  8 09:10:05 2022
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_536662.trc:
ORA-00600: internal error code, arguments: [2663], [3301], [2638369768], [3301], [2640322622], [], [], []
Tue Nov  8 09:10:06 2022
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_647352.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3301], [18446744072061740072],[3301],[18446744072052954088]
Tue Nov  8 09:10:06 2022
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_536662.trc:
ORA-00600: internal error code, arguments: [2663], [3301], [2638369768], [3301], [2640322622], [], [], []
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 536662
ORA-1092 signalled during: alter database open...

根据客户文件名称的规则,推算出来84号文件实际的文件名(因为使用的是lv[aix的hacmp管理的lv的裸设备方式]),通过dbv确认文件无坏块

DBVERIFY: Release 9.2.0.6.0 - Production on Sat May 13 16:44:09 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/ra_txn_ind12.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 256000
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 299
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 13
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 255688
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 11177081099136 (2602.1576194944)

bbed验证文件该文件是否是84号文件

$ bbed blocksize=8192 filename='/dev/ra_txn_ind12.dbf'   
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Mon May 15 09:45:44 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> map
 File: /dev/ra_txn_ind12.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 608 bytes                    @0       

 ub4 tailchk                                @8188    


BBED> p kcvfh
struct kcvfh, 608 bytes                     @0       
   struct kcvfhbfh, 20 bytes                @0       
      ub1 type_kcbh                         @0        0x0b
      ub1 frmt_kcbh                         @1        0x02
      ub1 spare1_kcbh                       @2        0x00
      ub1 spare2_kcbh                       @3        0x00
      ub4 rdba_kcbh                         @4        0x15000001
      ub4 bas_kcbh                          @8        0x00000000
      ub2 wrp_kcbh                          @12       0x0000
      ub1 seq_kcbh                          @14       0x01
      ub1 flg_kcbh                          @15       0x04 (KCBHFCKV)
      ub2 chkval_kcbh                       @16       0x1b4a
      ub2 spare3_kcbh                       @18       0x0000
   struct kcvfhhdr, 76 bytes                @20      
      ub4 kccfhswv                          @20       0x09200000
      ub4 kccfhcvn                          @24       0x08000000
      ub4 kccfhdbi                          @28       0x05d15ccf
      ……
      ub4 kccfhcsq                          @40       0x00525a20
      ub4 kccfhfsz                          @44       0x0003e800
      s_blkz kccfhbsz                       @48       0x00
      ub2 kccfhfno                          @52       0x0054
      ub2 kccfhtyp                          @54       0x0003
   ……
   ub4 kcvfhrfn                             @528      0x00000054  ---确认是84号文件
  ……

通过bbed修改文件相关信息,然后尝试rename文件,但是recover datafile 84报错

Mon May 15 09:49:44 2023
alter database rename file '/u01/prod/proddb/9.2.0/dbs/MISSING00084' to '/dev/ra_txn_ind12.dbf'
Mon May 15 09:49:44 2023
Completed: alter database rename file '/u01/prod/proddb/9.2.0
Mon May 15 09:51:15 2023
ALTER DATABASE RECOVER  datafile 84  
Media Recovery Start
Mon May 15 09:51:15 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_467190.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []

通过处理之后,数据库recover 正常,但是open报ORA-600 4193错误

Mon May 15 09:57:53 2023
ALTER DATABASE RECOVER  DATABASE  
Media Recovery Start
Mon May 15 09:57:53 2023
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0 errs 0: /dev/rlog01a.dbf
  Mem# 1 errs 0: /dev/rlog01b.dbf
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  DATABASE  
Mon May 15 09:59:24 2023
alter database open
Mon May 15 09:59:24 2023
Beginning crash recovery of 1 threads
Mon May 15 09:59:24 2023
Started redo scan
Mon May 15 09:59:24 2023
Completed redo scan
 75 redo blocks read, 0 data blocks need recovery
Mon May 15 09:59:24 2023
Started recovery at
 Thread 1: logseq 4, block 2, scn 3301.2638369687
Mon May 15 09:59:24 2023
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0 errs 0: /dev/rlog01a.dbf
  Mem# 1 errs 0: /dev/rlog01b.dbf
Mon May 15 09:59:24 2023
Completed redo application
Mon May 15 09:59:24 2023
Ended recovery at
 Thread 1: logseq 4, block 77, scn 3301.2638389765
 0 data blocks read, 0 data blocks written, 75 redo blocks read
Crash recovery completed successfully
Mon May 15 09:59:25 2023
Thread 1 advanced to log sequence 5
Thread 1 opened at log sequence 5
  Current log# 2 seq# 5 mem# 0: /dev/rlog02a.dbf
  Current log# 2 seq# 5 mem# 1: /dev/rlog02b.dbf
Successful open of redo thread 1
Mon May 15 09:59:25 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 15 09:59:25 2023
SMON: enabling cache recovery
Mon May 15 09:59:25 2023
ARC0: Media recovery disabled
Mon May 15 09:59:25 2023
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Tablespace 'TEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Mon May 15 09:59:25 2023
SMON: enabling tx recovery
Mon May 15 09:59:25 2023
Database Characterset is ZHS16GBK
Mon May 15 09:59:25 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413872.trc:
ORA-00600: internal error code, arguments: [4193], [781], [6399], [], [], [], [], []
Mon May 15 09:59:25 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_844004.trc:
ORA-00600: internal error code, arguments: [4193], [56042], [1895], [], [], [], [], []
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 153
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 2893
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
Mon May 15 09:59:26 2023
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Doing block recovery for fno: 12 blk: 3009
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 89
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_844004.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [56042], [1895], [], [], [], [], []
Error 607 happened during db open, shutting down database
USER: terminating instance due to error 607
Instance terminated by USER, pid = 844004
ORA-1092 signalled during: alter database open...

绕过该错误之后,数据库启动报ORA-600 2662错误

$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.6.0 - Production on Mon May 15 10:04:44 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1102023336 bytes
Fixed Size                   744104 bytes
Variable Size             922746880 bytes
Database Buffers          167772160 bytes
Redo Buffers               10760192 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Mon May 15 10:05:03 2023
SMON: enabling cache recovery
Mon May 15 10:05:03 2023
ARC0: Media recovery disabled
Mon May 15 10:05:03 2023
SMON: enabling tx recovery
Mon May 15 10:05:03 2023
Database Characterset is ZHS16GBK
Mon May 15 10:05:03 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409995], [3301], [2644132966], [4195678]
Mon May 15 10:05:04 2023
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon May 15 10:05:04 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409998], [3301], [2644132966], [4195678]
Mon May 15 10:05:04 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409998], [3301], [2644132966], [4195678]
SMON: terminating instance due to error 600
Instance terminated by SMON, pid = 413880

解决该错误之后,数据库open正常

$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.6.0 - Production on Mon May 15 10:10:30 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1102023336 bytes
Fixed Size                   744104 bytes
Variable Size             922746880 bytes
Database Buffers          167772160 bytes
Redo Buffers               10760192 bytes
Database mounted.
SQL> alter database open;

Database altered.

逻辑方式导出数据,本次恢复任务基本完成.
以前有过的类似恢复案例(类似较多选择典型几个):
ORA-600 2663
ORA-600 2663 故障恢复
ORA-600 2662
ora-600 2662和ora-600 kclchkblk_4恢复
redo异常 ORA-600 kclchkblk_4 故障恢复
ORA-600 4193 错误说明和解决
ORA-00600 [2662]和ORA-00600 [4194]恢复

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭

udev_start导致vip漂移(常见情况:rac在线加盘操作引起)

客户对asm进行扩容,执行udev_start命令之后,所有的vip全部漂移,业务全部中断
20230513203654


优先恢复业务,把所有vip漂移回来

[grid@rac3 ~]$  srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.

vip恢复正常,业务也恢复正常
20230513203712


出现该问题的原因是由于udev_start命令引起网卡瞬间中断,从而使得vip发生漂移
20230513212316

查看ifcfg配置文件
20230513212440

引起该问题的原因是udev对网卡进行了操作,从而引起该问题,处理建议在对应的ifcfg文件中加上 HOTPLUG=”no” (pulbic,private和其他需要关注的网络)
参考:Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
20230513213713

发表在 Linux, Oracle RAC | 标签为 , , | 评论关闭