分类目录归档:Oracle

存储双活同步导致数据库异常恢复

客户双活存储异常之后,单个存储运行,故障存储修复之后,双活同步,出现多套系统异常,上一篇:Control file mount id mismatch!故障处理,这套是win的rac无法正常启动,ocr磁盘组异常(报ORA-600 kfrValAcd30无法正常mount)

C:\Users\Administrator>crsctl start cluster -all
CRS-2672: 尝试启动 'ora.crf' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.crf' (在 'xff1' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff1' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff1' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff1\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff1' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff1' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff1' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff1' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff1' 上)
CRS-4705: 无法在节点 xff1 上启动集群件。
CRS-4705: 无法在节点 xff2 上启动集群件。
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

因为是ocr磁盘组操作比较简单,直接重建该磁盘组,还原ocr等即可

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              300M
NTFS                             \Device\Harddisk0\Partition4           599472M
NTFS                             \Device\Harddisk0\Partition5          1000000M
ORCLDISKDATA0                    \Device\Harddisk1\Partition1          1048587M
ORCLDISKDATA1                    \Device\Harddisk2\Partition1          1048587M
ORCLDISKDATA2                    \Device\Harddisk3\Partition1          1048587M
ORCLDISKDATA3                    \Device\Harddisk4\Partition1          1048587M
ORCLDISKDATA4                    \Device\Harddisk6\Partition1           460797M

C:\Users\Administrator>crsctl start crs -excl -nocrs
CRS-4123: Oracle 高可用性服务已启动。
CRS-2672: 尝试启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gipcd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gipcd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.ctssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.ctssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.ctssd' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.ctssd' (在 'xff2' 上)
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 13:52:07 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup nomount pfile='f:/pfile_asm.txt';
ASM 实例已启动

Total System Global Area 1140850688 bytes
Fixed Size                  3054680 bytes
Variable Size            1112630184 bytes
ASM Cache                  25165824 bytes

SQL>  create diskgroup OCR_VOTE  external redundancy disk '\\.\ORCLDISKDATA4' force  attribute 'COMPATIBLE.ASM' = '12.1.0';

Diskgroup created.

F:\>ocrconfig -restore backup00.ocr

F:\>crsctl replace votedisk +OCR_VOTE
已成功添加表决磁盘 e2b8fdbd05ae4f9fbf3531630853dbbc。
已成功将表决磁盘组替换为 +OCR_VOTE。
CRS-4266: 已成功替换表决文件

F:\>crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   e2b8fdbd05ae4f9fbf3531630853dbbc (\\.\ORCLDISKDATA4) [OCR_VOTE]
找到了 1 个表决磁盘。

F:\>ocrcheck
Oracle 集群注册表的状态如下:
         版本                  :          4
         总空间 (KB)     :     409568
         已用空间 (KB)      :       1348
         可用空间 (KB):     408220
         ID                       :  820087446
         设备/文件名         :  +OCR_VOTE
                                    设备/文件完整性检查成功

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

         集群注册表完整性检查成功

         逻辑损坏检查成功

mount其他磁盘组成功

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL>


SQL> alter diskgroup data mount;

Diskgroup altered.

尝试恢复数据库失败

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 14:09:39 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area 2.0992E+11 bytes
Fixed Size                  7797816 bytes
Variable Size            1.3798E+11 bytes
Database Buffers         7.1672E+10 bytes
Redo Buffers              260636672 bytes
数据库装载完毕。

SQL> recover database;
ORA-10562: Error occurred while applying redo to data block (file# 13, block#1033775)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 13: '+DATA/XFF/users07.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: 内部错误代码, 参数: [kdolkr-2], [2], [1], [44], [], [], [], [], [],[], [], []


SQL> recover datafile 2;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'


SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'

SQL> recover datafile 10;
ORA-00283: ??????????
ORA-10562: Error occurred while applying redo to data block (file# 10, block#
2899468)
ORA-10564: tablespace USERS
ORA-01110: ???? 10: '+DATA/XFF/users04.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: ??????, ??: [ktbair2: illegal  inheritance], [], [], [], [], [], [],[], [], [], [], []

除了ORA-00742,还有其他一些日志应用错误,比如:ORA-600 ktbair2: illegal inheritance,ORA-600 kdolkr-2等,无法正常应用日志,尝试强制打开库,报ORA-600 2662错误.

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [8], [678024613], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [8], [678024612], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [8], [678024610], [8],
[678508930], [12583040], [], [], [], [], [], []
进程 ID: 4628
会话 ID: 996 序列号: 48547

通过自研的Patch_SCN工具快速解决该问题
20230507195805


open数据库成功,实现最大限度抢救客户数据.

发表在 Oracle备份恢复 | 标签为 , , , , | 评论关闭

Control file mount id mismatch!故障处理

通过沟通确认客户由于存储双活异常,业务运行在主存储上,另外一套存储修复之后,进行存储双活同步,结果在这个过程中由于遭遇Control file mount id mismatch! 导致数据库crash了

2023-05-03T20:21:07.446873+08:00
Archived Log entry 491897 added for T-1.S-246903 ID 0x97d92f0b LAD:1
2023-05-03T20:47:53.902701+08:00
Error: 2141
Control file mount id mismatch!
fhmid: 2592441863, SGA mid: 2624617448
Requesting DIAG on each RAC instance to dump the control file header block
2023-05-03T20:47:55.906490+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_rms0_20989.trc:
2023-05-03T20:47:56.521500+08:00
RMS0 (ospid: 20989): terminating the instance
2023-05-03T20:47:56.610656+08:00
System state dump requested by (instance=1, osid=20989 (RMS0)), summary=[abnormal instance termination].
System State dumped to trace file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_diag_20912_20230503204756.trc
2023-05-03T20:47:58.480397+08:00
License high water mark = 395
2023-05-03T20:48:02.600203+08:00
Instance terminated by RMS0, pid = 20989
2023-05-03T20:48:02.601563+08:00
Warning: 2 processes are still attach to shmid 393226:
 (size: 28672 bytes, creator pid: 19941, last attach/detach pid: 20912)
2023-05-03T20:48:03.481726+08:00
USER (ospid: 967): terminating the instance
2023-05-03T20:48:03.483351+08:00
Instance terminated by USER, pid = 967

节点自动重启报错ORA-600 kccsbck_first

2023-05-03T20:48:34.870435+08:00
NOTE: ASMB mounting group 2 (FRA)
NOTE: ASM background process initiating disk discovery for grp 2 (reqid:0)
NOTE: Assigning number (2,1) to disk (/dev/asm_data0g)
NOTE: Assigning number (2,0) to disk (/dev/asm_data0f)
SUCCESS: mounted group 2 (FRA)
NOTE: grp 2 disk 1: FRA_0001 path:/dev/asm_data0g
NOTE: grp 2 disk 0: FRA_0000 path:/dev/asm_data0f
2023-05-03T20:48:34.919965+08:00
NOTE: dependency between database xff and diskgroup resource ora.FRA.dg is established
2023-05-03T20:48:38.983416+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_2436.trc  (incident=1333249):
ORA-00600: ??????, ??: [kccsbck_first], [1], [2624617448], [], [], [], [], [], [], [], [], []
Incident details in: /opt/rac/oracle/diag/rdbms/xff/xff1/incident/incdir_1333249/xff1_ora_2436_i1333249.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:8:116} */...

再次重启数据库报错ORA-00742 ORA-00312

2023-05-04T08:18:59.635790+08:00
Aborting crash recovery due to error 742
2023-05-04T08:18:59.635897+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: ??????? 2 ?? 244996 ? 8262 ??????????
ORA-00312: ???? 7 ?? 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: ???? 7 ?? 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
Abort recovery for domain 0, flags 4
2023-05-04T08:18:59.647994+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: ??????? 2 ?? 244996 ? 8262 ??????????
ORA-00312: ???? 7 ?? 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: ???? 7 ?? 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
ORA-742 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:37368:2} */...
2023-05-04T08:19:00.820708+08:00
License high water mark = 33
2023-05-04T08:19:00.820936+08:00
USER (ospid: 82788): terminating the instance
2023-05-04T08:19:01.827132+08:00
Instance terminated by USER, pid = 82788

明显数据库在启动的时候做实例恢复,发现redo写丢失,从而引起数据库无法正常open,对于此类故障,处理比较多
ORA-00742 ORA-00312 故障恢复-1
ORA-00742 ORA-00312故障恢复-2
ORA-00742: 日志读取在线程 %d 序列 %d 块 %d 中检测到写入丢失情况

发表在 Oracle备份恢复 | 标签为 , , , | 评论关闭

Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.

中联的his系统在alert日志中经常会看到如下的日志告警

[oracle@oracle1 trace]$ tail -f alert_orcl.log 
Tue May 02 22:06:46 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.

查询ZLHIS用户当前的role情况

SQL>  select Grantee, count(*) "Role Number" from
  2   (
  3   select distinct connect_by_root grantee Grantee, granted_role
  4   from dba_role_privs 
 connect by prior granted_role=grantee
  5    6   ) where GRANTEE='ZLHIS'
 group by Grantee  7  
  8  /

GRANTEE                        Role Number
------------------------------ -----------
ZLHIS                                  149

虽然max_enabled_roles参数为150

SQL> show parameter MAX_ENABLED_ROLES;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
max_enabled_roles                    integer     150

但是用户支持的默认最大enable role为148个,对于该问题可以把一些角色的权限进行合并,然后再授权给ZLHIS,或者删除掉一些不需要的角色授权.如果一定需要这些角色,而且使其在用户登录的时候enable,可以以下两种常见方法解决

alter user <username> default roles <list of roles>;   
--可以是all或者角色列表

或者在会话中启用role

set roles all;
or
execute dbms_session.set_role('ALL');

参考:What to Check When Dealing With Ora-28031: Maximum Of 148 Enabled Roles Exceeded? (Doc ID 778785.1)

发表在 Oracle | 评论关闭