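Category archive: Oracle RAC
udev_start causes VIP failover (common trigger: adding disks online in RAC)
A customer was expanding ASM. After the udev_start command was run, every VIP failed over and all business traffic was interrupted.
The first priority was to restore service by relocating all VIPs back to their home nodes.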
[grid@rac3 ~]$ srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$ srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.
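The four relocations follow one pattern, so they can be scripted. The following is a minimal sketch, assuming each VIP is named after its home node as in the session above (rac1 to rac4). The function name relocate_vips_home and the DRY_RUN variable are illustrative and are not part of srvctl. With DRY_RUN left at its default of 1, the function only prints the commands it would run.

```shell
#!/bin/sh
# Relocate each node's VIP back to its home node.
# Assumption: the VIP name equals the node name, as in the session above.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
relocate_vips_home() (
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "srvctl relocate vip -i $node -n $node -f -v"
        else
            # Stop at the first failure so it is not hidden by later output
            srvctl relocate vip -i "$node" -n "$node" -f -v || exit 1
        fi
    done
)

# Print the commands for the four nodes from the example
relocate_vips_home rac1 rac2 rac3 rac4
```

To actually run the relocations, call it as the grid user with DRY_RUN=0, for example `DRY_RUN=0 relocate_vips_home rac1 rac2 rac3 rac4`.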
The cause is that the udev_start command briefly takes the network interfaces down, which makes the VIPs fail over.

Check the ifcfg configuration files

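The root cause is that udev acted on the network interfaces. The recommended fix is to add HOTPLUG="no" to the corresponding ifcfg files (public, private, and any other networks that matter).
Reference: Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)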
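The HOTPLUG change can be applied the same way to every relevant interface file. The following is a minimal sketch, assuming the usual KEY=value layout of ifcfg files. The function name set_hotplug_no is illustrative, and the .bak backup suffix is a choice made for this example.

```shell
#!/bin/sh
# Ensure an ifcfg file contains HOTPLUG="no".
# Replaces an existing HOTPLUG= line or appends a new one.
# A copy of the original is kept next to it as <file>.bak.
set_hotplug_no() (
    cfg=$1
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; exit 1; }
    cp -p "$cfg" "$cfg.bak" || exit 1
    if grep -q '^HOTPLUG=' "$cfg"; then
        # Rewrite from the backup into the original file
        sed 's/^HOTPLUG=.*/HOTPLUG="no"/' "$cfg.bak" > "$cfg"
    else
        # Add a newline first if the file does not end with one
        [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
        echo 'HOTPLUG="no"' >> "$cfg"
    fi
)
```

A call would look like `set_hotplug_no /etc/sysconfig/network-scripts/ifcfg-eth0`, run as root. The interface name eth0 is a placeholder for the actual public and private interfaces.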

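Removing the OFFLINE ora.asmgroup resource entries
With Flex ASM in use, the cluster status shows OFFLINE entries under the ora.asmgroup resources. Running srvctl modify asm -count 2 forces the ASM count to 2, after which the OFFLINE ora.asmgroup entries no longer appear.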
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$
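In a report this long, the OFFLINE rows are easy to miss. The following is a minimal sketch of a filter that lists them, assuming the tabular layout of crsctl status res -t shown above: a resource name line followed by rows of either "target state server" (local resources) or "number target state" (cluster resources). The function name list_offline and its output format are illustrative.

```shell
#!/bin/sh
# List entries whose State column is OFFLINE in `crsctl status res -t` output.
# Reads the report on stdin.
# Prints "<resource> <instance number or server> target=<target>".
list_offline() {
    awk '
        # Resource name line, e.g. ora.asm(ora.asmgroup)
        /^ora\./ { res = $1; next }
        # Cluster resource row: instance number, target, state
        $1 ~ /^[0-9]+$/ && $3 == "OFFLINE" {
            print res, $1, "target=" $2
            next
        }
        # Local resource row: target, state, server
        $1 ~ /^(ONLINE|OFFLINE)$/ && $2 == "OFFLINE" {
            print res, $3, "target=" $1
        }
    '
}
```

Typical use is `crsctl status res -t | list_offline`, run before and after the srvctl modify asm -count 2 change. Rows with a target of ONLINE but a state of OFFLINE are usually the ones worth a closer look.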
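NIC fault prevents a database instance from starting
In one cluster, one node started normally but the instance on the other node failed to start. The alert log of the failing node: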
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729], requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384
The alert log of the healthy node:
Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]
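These logs confirm that the two nodes could not communicate properly, so the new node could not join the cluster (the cluster reconfiguration could not complete) and the instance failed to start. In cases like this, the first thing to check is the private interconnect. Analysis of the oswnetstat records showed a very high number of "packet reassembles failed".
This problem is usually attributed to the default ipfrag_*_thresh values being too small, so the following settings were applied: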
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
The "packet reassembles failed" counter kept increasing. Inspecting the NICs showed that one of the two 10GbE NICs used for HAIP was faulty.
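Whether the counter is still rising can be checked by sampling it twice and comparing. The following is a minimal sketch, assuming the counter appears as a line of the form "<n> packet reassembles failed", as in Linux netstat -s output. The function names reasm_failed and reasm_delta and the 10-second interval in the commented usage are illustrative.

```shell
#!/bin/sh
# Extract the "packet reassembles failed" counter from `netstat -s` text on stdin.
# Prints 0 when the line is absent.
reasm_failed() {
    awk '/packet reassembles failed/ { n = $1 } END { print n + 0 }'
}

# Difference between two counter samples: reasm_delta <before> <after>
reasm_delta() {
    echo $(( $2 - $1 ))
}

# Example usage on a live system (left commented out):
#   before=$(netstat -s | reasm_failed)
#   sleep 10
#   after=$(netstat -s | reasm_failed)
#   echo "reassembly failures in 10s: $(reasm_delta "$before" "$after")"
```

A delta that stays above zero after the ipfrag thresholds have been raised suggests looking beyond the kernel settings, which is how the NIC fault was found here.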

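To limit the impact on database performance, the faulty NIC was temporarily disabled, after which the database restarted normally.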

The NIC will be re-enabled once the problem has been resolved at the network layer.