Category archive: Oracle RAC
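Removing the OFFLINE entries of ora.asmgroup resources
After switching to Flex ASM, the cluster status shows an OFFLINE entry for each ora.asmgroup resource. Running srvctl modify asm -count 2 forces the ASM count to 2, after which no OFFLINE entries remain.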
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$
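To confirm that only the surplus third instance of each ora.asmgroup resource is OFFLINE, both before and after the change, the crsctl output can be filtered. The following is a minimal sketch, not part of the original procedure. The function name list_offline_asmgroup is hypothetical, and it assumes the output layout shown above: a resource header line followed by numbered instance lines with Target in the second column and State in the third.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl status res -t` output on stdin and print
# "<resource> <instance-number>" for every ora.asmgroup instance whose
# State column is OFFLINE.
list_offline_asmgroup() {
  awk '
    /\(ora\.asmgroup\)/ { res = $1; next }   # header of an asmgroup resource
    /^ora\./            { res = ""; next }   # header of any other resource
    # Instance line: $1 = instance number, $2 = Target, $3 = State
    res != "" && $1 ~ /^[0-9]+$/ && $3 == "OFFLINE" { print res, $1 }
  '
}

# Usage (as the grid user on a cluster node):
#   crsctl status res -t | list_offline_asmgroup
```

An empty result after srvctl modify asm -count 2 would match the second listing above.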
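NIC failure prevents a database instance from starting
In a two-node cluster, the instance on one node started normally while the instance on the other node failed to start. Alert log of the failing node: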
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729], requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384
Alert log of the healthy node:
Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]
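These logs confirm the main cause: the two nodes could not communicate with each other, so the starting node could not join the cluster (the cluster reconfiguration never completed) and its instance failed to start. In this situation the first thing to check is the private network. The oswnetstat records showed the "packet reassembles failed" counter growing at a severe rate.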
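This symptom usually points to the default ipfrag_*_thresh values being too small. After setting
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
the "packet reassembles failed" counter still kept increasing. Checking the NICs then revealed a hardware problem: one of the two 10GbE NICs used for HAIP was faulty.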
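Whether the counter is still climbing can be checked by comparing two netstat -s snapshots taken some time apart. The sketch below is only an illustration. The function names reasm_failed and reasm_delta are hypothetical, and it assumes each snapshot was saved to its own file and contains the usual "N packet reassembles failed" line from the Ip section.

```shell
#!/bin/sh
# Hypothetical helper: print the "packet reassembles failed" counter found in
# `netstat -s` output on stdin (0 if the line is absent).
reasm_failed() {
  awk '/packet reassembles failed/ { print $1; found = 1; exit }
       END { if (!found) print 0 }'
}

# Hypothetical helper: growth of the counter between two saved snapshots.
# $1 = older snapshot file, $2 = newer snapshot file.
reasm_delta() {
  echo $(( $(reasm_failed < "$2") - $(reasm_failed < "$1") ))
}

# Usage:
#   netstat -s > /tmp/ns1; sleep 60; netstat -s > /tmp/ns2
#   reasm_delta /tmp/ns1 /tmp/ns2
```

A delta that stays well above zero after the sysctl change suggests the kernel thresholds are not the limiting factor, which is what led to checking the NICs here.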

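To keep the impact on database performance small, the faulty NIC was temporarily disabled, after which the database restarted normally.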

The NIC will be re-enabled once the problem has been fixed at the network level.
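The ipfrag thresholds mentioned earlier can be verified on each node by reading the kernel's current values. This is a minimal sketch under stated assumptions: the function name ipfrag_check is hypothetical, the target values are the two quoted in this post, and the directory argument exists only so the check can be tried against sample files.

```shell
#!/bin/sh
# Hypothetical helper: compare the current ipfrag thresholds with the values
# used in this post. $1 = directory holding the two files
# (default: /proc/sys/net/ipv4). Returns 1 if any value differs.
ipfrag_check() {
  dir=${1:-/proc/sys/net/ipv4}
  status=0
  for pair in ipfrag_high_thresh:16777216 ipfrag_low_thresh:15728640; do
    name=${pair%%:*}   # part before the colon
    want=${pair##*:}   # part after the colon
    have=$(cat "$dir/$name" 2>/dev/null) || have=unreadable
    if [ "$have" = "$want" ]; then
      echo "$name ok ($have)"
    else
      echo "$name is $have, expected $want"
      status=1
    fi
  done
  return "$status"
}

# Usage: ipfrag_check
```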
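11.2 CRS startup timeout: the dd npohasd workaround
A fibre link failure at a customer site made the voting disks unavailable, which caused the host to reboot. After the reboot the cluster did not start normally.
Operating system and CRS versions: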
[root@rac1 ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@rac1 ~]# sqlplus -v
SQL*Plus: Release 11.2.0.4.0 Production
Starting CRS manually hung for a while and then failed:
[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Check the processes that were started:
[grid@rac1 ~]$ ps -ef|grep d.bin
root      7043     1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root      8311     1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid     10984 10954  0 12:10 pts/2    00:00:00 grep d.bin
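Based on experience, this failure is very likely BUG:17229230 - DURING REBOOT, "OHASD.BIN REBOOT" REMAINS SLEEPING. As a temporary workaround, start CRS in one session and then run the following in a second session: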
/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
CRS then started normally:
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
After the dd command was terminated, the cluster came up normally.
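Before running the dd workaround above, it can help to confirm that the target path is what the command expects. The sketch below only prints the command and never executes it. The function name npohasd_dd_cmd is hypothetical, and it assumes that /var/tmp/.oracle/npohasd is a named pipe (FIFO) on this platform, which the original post does not state.

```shell
#!/bin/sh
# Hypothetical helper: print the dd command from this post if the given path
# is a named pipe; otherwise report the problem on stderr and return 1.
# $1 = pipe path (default: /var/tmp/.oracle/npohasd, assumed to be a FIFO).
npohasd_dd_cmd() {
  pipe=${1:-/var/tmp/.oracle/npohasd}
  if [ -p "$pipe" ]; then
    echo "/bin/dd if=$pipe of=/dev/null bs=1024 count=1"
  else
    echo "not a named pipe: $pipe" >&2
    return 1
  fi
}

# Usage (as root, in the second session while `crsctl start crs` is running):
#   npohasd_dd_cmd
```

Printing rather than executing keeps the decision to run the command, and to terminate it once the stack is starting, with the operator.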