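HAIP fails to start on AIX RAC because the private interconnect is directly connected
I previously wrote about a Linux RAC environment whose heartbeat (interconnect) network was connected by a direct cable. When one machine was shut down, the other node could not detect the interconnect as active and therefore could not start. See: Side effects of a directly connected private network: one node down prevents HAIP from starting on the other node.
Last night I ran into a similar situation on AIX. For some reason one RAC node had to be shut down, and while CRS was starting on the other node, HAIP never came up. Even so, after a while the ASM service started, the disk groups mounted and the database opened normally, so the business was temporarily restored. (This differs somewhat from Linux: on 11.2.0.4 RAC on Linux, if HAIP cannot start, ASM does not start by default either.)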
bash-4.2$ crsctl status res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE db2 Started
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE db2
ora.crsd
1 ONLINE ONLINE db2
ora.cssd
1 ONLINE ONLINE db2
ora.cssdmonitor
1 ONLINE ONLINE db2
ora.ctssd
1 ONLINE ONLINE db2 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE db2
ora.evmd
1 ONLINE ONLINE db2
ora.gipcd
1 ONLINE ONLINE db2
ora.gpnpd
1 ONLINE ONLINE db2
ora.mdnsd
1 ONLINE ONLINE db2
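The check done by eye above, spotting a resource whose TARGET is ONLINE while its STATE is OFFLINE, can be scripted. Below is a minimal Python sketch. It assumes the plain-text layout of `crsctl status res -t -init` shown above: a resource-name line followed by `<instance> <TARGET> <STATE> ...` lines. The function name `find_unhealthy` and the trimmed `SAMPLE` text are illustrative only.

```python
# Sketch: flag resources whose TARGET is ONLINE but whose STATE is not,
# assuming the `crsctl status res -t -init` text layout shown above.

SAMPLE = """\
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       db2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  ONLINE       db2
ora.diskmon
      1        OFFLINE OFFLINE
"""


def find_unhealthy(output: str) -> list:
    """Return resource names whose TARGET is ONLINE while STATE is something else."""
    unhealthy = []
    current = None  # name of the resource block currently being read
    for raw in output.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            current = fields[0]
        elif current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                unhealthy.append(current)
    return unhealthy


print(find_unhealthy(SAMPLE))  # ['ora.cluster_interconnect.haip']
```

`ora.diskmon` is not flagged because its TARGET is OFFLINE as well, which is expected on a non-Exadata cluster.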
The corresponding HAIP log entries are as follows:
[ USRTHRD][7257]{0:0:221} Starting Probe for ip 169.254.57.103
[ USRTHRD][7257]{0:0:221} Transitioning to Probe State
[ USRTHRD][7257]{0:0:221} Arp::sProbe {
[ USRTHRD][7257]{0:0:221} Arp::sSend: sending type 1
[ USRTHRD][7257]{0:0:221} [NetHAWork] thread hit OSD exception failed to send arp
[ USRTHRD][7257]{0:0:221} (null) category: -2, operation: write, loc: arpsend:1,os, OS error: 69, other:
[ USRTHRD][7257]{0:0:221} [NetHAWork] thread stopping
[ USRTHRD][7257]{0:0:221} Thread:[NetHAWork]isRunning is reset to false here
[ USRTHRD][5201]{0:0:221} use all detected INF
[ USRTHRD][5201]{0:0:221} Thread:[NetHAWork]thread constructor
[ USRTHRD][5201]{0:0:221} HAIP: Moving ip '' from inf 'en6' to inf 'en6'
[ USRTHRD][5201]{0:0:221} pausing thread
[ USRTHRD][5201]{0:0:221} posting thread
[ USRTHRD][5201]{0:0:221} Waiting for HAIP work thread to cleanup ARP
[ USRTHRD][5201]{0:0:221} timeout to wait thread to cleanup ARP
[ USRTHRD][5201]{0:0:221} Thread:[NetHAWork]start {
[ USRTHRD][5201]{0:0:221} Thread:[NetHAWork]start }
[ USRTHRD][7514]{0:0:221} [NetHAWork] thread started
[ USRTHRD][7514]{0:0:221} Arp::sCreateSocket {
[ USRTHRD][7514]{0:0:221} Arp::sCreateSocket }
[ USRTHRD][5201]{0:0:221} use all detected INF
[ USRTHRD][7514]{0:0:221} Failed to check 169.254.57.103 on en6
[ USRTHRD][7514]{0:0:221} (null) category: 0, operation: , loc: , OS error: 0, other:
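At first glance, HAIP was trying to bring up 169.254.57.103 on interface en6, and the attempt failed with OS error: 69 (the ARP send failed). The AIX engineer's analysis was that this error can be caused by the physical network being unreachable, so we checked the adapter status: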
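The useful clues in the HAIP log are the non-zero `OS error` code and the address/interface pair that failed the check. A minimal Python sketch for pulling both out of log lines follows; the regular expressions are written against the lines quoted above and are assumptions about the log format. errno numbers differ between operating systems, so look up 69 in the AIX headers rather than assuming Linux numbering.

```python
import re

# Sketch: collect non-zero "OS error" codes and failed "check <ip> on <inf>"
# pairs from HAIP agent log lines like the ones quoted above.
OS_ERROR = re.compile(r"OS error:\s*(\d+)")
FAILED_CHECK = re.compile(r"Failed to check (\S+) on (\S+)")


def summarize_haip(lines):
    os_errors, failed_checks = [], []
    for line in lines:
        m = OS_ERROR.search(line)
        if m and int(m.group(1)) != 0:  # "OS error: 0" lines carry no errno
            os_errors.append(int(m.group(1)))
        m = FAILED_CHECK.search(line)
        if m:
            failed_checks.append((m.group(1), m.group(2)))
    return os_errors, failed_checks


LOG = [
    "[ USRTHRD][7257]{0:0:221} [NetHAWork] thread hit OSD exception failed to send arp",
    "[ USRTHRD][7257]{0:0:221} (null) category: -2, operation: write, loc: arpsend:1,os, OS error: 69, other:",
    "[ USRTHRD][7514]{0:0:221} Failed to check 169.254.57.103 on en6",
    "[ USRTHRD][7514]{0:0:221} (null) category: 0, operation: , loc: , OS error: 0, other:",
]
print(summarize_haip(LOG))  # ([69], [('169.254.57.103', 'en6')])
```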
bash-4.2# entstat -d ent6
-------------------------------------------------------------
ETHERNET STATISTICS (ent6) :
Device Type: 2-Port Gigabit Ethernet-SX PCI-Express Adapter (14103f03)
Hardware Address: 40:f2:e9:91:eb:7a
Elapsed Time: 0 days 1 hours 38 minutes 14 seconds
Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 4128 Packets: 5077
Bytes: 35215659 Bytes: 370511
Interrupts: 0 Interrupts: 4815
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 1
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 12 Broadcast Packets: 0
Multicast Packets: 62 Multicast Packets: 66
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0
General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Simplex
Limbo 64BitSupport ChecksumOffload
LargeSend DataRateSet
2-Port Gigabit Ethernet-SX PCI-Express Adapter (14103f03) Specific Statistics:
------------------------------------------------------------------------------
Link Status : Down <====== the link is down (this typically happens with a direct cable connection; through a switch it would not)
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
PCI Mode: PCI-Express X4
Relaxed Ordering: Enabled
TLP Size: 256
MRR Size: 4096
Jumbo Frames: Disabled
TCP Segmentation Offload: Enabled
TCP Segmentation Offload Packets Transmitted: 3625
TCP Segmentation Offload Packet Errors: 0
Transmit and Receive Flow Control Status: Enabled
XON Flow Control Packets Transmitted: 0
XON Flow Control Packets Received: 0
XOFF Flow Control Packets Transmitted: 0
XOFF Flow Control Packets Received: 0
Transmit and Receive Flow Control Threshold (High): 40960
Transmit and Receive Flow Control Threshold (Low): 20480
Transmit and Receive Storage Allocation (TX/RX): 4/44
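The decisive fields in the `entstat -d` output are `Link Status` and `Media Speed Running`. A minimal Python sketch that extracts them follows; the field labels are copied from the adapter output above and may differ on other adapter types, and the function names are hypothetical.

```python
import re

# Sketch: read "Link Status" and "Media Speed Running" from `entstat -d <ent>`
# output. Labels come from the 2-Port Gigabit Ethernet-SX output shown above.
def adapter_link(entstat_text: str) -> dict:
    info = {}
    for label in ("Link Status", "Media Speed Running"):
        # Take only the first word after the colon, ignoring trailing notes.
        m = re.search(rf"^\s*{label}\s*:\s*(\S+)", entstat_text, re.MULTILINE)
        info[label] = m.group(1) if m else None
    return info


def link_is_up(entstat_text: str) -> bool:
    return adapter_link(entstat_text)["Link Status"] == "Up"


SAMPLE = """\
Link Status : Down
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
"""
print(adapter_link(SAMPLE))  # {'Link Status': 'Down', 'Media Speed Running': 'Unknown'}
```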
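After the problem on the failed host was fixed and that machine was booted, the link status returned to normal and HAIP started successfully. However, because the cluster had come up while HAIP was down, the interconnect was using the private IP directly rather than HAIP, so the cluster still needed one restart to return to its normal state.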
Posted in AIX, Oracle RAC
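Another case of TRIM wiping data on an ASM disk
I previously handled a case where storage was attached directly to a virtual machine and a mistaken operation on a disk triggered TRIM, which wiped the data (see: SSD TRIM made a disk unrecoverable after it was formatted with fdisk). Recently I ran into a similar case: the customer mistakenly formatted an ASM disk.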

This disk was one of the 6 disks that made up the disk group.

After the disk was formatted, the DATA disk group was dismounted immediately:
Tue Apr 07 18:22:31 2026
WARNING: cache read a corrupt block: group=2(DATA) fn=261 indblk=0 disk=0 (DATA_0000) incarn=3958745085 au=605 blk=0 count=1
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) fn=261 indblk=0 disk=0 (DATA_0000) incarn=3958745085 au=605 blk=0 count=1
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ERROR: cache failed to read group=2(DATA) fn=261 indblk=0 from disk(s): 0(DATA_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _user639087_+asm1 (639087) initiating offline of disk 0.3958745085 (DATA_0000) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 0/0xebf5a7fd, mask = 0x6a, op = clear
Tue Apr 07 18:22:31 2026
GMON updating disk modes for group 2 at 10 for pid 28, osid 639087
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Tue Apr 07 18:22:31 2026
NOTE: cache dismounting (not clean) group 2/0xE9E5571F (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 115720, image: oracle@ajjorcl1 (B000)
Tue Apr 07 18:22:31 2026
NOTE: halting all I/Os to diskgroup 2 (DATA)
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Tue Apr 07 18:22:31 2026
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=15.1625 last written ABA 15.1625
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc (incident=309345):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0000" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
Incident details in: /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345/+ASM1_ora_639087_i309345.trc
Tue Apr 07 18:22:31 2026
List of instances: 1
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 30)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 2 invalid = TRUE
26 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Tue Apr 07 18:22:31 2026
freeing rdom 2
Tue Apr 07 18:22:31 2026
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0xE9E5571F (DATA)
SQL> alter diskgroup DATA dismount force /* ASM SERVER:3924121375 */
Tue Apr 07 18:22:32 2026
Sweep [inc][309345]: completed
System State dumped to trace file /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345/+ASM1_ora_639087_i309345.trc
Tue Apr 07 18:22:32 2026
Dumping diagnostic data in directory=[cdmp_20260407182232], requested by (instance=1, osid=639087), summary=[incident=309345].
Tue Apr 07 18:22:32 2026
NOTE: cache deleting context for group DATA 2/0xe9e5571f
GMON dismounting group 2 at 11 for pid 32, osid 115720
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 2
Tue Apr 07 18:22:34 2026
Sweep [inc2][309345]: completed
NOTE: AMDU dump of disk group DATA created at /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345
Tue Apr 07 18:22:37 2026
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_504268.trc
Tue Apr 07 18:23:02 2026
SUCCESS: diskgroup DATA was dismounted
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:3924121375 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA
Analyzing the formatted disk with kfed, I sampled a number of random AUs and found all of them zeroed.
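kfed interprets ASM metadata block by block. A cruder cross-check of the same finding is to read whole AUs from a raw copy of the disk and test whether every byte is zero. The sketch below is a minimal version of that idea, not the kfed procedure. `au_size=1 MiB` is an assumption (use the disk group's actual AU size), and the image path and function name are hypothetical.

```python
import os
import random
import tempfile

# Sketch: sample random allocation units (AUs) from a raw disk image and
# report which ones are entirely zero. au_size defaults to 1 MiB by assumption.
def sample_zeroed_aus(path, au_size=1024 * 1024, samples=20, seed=0):
    total_aus = os.path.getsize(path) // au_size
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    picks = sorted(rng.sample(range(total_aus), min(samples, total_aus)))
    zeroed = {}
    with open(path, "rb") as f:
        for au in picks:
            f.seek(au * au_size)
            data = f.read(au_size)
            zeroed[au] = data == bytes(len(data))  # True if every byte is 0x00
    return zeroed


if __name__ == "__main__":
    # Demo on a tiny synthetic image: 4 AUs of 4 KiB, only AU 2 contains data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4096 * 2) + b"\x01" * 4096 + bytes(4096))
    print(sample_zeroed_aus(tmp.name, au_size=4096, samples=10))
    os.remove(tmp.name)
```

Sampling at random rather than scanning everything keeps the check fast on multi-terabyte disks, at the cost of only giving a probabilistic answer.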

Next, lsblk was used to check whether the disk has the TRIM (discard) feature enabled:
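As far as I know, the DISC-GRAN and DISC-MAX columns of `lsblk -D` are read from sysfs queue attributes. A non-zero `discard_max_bytes` means the kernel will pass discard (TRIM/UNMAP) requests to the device. This matters because common formatting tools such as mkfs.ext4 and mkfs.xfs discard the whole device by default. The Python sketch below reads that attribute. The `sys_root` parameter exists only so the function can be tested against a fake tree, and it expects a whole-disk name such as `sdb`, not a partition.

```python
import os
import tempfile
from pathlib import Path

# Sketch: report whether the kernel advertises discard support for a block
# device, using the same sysfs attribute that lsblk -D shows as DISC-MAX.
def discard_max_bytes(dev: str, sys_root: str = "/sys") -> int:
    path = Path(sys_root) / "block" / dev / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return 0  # attribute missing (unknown device or a partition name)


def discard_enabled(dev: str, sys_root: str = "/sys") -> bool:
    return discard_max_bytes(dev, sys_root) > 0


if __name__ == "__main__":
    # Demo against a fake sysfs tree; on a real host call discard_enabled("sdb").
    with tempfile.TemporaryDirectory() as root:
        queue = os.path.join(root, "block", "sdb", "queue")
        os.makedirs(queue)
        with open(os.path.join(queue, "discard_max_bytes"), "w") as fh:
            fh.write("2147450880\n")
        print(discard_enabled("sdb", root), discard_enabled("sdc", root))  # True False
```

A positive result only shows that discard could reach the device; it does not prove a discard was actually issued, which is why the contents were checked as well.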

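Given these findings, the disk had most likely triggered TRIM and its data had very probably been zeroed. Finally, examining an image of the disk in WinHex confirmed that apart from basic partition and file system information, the rest of the disk was empty.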
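The WinHex inspection amounts to finding which byte ranges of the image hold anything other than zeros. A minimal Python sketch of that scan follows. The chunk size and function name are arbitrary choices for illustration; on a TRIMmed disk you would expect only a few small regions near the start, where the partition table and file system headers live.

```python
import os
import tempfile

# Sketch: scan a raw disk image in fixed-size chunks and return the
# (start, end) byte ranges that contain non-zero data.
def nonzero_regions(path, chunk=1024 * 1024):
    regions, start, offset = [], None, 0
    zero = bytes(chunk)
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            is_zero = data == zero[: len(data)]
            if not is_zero and start is None:
                start = offset  # a non-zero run begins at this chunk
            elif is_zero and start is not None:
                regions.append((start, offset))  # the run ended before this chunk
                start = None
            offset += len(data)
    if start is not None:
        regions.append((start, offset))  # run extends to end of image
    return regions


if __name__ == "__main__":
    # Demo: header-like data in chunk 0, one more non-zero chunk at 12 KiB.
    image = b"\x55" * 4096 + bytes(8192) + b"\xaa" * 4096 + bytes(4096)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(image)
    print(nonzero_regions(tmp.name, chunk=4096))  # [(0, 4096), (12288, 16384)]
    os.remove(tmp.name)
```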

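In this situation, the best achievable outcome was to recover the data from the remaining 5 of the 6 disks in the disk group, which still means losing at least about 1/6 of the data. It is a last resort, but it keeps the loss as small as possible.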
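The "at least 1/6" figure follows from how external-redundancy disk groups place data: each file's extents are spread across all disks in the group, so losing one disk removes roughly that disk's share of nearly every larger file. The sketch below computes that share. It assumes ASM had kept the disks balanced in proportion to their capacity, and the disk sizes are hypothetical.

```python
from fractions import Fraction

# Sketch: expected share of allocated extents lost when some disks of an
# external-redundancy disk group are gone, assuming capacity-balanced placement.
def lost_share(disk_sizes, lost):
    """disk_sizes: integer capacity per disk (any unit); lost: indexes of lost disks."""
    total = sum(disk_sizes)
    return Fraction(sum(disk_sizes[i] for i in lost), total)


sizes = [500] * 6  # hypothetical: six equal disks
share = lost_share(sizes, [0])
print(share, float(share))  # 1/6 0.16666666666666666
```

The real impact is usually worse than this ratio suggests, because a datafile missing even a few extents is still a damaged file.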
A lucky ORA-600 kcratr_nab_less_than_odr recovery
The customer's virtualization environment ran out of space, which crashed the database; on startup it reported ORA-600 kcratr_nab_less_than_odr:
Mon Apr 06 00:13:16 2026
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Started redo scan
Mon Apr 06 00:13:26 2026
Completed redo scan
read 5480 KB redo, 459 data blocks need recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc (incident=418959):
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_418959\orcl_ora_2324_i418959.trc
Aborting crash recovery due to error 600
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...
Mon Apr 06 00:13:33 2026
Trace dumping is performing id=[cdmp_20260406001333]
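Because the customer was not familiar with this kind of failure, they made no further attempts after it happened and left the environment untouched. This error is usually caused by a lost write to the control file, so the usual first step is to rebuild the control file (rectl).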
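Before rebuilding the control file, it can help to record the four numeric arguments of the ORA-600. A common reading of kcratr_nab_less_than_odr is the following: thread, log sequence, the NAB (next available block) recorded in the control file, and the last block actually found on disk in that online redo log. In that reading, a NAB smaller than the on-disk value means the control file is behind the redo log, which is consistent with a lost control file write. This interpretation is an assumption drawn from field experience, not Oracle documentation. The function and key names below are hypothetical.

```python
import re

# Sketch: extract the four numeric arguments of ORA-600 [kcratr_nab_less_than_odr].
# Labels (thread, sequence, nab, on_disk) reflect a common, undocumented reading.
KCRATR = re.compile(
    r"\[kcratr_nab_less_than_odr\],\s*\[(\d+)\],\s*\[(\d+)\],\s*\[(\d+)\],\s*\[(\d+)\]"
)


def parse_kcratr(line):
    m = KCRATR.search(line)
    if m is None:
        return None
    thread, sequence, nab, on_disk = map(int, m.groups())
    return {
        "thread": thread,
        "sequence": sequence,
        "nab": nab,
        "on_disk": on_disk,
        "gap_blocks": on_disk - nab,  # > 0: control file NAB lags the redo on disk
    }


line = ("ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], "
        "[1], [53856], [40105], [43042], [], [], [], [], [], [], []")
print(parse_kcratr(line))
```

For the alert log above this gives thread 1, sequence 53856, and a gap of 2937 blocks between 40105 and 43042.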

Then I tried to open the database, and luckily it opened directly.

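That completed this recovery, and the database is running normally; we were lucky. In most cases ORA-600 kcratr_nab_less_than_odr can be fixed this simply, but I have also seen cases where further ORA-600 errors appeared after the control file was rebuilt:
ORA-600 kcratr_nab_less_than_odr and ORA-600 4194 troubleshooting
ORA-600 kcratr_nab_less_than_odr and ORA-600 2662 troubleshooting
ORA-600 kcratr_nab_less_than_odr and ORA-600 4193 troubleshooting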


