Aftermath of a directly connected private network: one node that cannot boot prevents HAIP from starting on the other node

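This case involves a two-node RAC (11.2.0.4) whose private network uses a direct (back-to-back) connection. The host of one node failed and could not boot; when the clusterware was started on the other node, HAIP could not start normally.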

# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
     1        ONLINE  ONLINE       xifenfei1                  Started                     
ora.cluster_interconnect.haip                                                      >>>>  OFFLINE
     1        ONLINE  OFFLINE
ora.crf
     1        ONLINE  ONLINE       xifenfei1
ora.crsd
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.cssd
     1        ONLINE  ONLINE       xifenfei1
ora.cssdmonitor
     1        ONLINE  ONLINE       xifenfei1
ora.ctssd
     1        ONLINE  ONLINE       xifenfei1                  OBSERVER
ora.diskmon
     1        OFFLINE OFFLINE
ora.drivers.acfs
     1        ONLINE  ONLINE       xifenfei1
ora.evmd
     1        ONLINE  INTERMEDIATE xifenfei1
ora.gipcd
     1        ONLINE  ONLINE       xifenfei1
ora.gpnpd
     1        ONLINE  ONLINE       xifenfei1
ora.mdnsd
     1        ONLINE  ONLINE       xifenfei1

alert<hostname>.log

2018-09-02 10:38:56.767: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771: 
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806: 
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.

orarootagent_root.log

2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[   CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces

gipcd.log

2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0
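
In the gipcd.log excerpt above, gipcd reports `Returning NETDATA: 0 interfaces`, which is consistent with no usable private interface being found. As a rough sketch only, a log can be scanned for that message as follows. The helper name `count_empty_netdata` is hypothetical, and the message text is assumed to match the excerpt exactly.

```shell
#!/bin/sh
# Count the lines in which gipcd reported an empty interface list.
# count_empty_netdata is a hypothetical helper name (not an Oracle tool);
# it reads log text on stdin and prints the number of matching lines.
count_empty_netdata() {
    grep -c 'Returning NETDATA: 0 interfaces'
}

# Example (the log file name is a placeholder):
#   count_empty_netdata < gipcd.log
```

A non-zero count is only a hint that the interface list was empty at that moment; the link state still has to be confirmed at the OS level.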

ohasd.log

2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [   CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [   CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: 
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

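Checking the private network showed that the link state of eth1 was down: the private network is cabled directly to the other node, and that machine could not boot.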

[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no   ====> NIC link state is abnormal


[root@xifenfei1 rules.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36  
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1     ---------> note
          RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16926236 (16.1 MiB)  TX bytes:24269882 (23.1 MiB)
          Memory:91160000-91180000 

eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37  
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1      ---------> note: RUNNING is missing
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:91140000-91160000 
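
The check shown in the ifconfig output (an interface that is UP but has no RUNNING flag) can be automated with a small filter. This is a minimal sketch, assuming the classic net-tools `ifconfig` layout shown above (newer `flags=<...>` output is not handled); the helper name `find_no_carrier` is hypothetical.

```shell
#!/bin/sh
# Print interfaces that are administratively UP but lack the RUNNING flag,
# i.e. interfaces with no link. Reads classic net-tools `ifconfig` output
# on stdin. find_no_carrier is a hypothetical helper name, not an OS tool.
find_no_carrier() {
    awk '
        # A stanza starts with the interface name in column 1.
        /^[^ \t]/ { iface = $1 }
        # The indented flags line starts with UP; report the interface
        # when RUNNING is absent from that line.
        /^[ \t]+UP / && !/ RUNNING / { print iface }
    '
}

# Example:
#   ifconfig | find_no_carrier
```

Run against the output above, this filter would list eth1 only, matching what `ethtool` reports as `Link detected: no`.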

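For the MOS note describing HAIP failing to start because of an abnormal NIC link, see: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1). This case is a direct consequence of using a direct connection for the private network of an 11.2 cluster (a direct connection is strongly discouraged for the cluster private network).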