作者归档:惜分飞

ORA-600 2662快速恢复之Patch scn工具

有客户数据库启动报ORA-600 2662错误

SQL> recover database;
完成介质恢复。
SQL> alter database open ;
alter database open 
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [272845978], [0], [272853062], [12583040]
ORA-00600: internal error code, arguments: [2662], [0], [272845977], [0], [272853062], [12583040]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [272845973], [0], [272853062], [12583040]
进程 ID: 3528
会话 ID: 33 序列号: 1

通过自研开发的patch scn工具,修改数据库scn值
20240313180657


然后open数据库成功
20240313181402

对于这类故障,patch scn工具是最快速的解决方案

发表在 Oracle备份恢复 | 标签为 , | 留下评论

TNS-12518: TNS:listener could not hand off client connection

客户反馈业务经常性出现数据库连接异常,通过工具访问ORACLE进程报ORA-12170
ORA-12170


通过分析发现lsnrctl status几乎hang住,tnsping延迟特别大
tnsping-slow

进一步分析监听日志发现TNS-12518: TNS:listener could not hand off client connection错误

12-MAR-2024 15:34:50 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)
(HOST=__jdbc__)(USER=Administrator))(SERVICE_NAME=ilas)) * (ADDRESS=(PROTOCOL=tcp)
(HOST=ip)(PORT=52854)) * establish * ilas * 12518
TNS-12518: TNS:listener could not hand off client connection
 TNS-12547: TNS:lost contact
  TNS-12560: TNS:protocol adapter error
   TNS-00517: Lost contact
    Linux Error: 32: Broken pipe

根据经验和Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection描述,检查监听文件配置

[oracle@xff admin]$ cat listener.ora
# listener.ora Network Configuration File:/home/u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = CLRExtProc)
      (ORACLE_HOME = /home/u01/app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = extproc)
      (ENVS = "EXTPROC_DLLS=ONLY:/home/u01/app/oracle/product/11.2.0/dbhome_1/bin/oraclr11.dll")
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = IP)(PORT = 1521))
    )
  )

ADR_BASE_LISTENER = /home/u01/app/oracle

根据经验和客户的业务进行分析,确认他们不会使用 external procedures方式访问数据库,直接修改监听配置

[oracle@xff admin]$ cat listener.ora
# listener.ora Network Configuration File:/home/u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = xff)
      (ORACLE_HOME = /home/u01/app/oracle/product/11.2.0/dbhome_1)
      (GLOBAL_DBNAME = xff)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = IP)(PORT = 1521))
    )
  )

ADR_BASE_LISTENER = /home/u01/app/oracle

然后reload配置,再使用lsnrctl status查看结果秒出,tnsping也非常快
20240312205940


让客户测试应用也恢复正常,一切ok,问题在最小修改的情况下解决,和最初供应商建议的重装系统,双机,数据库等解决方案大大简化

发表在 Oracle 监听 | 标签为 , | 留下评论

ora.storage无法启动报ORA-12514故障处理

19.11集群,节点2人工重启之后,crs启动异常

[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                    OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                    STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.storage
      1        ONLINE OFFLINE                                STABLE
--------------------------------------------------------------------------------

crs的alert日志显示

2024-03-05 12:46:26.021 [CLSECHO(3653)]ACFS-9327: Verifying ADVM/ACFS devices.
2024-03-05 12:46:26.040 [CLSECHO(3661)]ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
2024-03-05 12:46:26.065 [CLSECHO(3673)]ACFS-9156: Detecting control device '/dev/ofsctl'.
2024-03-05 12:46:26.357 [CLSECHO(3703)]ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
2024-03-05 12:46:26.376 [CLSECHO(3711)]ACFS-9322: completed
2024-03-05 12:46:27.764 [CSSDMONITOR(3855)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 3855
2024-03-05 12:46:27.839 [OSYSMOND(3857)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3857
2024-03-05 12:46:28.129 [CSSDAGENT(3890)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 3890
2024-03-05 12:46:29.125 [OCSSD(3910)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 3910
2024-03-05 12:46:30.187 [OCSSD(3910)]CRS-1713: CSSD daemon is started in hub mode
2024-03-05 12:46:31.428 [OCSSD(3910)]CRS-1707: Lease acquisition for node xff2 number 2 completed
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1621: The IPMI configuration data for this node stored in the Oracle registry is incomplete; details at (:CSSNK00002:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1617: The information required to do node kill for node xff2 is incomplete; details at (:CSSNM00004:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.638 [OCSSD(3910)]CRS-1605: CSSD voting file is online: /dev/sda1; details in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc.
2024-03-05 12:46:33.546 [OCSSD(3910)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xff1 xff2 .
2024-03-05 12:46:35.405 [OCSSD(3910)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-03-05 12:46:35.533 [OCTSSD(4138)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 4138
2024-03-05 12:46:36.339 [OCTSSD(4138)]CRS-2403: The Cluster Time Synchronization Service on host xff2 is in observer mode.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xff1.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2401: The Cluster Time Synchronization Service started on host xff2.
2024-03-05 12:46:54.181 [ORAROOTAGENT(2427)]CRS-5019: All OCR locations are on ASM disk groups [SYSTEMDG], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:47:15.209 [OLOGGERD(4553)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 4553
2024-03-05 12:52:04.581 [CRSCTL(8313)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc.
2024-03-05 12:56:44.519 [ORAROOTAGENT(2427)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc.
2024-03-05 12:56:44.608 [OHASD(2217)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd.trc.
2024-03-05 12:56:44.606 [ORAROOTAGENT(2427)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2024-03-05 12:56:44.606+agent's abort action pending. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:57:58.464 [CRSD(11801)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 11801
2024-03-05 12:58:12.059 [CRSD(11801)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsd.trc.

ohasd_orarootagent_root 日志

2024-03-05 12:52:00.769 :  OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfnConnect2: failed to connect

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 124s elapsed

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:00.771 :  OCRRAW:4255452928: -- trace dump on error exit --

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:2176

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: Category: 7

"/u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc" 208L, 11809C

2024-03-05 12:52:03.543 :  OCRRAW:4255452928: 9379 Error 4 opening dom root in 0xf9afdb79c0

2024-03-05 12:52:03.551 :  OCRRAW:4255452928: kgfnConnect2: kgfnGetBeqData failed

2024-03-05 12:52:03.577 :  OCRRAW:4255452928: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=节点1私网IP)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))

2024-03-05 12:52:03.578 :  OCRRAW:4255452928: kgfnConnect2Int: ServerAttach

2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnServerAttachConnErrors: Encountered service based error 12514

2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnRecordErr 12514 OCI error:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfnConnect2: failed to connect

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 122s elapsed

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:04.581 :  OCRRAW:4255452928: -- trace dump on error exit --

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: Category: 7

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: DepInfo: 12514

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: ADR is not properly configured

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: -- trace dump end --

  OCRASM:4255452928: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=12514, loc=kgfokge

2024-03-05 12:52:04.581 :  OCRASM:4255452928: ASM Error Stack : ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

2024-03-05 12:52:04.581 :  OCRASM:4255452928: proprasmo: kgfoCheckMount returned [7]
2024-03-05 12:52:04.581 :  OCRASM:4255452928: proprasmo: The ASM instance is down
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprioo: Failed to open [+SYSTEMDG/xff-cluster/OCRFILE/registry.255.1072903025]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprioo: No OCR/OLR devices are usable
  OCRUTL:4255452928: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
 default:4255452928: u_set_gbl_comp_error: comptype '107' : error '0'
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprinit: Could not open raw device
2024-03-05 12:52:04.635 : default:4255452928: a_init:7!: Backend init unsuccessful : [26]
2024-03-05 12:52:04.637 : default:4255452928: clsvactversion:4: Retrieving Active Version from local storage.

通过这里,初步判断是由于节点2访问(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=节点1私网IP)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))异常导致,查看节点1的该监听状态

[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:04:51

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=节点1私网IP)(PORT=1525)))
The listener supports no services
The command completed successfully

发现该监听没有注册服务进去,检查相关listener参数配置

[grid@xff1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:26:29 2024
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0

SQL> show parameter listener;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
forward_listener                     string
listener_networks                    string
local_listener                       string      
remote_listener                      string

初步判断是由于节点1的ASMNET1LSNR_ASM监听状态异常,很可能是由于asm实例的listener参数异常导致,比较稳妥的解决方案是重启节点1,让其重新生成listener相关参数,实现动态注册,临时解决方法,

[grid@xff1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:05:11 2024
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0

SQL> ALTER SYSTEM SET local_listener ='(ADDRESS=(PROTOCOL=TCP)(HOST=节点1私网IP)(PORT=1525))' sid='+ASM1' SCOPE=MEMORY;

System altered.



[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:05:21

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=节点1私网IP)(PORT=1525)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_FRA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_SYSTEMDG" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@xff1 ~]$ 

设置节点1的asm实例的local_listener 参数之后,集群启动成功

[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                    OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                    STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.storage
      1        ONLINE  ONLINE       xff2                    STABLE
--------------------------------------------------------------------------------
发表在 Oracle RAC | 标签为 , | 一条评论