分类目录归档:Oracle 监听

windows平台listener.log超过4G导致监听异常

今天有个朋友咨询生产库非常慢,应用无法连接数据库,通过分析是由于win 32位 数据库11.2.0.1版本 文件系统为ntfs,但是listener.log为4G,导致监听工作不正常.
tnsping几乎hang住
tnsping


lsnrctl status 也几乎hang住
lsnrctl-status

直接ping ip正常
证明不是网络问题,导致监听异常,现在判断是监听问题
ping-ip


监听日志超过4G
listener_file

关闭监听日志
log_status_off

监听正常
关闭监听日志之后,监听恢复正常
2

文件系统为ntfs格式
这里证明,不是由于文件系统格式的限制ntfs最大文件允许64T,这里4G肯定不是系统的极限
ntfs


相关文章
Bug 9879101 : THE CONNECT THROUGH LISTENER WAS SLOW WHEN LISTNER LOG GROWED 4GB
WINDOWS: Listener Hangs & Lsnrctl Commands Are Slow or Hang

发表在 Oracle 监听 | 标签为 , , , | 评论关闭

dns解析导致opiodr aborting process unknown ospid (7266) as a result of ORA-609类似错误

数据库alert日志中出现大量类似opiodr aborting process unknown ospid (7266) as a result of ORA-609错误

Tue Oct 27 07:51:10 2015
opiodr aborting process unknown ospid (7266) as a result of ORA-609
Tue Oct 27 08:16:56 2015
opiodr aborting process unknown ospid (7660) as a result of ORA-609
Tue Oct 27 08:17:16 2015
opiodr aborting process unknown ospid (7666) as a result of ORA-609
Tue Oct 27 08:21:50 2015
opiodr aborting process unknown ospid (7725) as a result of ORA-609
Tue Oct 27 08:37:43 2015
opiodr aborting process unknown ospid (7940) as a result of ORA-609
Tue Oct 27 08:51:25 2015
opiodr aborting process unknown ospid (8126) as a result of ORA-609
Tue Oct 27 09:07:16 2015
opiodr aborting process unknown ospid (8359) as a result of ORA-609
Tue Oct 27 09:26:09 2015
opiodr aborting process unknown ospid (8600) as a result of ORA-609

测试tnsping

[oracle@Sql admin]$ tnsping orcl

TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:40:55

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (20120 msec)
[oracle@Sql admin]$ tnsping orcl

TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:42:35

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (41600 msec)

hosts文件配置

[oracle@Sql admin]$ vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 
192.168.2.162 Sql.xifenfei.com
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

同时测试lsnrctl start/stop/status都很慢,但是ping localhost非常快,证明问题在监听之上.search 关键字lsnrctl slow/tnsping slow关键字发现有相关文档:
Bug 8307164 : TNSPING 11G USING DNS AND NOT HOSTS FILE
Listener Startup or Connections Hang in 11g (Doc ID 803838.1)
11g: Remote Connections Take Very Long to Establish (Doc ID 856820.1)
通过这些文档,可以知道在oracle 10g中解析主机名使用gethostbyname()函数,11g中使用了使用getaddrinfo()函数.这个函数对于/etc/nsswitch.conf的调用和ethostbyname()不一样,从而出现此类问题.根据官方建议:

Linux
hosts: files [NOTFOUND=continue] dns OR  hosts:files/dns

HPUX and Solaris
ipnodes: files [NOTFOUND=continue] dns

本次解决方法
因为对于没有使用到dns服务器的数据库环境而言,没有必要配置这玩意,给自己埋雷,估计是当时安装yum之时没有清理掉

vi /etc/nsswitch.conf
hosts:      files nds  -->hosts:      files

再次测试tnsping,lsnrctl等命令,速度ok

[oracle@Sql"]$ tnsping orcl

TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:58:30

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (10 msec)

对于此类监听慢的问题,可以通过对client/server端trace进一步跟踪

TRACE_LEVEL_CLIENT = 16
TRACE_FILE_CLIENT = client/server
TRACE_DIRECTORY_CLIENT = [any valid directory path]
TRACE_TIMESTAMP_CLIENT = ON
DIAG_ADR_ENABLED=off
发表在 Oracle 监听 | 标签为 , , | 评论关闭

Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out

客户反馈系统经常报会话超时,导致应用测试无法正常进行,经检查alert日志发现

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for HPUX: Version 11.2.0.4.0 - Production
        Oracle Bequeath NT Protocol Adapter for HPUX: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for HPUX: Version 11.2.0.4.0 - Production
  Time: 29-SEP-2014 20:42:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 238
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.78.199.47)(PORT=55447))
Mon Sep 29 20:42:56 2014

虽然大部分网站或者mos上描述,Fatal NI connect error 12170部分情况考虑使用配置如下配置

##调整listener.ora 
vi $ORACLE_HOME/network/admin/listener.ora
增加:
DIAG_ADR_ENABLED_LISTENER=OFF
INBOUND_CONNECT_TIMEOUT_LISTENER=180

##调整sqlnet.ora  
vi $ORACLE_HOME/network/admin/sqlnet.ora
增加:
DIAG_ADR_ENABLED=OFF
SQLNET.INBOUND_CONNECT_TIMEOUT=180

这些已经配置,但是现在报12170 TNS-12535 TNS-00505错误,通过结合mos发现,出现该问题,可能是由于应用服务器和数据库服务器之间的防火墙策略设置不适合业务查询需求,出现应用服务器和数据库服务器防火墙超时(比如应用服务器发起一个大查询,在数据库服务器中执行,尚未返回结果,可是网络已经超时,终止会话)
补充知识点

The 'nt secondary err code' identifies the underlying network transport, such as (TCP/IP) timeout limit. 
In the current case 60 identifies Windows underlying transport layer.

The "nt secondary err code" will be different based on the operating system:

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX : "nt secondary err code: 238"
AIX: "nt secondary err code: 78"
Solaris: "nt secondary err code: 145"


The alert.log message indicates that a connection was terminated AFTER it was established to the instance.  
In this case, it was terminated 2 hours and 3 minutes after the listener handed the connection to the database. 

 This would indicate an issue with a firewall where a maximum idle time setting is in place. 

The connection would not necessarily be "idle".  This issue can arise during a long running query
or when using JDBC Thin connection pooling. If there is no data 'on the wire' for lengthy

periods of time for any reason, the firewall might terminate the connection.

解决方案

The non-Oracle solution would be to remove or increase the firewall setting for maximum idle time.  
In cases where this is not feasible, Oracle offers the following suggestion:

The following parameter, set at the **RDBMS_HOME/network/admin/sqlnet.ora, can resolve this kind of problem.  
DCD or SQLNET.EXPIRE_TIME can mimic data transmission between the server and the client during long periods of idle time.

SQLNET.EXPIRE_TIME=n  Where <n> is a non-zero value set in minutes.  

See the following : Note 257650.1 Resolving Problems with Connection Idle Timeout With Firewall

当然除下面数据库中解决外,还可以在网络防火墙层面解决,比如增加网络空闲终止时间等

具体参考:Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (Doc ID 1628949.1)
Fatal NI Connect Error 12170, ‘TNS-12535: TNS:operation timed out’ Reported in 11g Alert Log (Doc ID 1286376.1)

发表在 Oracle 监听 | 标签为 , , | 评论关闭