月归档:九月 2014

MOS又一次不靠谱—ORA-27163: out of memory

数据库版本

oracle -> 11g @xifenfei:/home/oracle$sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Sep 30 10:28:30 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for HPUX: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

数据库补丁信息

oracle -> 11g @xifenfei:/home/oracle$opatch lspatches
18973907;
18795105;
18701707;
18284357;
18020394;
17564992;
17308789;
17306264;
17259786;
17079301;
16477664;
16188701;
14368995;
14245531;
11744544;
18522515;OCW Patch Set Update : 11.2.0.4.3 (18522515)
18522509;Database Patch Set Update : 11.2.0.4.3 (18522509)

收集spa报告报ORA-27163: out of memory错误

SQL> SELECT dbms_sqlpa.report_analysis_task('SPA_TEST', 'HTML', 'errors','ALL') FROM dual;

ERROR:
ORA-27163: out of memory
ORA-06512: at "SYS.DBMS_SQLTUNE_INTERNAL", line 8211
ORA-06512: at "SYS.DBMS_SQLPA", line 515
ORA-06512: at line 1


no rows selected

设置event后收集

Connected.
SQL> alter session set events '31156 trace name context forever, level 0x400';

Session altered.

SQL> SELECT dbms_sqlpa.report_analysis_task('SPA_TEST', 'HTML', 'errors','ALL') FROM dual;

DBMS_SQLPA.REPORT_ANALYSIS_TASK('SPA_TEST','HTML','ERRORS','ALL')
--------------------------------------------------------------------------------
<html>
    <head>
        <title>
   SQL Performance Impact Analyzer Report

查询MOS发现XML Parser Fails With ORA-27163 (Out Of Memory) (Doc ID 1599434.1),相关描述:
bug-xml
按照文档描述,该问题在11.2.0.4中已经修复,可是在hp unix中依然存在该问题,经验告诉我们,现在的MOS不能完全相信

发表在 Oracle | 标签为 | 评论关闭

Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out

客户反馈系统经常报会话超时,导致应用测试无法正常进行,经检查alert日志发现

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for HPUX: Version 11.2.0.4.0 - Production
        Oracle Bequeath NT Protocol Adapter for HPUX: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for HPUX: Version 11.2.0.4.0 - Production
  Time: 29-SEP-2014 20:42:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 238
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.78.199.47)(PORT=55447))
Mon Sep 29 20:42:56 2014

虽然大部分网站或者mos上描述,Fatal NI connect error 12170部分情况考虑使用配置如下配置

##调整listener.ora 
vi $ORACLE_HOME/network/admin/listener.ora
增加:
DIAG_ADR_ENABLED_LISTENER=OFF
INBOUND_CONNECT_TIMEOUT_LISTENER=180

##调整sqlnet.ora  
vi $ORACLE_HOME/network/admin/sqlnet.ora
增加:
DIAG_ADR_ENABLED=OFF
SQLNET.INBOUND_CONNECT_TIMEOUT=180

这些已经配置,但是现在报12170 TNS-12535 TNS-00505错误,通过结合mos发现,出现该问题,可能是由于应用服务器和数据库服务器之间的防火墙策略设置不适合业务查询需求,出现应用服务器和数据库服务器防火墙超时(比如应用服务器发起一个大查询,在数据库服务器中执行,尚未返回结果,可是网络已经超时,终止会话)
补充知识点

The 'nt secondary err code' identifies the underlying network transport, such as (TCP/IP) timeout limit. 
In the current case 60 identifies Windows underlying transport layer.

The "nt secondary err code" will be different based on the operating system:

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX : "nt secondary err code: 238"
AIX: "nt secondary err code: 78"
Solaris: "nt secondary err code: 145"


The alert.log message indicates that a connection was terminated AFTER it was established to the instance.  
In this case, it was terminated 2 hours and 3 minutes after the listener handed the connection to the database. 

 This would indicate an issue with a firewall where a maximum idle time setting is in place. 

The connection would not necessarily be "idle".  This issue can arise during a long running query
or when using JDBC Thin connection pooling. If there is no data 'on the wire' for lengthy

periods of time for any reason, the firewall might terminate the connection.

解决方案

The non-Oracle solution would be to remove or increase the firewall setting for maximum idle time.  
In cases where this is not feasible, Oracle offers the following suggestion:

The following parameter, set at the **RDBMS_HOME/network/admin/sqlnet.ora, can resolve this kind of problem.  
DCD or SQLNET.EXPIRE_TIME can mimic data transmission between the server and the client during long periods of idle time.

SQLNET.EXPIRE_TIME=n  Where <n> is a non-zero value set in minutes.  

See the following : Note 257650.1 Resolving Problems with Connection Idle Timeout With Firewall

当然除下面数据库中解决外,还可以在网络防火墙层面解决,比如增加网络空闲终止时间等

具体参考:Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (Doc ID 1628949.1)
Fatal NI Connect Error 12170, ‘TNS-12535: TNS:operation timed out’ Reported in 11g Alert Log (Doc ID 1286376.1)

发表在 Oracle 监听 | 标签为 , , | 评论关闭

ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

在昨天11.2.0.2 for Linux 数据库恢复过程中,把数据文件从asm复制到单节点机器中恢复,在resetlogs过程中报如下ORA-38856错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

ORA-38856 is the expected error during open database resetlogs when the set of enabled instances (redo threads) in the controlfile does not match the set of enabled instances (redo threads) in datafile checkpoint. This is expected behavior in a normal RAC restore/recover/open resetlogs situation.
这句话的意思是:数据库在resetlogs的时候发现控制文件中的redo threads和数据文件汇总的redo threads不一致,从而出现该问题.
在本次恢复中禁用了所有和thread 2相关参数,数据库依然报告错误,是因为数据库在异常恢复过程中需要读取节点2的redo信息,现在无法读取从而出现该错误.但是使用了_allow_resetlogs_corruption 之后还是报该错误,实在诡异.通过查询mos发现有类似Unpublished Bug 4355382 ORA-38856: FAILED TO OPEN DATABASE WITH RESETLOGS WHEN USING RAC BACKUP,虽然说该bug在10.2.0.3中修复,但是在异常恢复过程中,本着在风险可控的情况下,大胆尝试,继续使用_no_recovery_through_resetlogs,数据库正常resetlogs成功.
可以参考:RMAN Duplicate from RAC backup fails ORA-38856 (Doc ID 334899.1)

发表在 ORA-xxxxx | 标签为 , | 2 条评论