分类目录归档:ORA-xxxxx

ORA-600 kcratr_nab_less_than_odr故障解决

朋友的数据库服务器出现ORA-00600[kcratr_nab_less_than_odr],不能open数据库
1.open数据库报ORA-00600[kcratr_nab_less_than_odr]

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

2.查看alert日志

Wed Jan 11 13:56:16 2012
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Started redo scan
Completed redo scan
 read 54591 KB redo, 0 data blocks need recovery
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc  (incident=818557):
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Incident details in: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_818557\dbdms_ora_3936_i818557.trc
Aborting crash recovery due to error 600
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Trace dumping is performing id=[cdmp_20120110214555]

3.查看trace文件

Trace file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 2 - type 8664, 2 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:2250M/4060M, Ph+PgF:5868M/8119M 
Instance name: dbdms
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 3108, image: ORACLE.EXE (SHAD)
…………………………
WARNING! Crash recovery of thread 1 seq 99189 is
ending at redo block 43531 but should not have ended before
redo block 43569
Incident 826550 created, dump file: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_826550\dbdms_ora_3108_i826550.trc
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

通过alert和trace中的内容可以知道,数据库需要恢复到rba到43569,但是因为某种原因实例恢复的时候,只能利用1 thread 99189 seq#,恢复rba到43531。从而导致数据库无法正常open

This Problem is caused by Storage Problem of the Database Files. 
The Subsystem (eg. SAN) crashed while the Database was open. 
The Database then crashed since the Database Files were not accessible anymore. 
This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.

4.解决方法

SQL> SELECT STATUS FROM V$INSTANCE;

STATUS
------------
MOUNTED

--尝试直接recover database
SQL> RECOVER DATABASE ;
ORA-00283: 恢复会话因错误而取消
ORA-00264: 不要求恢复
--提示不用恢复

--再打开数据库,还是kcratr_nab_less_than_odr错误警告
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

--尝试不完全恢复
SQL> RECOVER DATABASE UNTIL CANCEL;
ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01152: 文件 1 没有从过旧的备份中还原
ORA-01110: 数据文件 1: 'D:\DBDMS\DATA\SYSTEM01.DBF'

--重建控制文件
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS 'D:/1.TXT';

数据库已更改。

SQL> SHUTDOWN IMMEDIATE;
ORA-01109: 数据库未打开


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT;
ORACLE 例程已经启动。

Total System Global Area  417546240 bytes
Fixed Size                  2176328 bytes
Variable Size             268438200 bytes
Database Buffers          138412032 bytes
Redo Buffers                8519680 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "DBDMS" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 18688
  7  LOGFILE
  8    GROUP 1 'D:\DBDMS\LOG\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'D:\DBDMS\LOG\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'D:\DBDMS\LOG\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    'D:\DBDMS\DATA\SYSTEM01.DBF',
 13    'D:\DBDMS\DATA\SYSAUX01.DBF',
 14    'D:\DBDMS\DATA\RBSG01.DBF',
 15    'D:\DBDMS\DATA\DATA01.DBF',
 16    'D:\DBDMS\DATA\INDX01.DBF',
 17    'D:\DBDMS\DATA\DATA02.DBF',
 18    'D:\DBDMS\DATA\DATA03.DBF',
 19    'D:\DBDMS\DATA\DATA04.DBF',
 20    'D:\DBDMS\DATA\INDX02.DBF',
 21    'D:\DBDMS\DATA\SYSTEM02.DBF'
 22  CHARACTER SET ZHS16GBK
 23  ;

控制文件已创建。

--继续尝试恢复
SQL> RECOVER DATABASE ;
完成介质恢复。
SQL> ALTER DATABASE OPEN;

数据库已更改。
--open成功

在这次恢复中,主要就是重建控制文件,然后直接恢复成功,如果redo有损坏,那么可能需要使用不完全恢复,然后使用resetlogs打开数据库

发表在 ORA-xxxxx, Oracle备份恢复 | 标签为 , , | 7 条评论

ORA-01555 caused by SQL statement below

一.发现ORA-01555

Mon Dec 26 10:08:22 2011
ORA-01555 caused by SQL statement below (Query Duration=49146 sec, SCN: 0x0b4b.17f5ae42):
Mon Dec 26 10:08:22 2011
SELECT COMPANY_ID,
       COMPANY_MOBILE,
       TO_CHAR(NVL(REG_DATE, SYSDATE - 100), 'yyyymmddhh24miss'),
       BAK_FIELD2
  FROM TAB_XN_COMPANY
 WHERE (COMPANY_STATUS = 1 OR
       (COMPANY_STATUS = 3 AND
       NVL(UNREG_DATE, SYSDATE + 100) >=
       TO_DATE('20111226094500', 'yyyymmddhh24miss')))
   AND NVL(REG_DATE, SYSDATE - 100) <=
       TO_DATE('20111226095959', 'yyyymmddhh24miss')
   AND PAY_TYPE > 0

二.数据库状态

[oracle@ora02 ~]$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Wed Jan 4 10:48:17 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> show parameter undo;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     10800
undo_suppress_errors                 boolean     FALSE
undo_tablespace                      string      UNDOTBS1

SQL> select sum(maxbytes)/1024/1024/1024,
   2 SUM(USER_BYTES)/1024/1024/1024 FROM dba_data_files where tablespace_NAME='UNDOTBS1';

SUM(MAXBYTES)/1024/1024/1024 SUM(USER_BYTES)/1024/1024/1024
---------------------------- ------------------------------
                  61.9999847                     32.6834106

SQL> SELECT DISTINCT STATUS "状态",
  2                  COUNT(*) "EXTENT数量",
  3                  SUM(BYTES) / 1024 / 1024 / 1024 "UNDO大小"
  4    FROM DBA_UNDO_EXTENTS
  5   GROUP BY STATUS;

状态      EXTENT数量   UNDO大小
--------- ---------- ----------
ACTIVE             1 .000976563
EXPIRED         2549 31.2333298
UNEXPIRED          3 .000175476

通过undo_retention保留时间为10800秒,而该sql执行了49146秒,在这49146秒钟,TAB_XN_COMPANY表中的数据被修改,而且被修改的undo数据在10800秒后被覆盖导致,导致原查询语句不能获取到scn小于或者等于查询时候的数据块内容(在undo中),所以出现ORA-01555。从这里也可以看出来,在undo空间还剩余的情况下,如果超过了undo_retention限制,undo内容还是有可能被覆盖,而不是使用未使用的undo

三.出现ORA-1555原因
The ORA-1555 errors can happen when a query is unable to access enough undo to build a copy of the data at the time the query started. Committed “versions” of blocks are maintained along with newer uncommitted “versions” of those blocks so that queries can access data as it existed in the database at the time of the query. These are referred to as “consistent read” blocks and are maintained using Oracle undo management.
就是一个查询要访问某个数据块,而这个数据块在这个查询执行过程中修改过,那么该查询需要查询undo中数据块,而undo中该数据块已经不存在,从而出现ORA-1555

四.ORA-1555解决方法
Case 1 – Rollback Overwritten
1.缩短sql运行时间
2.增加undo_retention,这个同时需要考虑undo空间大小
3.减少commit(rollback)次数
4.在一条sql中尽量使数据块访问一次
4.1)Using a full table scan rather than an index lookup
4.2)Introducing a dummy sort so that we retrieve all the data, sort it and then sequentially visit these data blocks.

Case 2 – Rollback Transaction Slot Overwritten
这种问题,主要是延迟块清理导致,一般建议在进行大批量的dml操作后,使用全表(全index)扫描执行一遍,或者收集全部统计信息

发表在 ORA-xxxxx | 标签为 | 评论关闭

ORA-07445[kslgetl()+120]/ORA-00108错误解决

一.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

二.alert中发现ORA-07445[kslgetl()+120]/ORA-00108错误

Mon Dec 19 09:19:42 2011
found dead dispatcher 'D000', pid = (13, 1)
Mon Dec 19 09:19:42 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:42 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16297.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 19 09:19:45 2011
found dead dispatcher 'D000', pid = (21, 2)
Mon Dec 19 09:19:45 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:45 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16299.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously

三.trace文件信息

Oracle process number: 15
Unix process pid: 10607, image: oracle@gongantest (D000)

Warning: keltnfy call to ldmInit failed with error 46
*** 2011-12-19 19:21:40.100
network error encountered getting listening address:
  NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
  NS Secondary Error: TNS-12560: TNS:protocol adapter error
  NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x208, PC: [0x7a06b8, kslgetl()+120]
*** 2011-12-19 19:21:40.107
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ssexhd()+629         call     ksedmp()             000000003 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
__restore_rt()+0     call     ssexhd()             00000000B ? 2B91BE8F8D70 ?
                                                   2B91BE8F8C40 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
kslgetl()+120        signal   __restore_rt()       0600E7560 ? 0000000E8 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?
ksfglt()+108         call     kslgetl()            0600E7560 ? 000000001 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?

四.在MOS上找到相关文章[ID 1298804.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7 - Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms

The following errors are seen in the trace file written by an ORA-7445 [kslgetl]:

network error encountered getting listening address:
NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
NS Secondary Error: TNS-12560: TNS:protocol adapter error
NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x130, PC: [0x82f09dc, kslgetl()+80]

The trace file indicates that there is no session:

Current SQL information unavailable - no session.

The Call Stack Trace in the ORA-7445 trace file contains a function list similar to:

 kslgetl <- PGOSF57_ksfglt
<- kghfre <- kmnsbf <- nsbfr <- nsiofrrg <- nsiocancel
<- nsopen_shutitdown <- nsclose <- nsgblclose <- nsgblTRMHelper <- nsgblRealTerm
<- nlstdstp <- npinlt <- ksuabt <- opidrv <- sou2o
<- opimai_real <- main <- libc_start_main

Cause

The trace file first reports: Warning: keltnfy call to ldmInit failed with error 46 

The ORA-7445 is not the starting point here. This exception is just a spin-off from ORA-180 and it is possible that different internal errors may be seen, such as ORA-600 [504], depending on what is happening when the ORA-180 is encountered.

The cause for the ORA-180 is related to the inital message at the beginning of the trace file: "keltnfy call to ldmInit failed with error 46" and this is followed by: "network error encountered getting listening address:" 

The error code (here: 46) is the key for solving the issue.
This warning says that ldmInit() returned error 46 which is LDMERR_HOST_NOT_FOUND (host not found).

This error is returned if the OS call gethostbyname() fails with an error. So these appears to be a network specific issue.
Solution

1) Check permission on /etc/hosts

$ ls -l /etc/hosts -rw-r--r-- 2 root root 194 Oct 17 2006 /etc/hosts


Check if /etc/hosts file is correctly configured


<ip address> <fully qualified hostname> <simple or short hostname> <alias, if applicable> ( all of this on one line ).


2) Check the hostname:

$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname


3) Check if /etc/nodename is correctly configured
If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.


$ nslookup <shortname> $ nslookup <long name> $ nslookup <ip address>

The forward and reverse lookup should succeed and return consistent address/info.


4) Check nsswitch.conf


$ more nsswitch.confhosts: files dnsMake sure host lookup is also done through the /etc/hosts file and not just dns. It is recommended that FILES come first before DNS.


Also, check the resolv.conf. This makes sure that the DNS is working properly.

在这里看到虽然数据库的版本不一样,alert和trace中的错误不完全一致,但是很相似。是由于主机名不能被正常访问导致,所以尝试这从主机名相关部分着手解决。

五.查看主机名相关信息,解决问题

[oracle@gongantest ~]$  ls -l /etc/hosts
-rw-r--r-- 2 root root 176 Dec 16 13:43 /etc/hosts
[oracle@gongantest ~]$ hostname
gongantest
[oracle@gongantest ~]$ ping gongantest
ping: unknown host gongantest
[oracle@gongantest ~]$ more  /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
#192.168.11.60    gongantest
--发现改主机名的dns被注释掉
[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.11.60) 56(84) bytes of data.
From 192.168.9.66 icmp_seq=2 Destination Host Unreachable
From 192.168.9.66 icmp_seq=3 Destination Host Unreachable
From 192.168.9.66 icmp_seq=4 Destination Host Unreachable

--- gongantest ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3999ms, pipe 3
--除掉注释,测试不通
[root@gongantest ~]# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:14:22:10:96:CA  
          inet addr:192.168.9.66  Bcast:192.168.11.255  Mask:255.255.252.0
          inet6 addr: fe80::214:22ff:fe10:96ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4207222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2482964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:1156547557 (1.0 GiB)  TX bytes:565900103 (539.6 MiB)
--发现该机器的ip为192.168.9.66,修改hosts文件。
--初步确定出现问题的原因是因为机器迁移,使用了新ip,注释掉了hosts文件中错误ip
[root@gongantest ~]# vi /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
192.168.9.66    gongantest

[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.9.66) 56(84) bytes of data.
64 bytes from gongantest (192.168.9.66): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=2 ttl=64 time=0.031 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=3 ttl=64 time=0.033 ms

--- gongantest ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.031/0.033/0.037/0.007 ms
--ping测试正常

继续观察

--观察d000进程是否启动
[oracle@gongantest ~]$ ps -ef|grep ora_
oracle   10180     1  0 13:23 ?        00:00:00 ora_pmon_gaxt
oracle   10182     1  0 13:23 ?        00:00:00 ora_psp0_gaxt
oracle   10184     1  0 13:23 ?        00:00:00 ora_mman_gaxt
oracle   10186     1  0 13:23 ?        00:00:00 ora_dbw0_gaxt
oracle   10188     1  0 13:23 ?        00:00:02 ora_lgwr_gaxt
oracle   10190     1  0 13:23 ?        00:00:00 ora_ckpt_gaxt
oracle   10192     1  0 13:23 ?        00:00:00 ora_smon_gaxt
oracle   10194     1  0 13:23 ?        00:00:00 ora_reco_gaxt
oracle   10196     1  0 13:23 ?        00:00:00 ora_cjq0_gaxt
oracle   10198     1  0 13:23 ?        00:00:00 ora_mmon_gaxt
oracle   10200     1  0 13:23 ?        00:00:00 ora_mmnl_gaxt
oracle   10204     1  0 13:23 ?        00:00:00 ora_s000_gaxt
oracle   10210     1  0 13:23 ?        00:00:00 ora_arc0_gaxt
oracle   10212     1  0 13:23 ?        00:00:00 ora_arc1_gaxt
oracle   10214     1  0 13:23 ?        00:00:00 ora_qmnc_gaxt
oracle   10218     1  0 13:23 ?        00:00:00 ora_j000_gaxt
oracle   10222     1  0 13:24 ?        00:00:00 ora_q000_gaxt
oracle   10234     1  0 13:24 ?        00:00:00 ora_q001_gaxt
oracle   10609     1  0 13:32 ?        00:00:00 ora_d000_gaxt
oracle   10639  9962  0 13:35 pts/0    00:00:00 grep ora_

--观察alert日志未出现该错误
Mon Dec 26 13:32:18 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_10607.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 26 13:32:21 2011
found dead dispatcher 'D000', pid = (13, 85)
Mon Dec 26 13:36:04 2011
Thread 1 advanced to log sequence 232 (LGWR switch)
  Current log# 1 seq# 232 mem# 0: /opt/oracle/oradata/gaxt/redo01.log

至此ORA-07445[kslgetl()+120]/ORA-00108错误解决

发表在 ORA-xxxxx | 标签为 , | 评论关闭