月归档:二月 2012

Query Duration=0与ORA-01555

1.ALERT日志错误
奇怪之处:Query Duration=0 sec,竟然出现了ORA-01555

Tue Feb  7 02:41:34 2012
ORA-01555 caused by SQL statement below (Query Duration=0 sec, SCN: 0x0b2e.efcd78a9):
Tue Feb  7 02:41:34 2012
SELECT "ID_NO","CUST_ID" FROM "DBACCADM"."DCUSTMSG" "C" WHERE "ID_NO"=:1

2.ORA-01555解释
超过了undo_retention时间,undo被覆盖导致ORA-01555

[zwq_acc1:/home/oraeye/check]oerr ora 1555
01555, 00000, "snapshot too old: rollback segment number %s with name \"%s\" too small"
// *Cause: rollback records needed by a reader for consistent read are
//         overwritten by other writers
// *Action: If in Automatic Undo Management mode, increase undo_retention
//          setting. Otherwise, use larger rollback segments

3.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

4.undo基本信息
从这里可以发现,两个节点的undo表空间还有很多剩余,缺发生了undo被覆盖从而出现了ORA-01555

SQL> col name for a20
SQL> col value for a15
SQL> SELECT INST_ID, NAME, VALUE
  2    FROM GV$PARAMETER
  3   WHERE UPPER (Name) LIKE '%UNDO%';

   INST_ID NAME                 VALUE
---------- -------------------- ---------------
         1 undo_management      AUTO
         1 undo_tablespace      UNDOTBS1
         1 undo_suppress_errors FALSE
         1 undo_retention       1800
         2 undo_management      AUTO
         2 undo_tablespace      UNDOTBS2
         2 undo_suppress_errors FALSE
         2 undo_retention       1800

8 rows selected.


TABLESPACE_NAME                CURRENT_TOTAL(MB)   USED(MB)   FREE(MB)      FREE% AUT MAX_TOTAL(MB)
------------------------------ ----------------- ---------- ---------- ---------- --- -------------
UNDOTBS1                                   40950    1587.94 39362.0625      96.12 NO          40950
UNDOTBS2                                   57330    1926.31 55403.6875      96.64 NO          57330

SQL> SELECT DISTINCT STATUS ,
  2                  COUNT(*) "EXTENT_NUM",
  3                  SUM(BYTES) / 1024 / 1024 / 1024 "UNDO(G)"
  4    FROM DBA_UNDO_EXTENTS
  5   GROUP BY STATUS;

STATUS    EXTENT_NUM    UNDO(G)
--------- ---------- ----------
ACTIVE           208 .273658752
EXPIRED         7651 2.42865753
UNEXPIRED        941 .752548218

查询MOS[ID 761128.1],发现可能是Oracle bug导致(BUG:6799685 – ORA-1555 ERROR WITH QUERY DURATION=0 AND UNDO_RETENTION=1800和BUG:5475085 – V$UNDOSTAT.EXPBLKREUCNT IS NEVER INCREMENTED)

5.解决方法
Increase the size of the UNDO tablespace and increase the UNDO_RETENTION parameter value to try to prevent required undo expiring too quickly.
基于本库,因为undo空间还有很大剩余,直接设置UNDO_RETENTION=3600即可(可以从一定程度上缓解整个问题,但是要从根本上解决整个问题,需要升级到10.2.0.4及其以上版本)

发表在 ORA-xxxxx | 标签为 | 2 条评论

StatSpack报告中redo size为负数

在一份statspack报告中发现redo size 为负数

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
CRM           3413823439 crm2                2 9.2.0.8.0   YES     zwq_crm2

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     47654 05-Feb-12 11:00:04    2,301      20.0
  End Snap:     47655 05-Feb-12 12:00:02    2,298      20.3
   Elapsed:               59.97 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:    36,448M      Std Block Size:          8K
           Shared Pool Size:    10,240M          Log Buffer:     20,480K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
  负数=====>      Redo size:           -359,121.01             -7,828.69
              Logical reads:            349,787.58              7,625.22
              Block changes:              5,656.13                123.30
             Physical reads:             12,521.51                272.96
            Physical writes:                427.44                  9.32
                 User calls:             22,223.63                484.47
                     Parses:              4,673.27                101.88
                Hard parses:                 46.78                  1.02
                      Sorts:              4,027.70                 87.80
                     Logons:                 10.57                  0.23
                   Executes:             10,480.35                228.47
               Transactions:                 45.87

一时之间感觉很奇怪,在运行的数据库redo size不可能为负数(只要数据库在运行redo size都应该大于0).查询MOS[ID 308414.1]发现原来是一个bug引起(Bug:1713403 NEGATIVE VALUE IN V$SESSTAT FOR STATISTIC REDO SIZE),sp中的redo size其本质还是来源于V$SESSTAT.

Applies to:
Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 9.2.0.8
This problem can occur on any platform.

Symptoms
Redo Size is displayed as a Negative number in a Statspack report.

For example:

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
             --------------- ---------------
Redo size:        -17,931.33      -55,277.73
Logical reads:     31,095.80       95,860.43
...


Cause
Bug:1713403 NEGATIVE VALUE IN V$SESSTAT FOR STATISTIC REDO SIZE
Overflow of 'redo size' statistic. This is fixed in Oracle10g and above.
Patches do not exist for earlier releases.

Solution
Use 'redo blocks written' instead to measure the amount of redo. 
Ignore negative 'redo size'.
发表在 Oracle | 评论关闭

shell处理alert日志

每天都检查oracle日志,所以写了一个比较完善的shell,让其自动处理,在运行程序之前,需要在该脚本目录下新建tmp目录

#!/usr/bin/ksh
export LANG=en

#临时目录
tmp_dest=$HOME/check/tmp
#用户名
username=username
#密码
password=password

cd $tmp_dest
sqlplus -s $username/$password<<XFF>/dev/null
set echo on
spool sqlplus.txt
col name format a20
col value format a55
select name,value from v\$parameter where name='background_dump_dest';
spool off
exit
XFF

alert_path_num=$(grep -n "background_dump_dest" $tmp_dest/sqlplus.txt |awk -F":" '{print $1}')
alert_path=$(cat $tmp_dest/sqlplus.txt |sed -n "${alert_path_num}p" | awk -F" " '{print $2}')

first_day=`cat $tmp_dest/first_day.tmp`
d_day=$(date +%e)
if [ $d_day -lt 10 ]
then
###########注意:以下两种方式选择其一###################
#部分系统出现alert日志例如:Tue Aug  7 07:44:59 2012
 last_day=$(date +%a)' '$(date +%b)'.*'$d_day'.*'$(date +%Y)

#部分系统出现alert日志例如:Thu Jun 07 13:56:18 2012
 n_day=`echo $d_day | awk 'gsub(/^ *| *$/,"")'`
 last_day=$(date +%a)' '$(date +%b)'.*0'$n_day'.*'$(date +%Y)

else
 last_day=$(date|cut -c 1-10).*$(date +%Y)
fi
echo $last_day > $tmp_dest/first_day.tmp
first_num=$(grep -n "$first_day" $alert_path/alert_$ORACLE_SID.log |head -1|awk -F":" '{print $1}')
if [ -z "$first_num" ]
then
   first_num=1
fi

#修改$last_day为'.*',表示alert日志结尾
last_num=$(grep -n ".*" $alert_path/alert_$ORACLE_SID.log |tail -1|awk -F":" '{print $1}')

point=1
export=$point

echo "########################## checking alert_log start $first_day ########################## "

sed -n "${first_num},${last_num}p" $alert_path/alert_$ORACLE_SID.log > $tmp_dest/trunc_alert
cat $tmp_dest/trunc_alert|grep ORA-|while read line
do
   line=$(echo "$line"|sed -e 's/\[/\\[/g;s/\]/\\]/g')
   time=$(grep -n "$line" $tmp_dest/trunc_alert | awk -F':' '{print $1}'|wc -l)

if [ "$time" -ge 1 ]
then
   num=$(grep -n "$line" $tmp_dest/trunc_alert|awk -F':' '{print $1}'|tail -1)
   #echo $num
   front_num=$((num-1))
   back_num=$((num+9))

   echo "++++++++++++++++++$point++++++++++++++++++++"
    sed -n "${front_num},${back_num}p" $tmp_dest/trunc_alert
   echo "++++++++++++++++++$point-End++++++++++++++++"
   point=$((point+1))

   flag=1
else
     if [ -z "$time" ]
     then
     flag=0
     fi
fi
done

#rm $tmp_dest/trunc_alert
#rm $tmp_dest/sqlplus.txt

if [ " $flag " -eq 0 ]
   then
   echo "No errors in $first_day !"
fi
echo "########################## checking alert_log end $last_day ########################## "

检查结果

########################## checking alert_log start Fri Jan 27.*2012 ########################## 
++++++++++++++++++1++++++++++++++++++++
  Current log# 3 seq# 918 mem# 0: /opt/oracle/oradata/orcl/mcrm/redo03.log
Tue Jan 31 22:00:22 2012
Errors in file /opt/oracle/admin/mcrm/bdump/mcrm_j001_23329.trc:
ORA-01114: IO error writing block to file 201 (block # 550944)
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 550944
Additional information: -1
ORA-01114: IO error writing block to file 201 (block # 550922)
ORA-27072: File I/O error
Additional information: 4
Additional information: 550922
++++++++++++++++++1-End++++++++++++++++
++++++++++++++++++2++++++++++++++++++++
Additional information: 550944
Additional information: -1
ORA-01114: IO error writing block to file 201 (block # 550922)
ORA-27072: File I/O error
Additional information: 4
Additional information: 550922
Additional information: 114688
Wed Feb  1 07:24:02 2012
Thread 1 advanced to log sequence 919 (LGWR switch)
  Current log# 1 seq# 919 mem# 0: /opt/oracle/oradata/orcl/mcrm/redo01.log
Wed Feb  1 07:24:02 2012
Errors in file /opt/oracle/admin/mcrm/bdump/mcrm_arc1_3624.trc:
ORA-19504: failed to create file "/opt/oracle/oradata/archivelog/1_918_741262746.dbf"
++++++++++++++++++2-End++++++++++++++++
########################## checking alert_log end Sun Feb 12.*2012########################## 
发表在 Linux | 评论关闭