月归档:一月 2012

RAC中关于”Immediate Kill Session#” bug记录

今天在rac的一个节点上发现很多Immediate Kill Session#的错误,分析记录如下
1.alert日志内容

Sun Jan  1 02:12:28 2012
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='ora9i1';
Sun Jan  1 02:12:28 2012
Immediate Kill Session#: 496, Serial#: 51199
Immediate Kill Session: sess: 0x406bfa26b78  OS pid: 12900
Immediate Kill Session#: 497, Serial#: 38504
Immediate Kill Session: sess: 0x406bfa280e0  OS pid: 12496
Immediate Kill Session#: 499, Serial#: 45296
Immediate Kill Session: sess: 0x406bfa2abb0  OS pid: 12467
Immediate Kill Session#: 502, Serial#: 18910
Immediate Kill Session: sess: 0x406bfa2ebe8  OS pid: 28887
Immediate Kill Session#: 503, Serial#: 26631
Immediate Kill Session: sess: 0x406bfa30150  OS pid: 20749
Immediate Kill Session#: 508, Serial#: 63586
Immediate Kill Session: sess: 0x406bfa36c58  OS pid: 27614
Immediate Kill Session#: 512, Serial#: 43388
Immediate Kill Session: sess: 0x406bfa3c1f8  OS pid: 4021
Immediate Kill Session#: 516, Serial#: 33975
Immediate Kill Session: sess: 0x406bfa41798  OS pid: 18481
Immediate Kill Session#: 517, Serial#: 24240
Immediate Kill Session: sess: 0x406bfa42d00  OS pid: 823
Immediate Kill Session#: 526, Serial#: 59767
Immediate Kill Session: sess: 0x406bfa4eda8  OS pid: 12529
Immediate Kill Session#: 527, Serial#: 45765
Immediate Kill Session: sess: 0x406bfa50310  OS pid: 6059
……………………
Sun Jan  1 02:22:29 2012
ALTER SYSTEM SET service_names='ora9i' SCOPE=MEMORY SID='ora9i1';

2.数据库配置
2.1)A节点相关配置

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
ora9i1

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux IA64: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> show parameter name;

NAME                                 TYPE       VALUE
------------------------------------ ---------- --------------------
db_file_name_convert                 string
db_name                              string     ora9i
db_unique_name                       string     ora9i
global_names                         boolean    FALSE
instance_name                        string     ora9i1
lock_name_space                      string
log_file_name_convert                string
service_names                        string     ora9i

2.2)B节点相关配置

SQL>  select instance_name from v$instance;

INSTANCE_NAME
----------------
ora9i2

SQL>  select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux IA64: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> show parameter name;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_name_convert                 string
db_name                              string      ora9i
db_unique_name                       string      ora9i
global_names                         boolean     FALSE
instance_name                        string      ora9i2
lock_name_space                      string
log_file_name_convert                string
service_names                        string      SYS$SYS.KUPC$C_2_2012010601100
                                                 6.ORA9I, ora9i, SYS$SYS.KUPC$S
                                                 _2_20120106011006.ORA9I

3.查看MOS,寻找解决方案
3.1)产生该问题原因

This is caused by unpublished Bug 6955040 ALL THE SESSIONS LOST CONNECTION AFTER KILLING CRSD.BIN.

The problem is when CRSD is killed or crashed and restarted, 
CRSD will run resource check action but CRS resource status will not be available at that time. 
Then in instance check action, 
it fails to get the preferred node VIP resource status and considered the preferred node VIP resource is not running. 
Therefore, instance check action will remove the default database service name 
and disconnect sessions connected using default database service name.

This causes messages "ALTER SYSTEM" and "Immediate Kill Session" printed in alert log.

3.2)解决方案

1) The fix is included in 10.2.0.5 patchset and 11.1.0.7 patchset.
    Apply the patchset once they are available.

OR

2) Configure a service name other than the default one (same as db_name), 
and get user to use the non-default service name for connection.
发表在 Oracle RAC | 评论关闭

记录AUTO_SPACE_ADVISOR_JOB导致负载异常

早上上班,检查数据库,发现监控日志中在晚上1点到4点钟服务器异常负载现象,查看awr日志发现AUTO_SPACE_ADVISOR_JOB运行异常
0.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

1.资源使用情况
cpu情况

逻辑读情况

物理读情况

通过这些可以看出,在晚上的时间点,AUTO_SPACE_ADVISOR_JOB占用了系统的资源的大部分,导致系统负载过高,也许是出现告警的主要原因。

2.查看运行情况

SQL> col status FOR A10
SQL> COL RUN_DURATION FOR A20
SQL> COL start_date FOR A20
SQL> COL log_date FOR A20
SQL> SELECT status,TO_CHAR(ACTUAL_START_DATE,'YYYY-MM-DD HH24:MI:SS') start_date,
  2  TO_CHAR (log_date, 'YYYY-MM-DD HH24:MI:SS') log_date,RUN_DURATION
  3  FROM dba_scheduler_job_run_details
  4  WHERE job_name = 'AUTO_SPACE_ADVISOR_JOB' order by 3;

STATUS     START_DATE           LOG_DATE             RUN_DURATION
---------- -------------------- -------------------- --------------------
SUCCEEDED  2011-12-31 00:00:02  2011-12-31 00:03:05  +000 00:03:03
SUCCEEDED  2012-01-02 00:00:03  2012-01-02 00:03:05  +000 00:03:03
SUCCEEDED  2012-01-03 00:00:02  2012-01-03 00:02:17  +000 00:02:15
SUCCEEDED  2012-01-04 00:00:02  2012-01-04 00:01:41  +000 00:01:39
SUCCEEDED  2012-01-05 00:01:14  2012-01-05 04:02:05  +000 04:00:51

从这里看出,平时AUTO_SPACE_ADVISOR_JOB运行时间只有3分钟左右的时间就可以结束,昨天异常的运行了4个小时。

3.了解AUTO_SPACE_ADVISOR_JOB作用

SQL> select COMMENTS from dba_scheduler_jobs
  2  where job_name='AUTO_SPACE_ADVISOR_JOB';

COMMENTS
-------------------------------------------
auto space advisor maintenance job

从这里看出,该job的主要作用是是用于segment advisor,如果不使用该功能,可以暂时使用下面语句关闭该job

execute dbms_scheduler.disable('AUTO_SPACE_ADVISOR_JOB');

4.对于AUTO_SPACE_ADVISOR_JOB总结
因为该job的在10.2的某些版本中出现类此bug情况,查看mos发现在10.2.0.4中已经修复,但是我这里因为只是出现了一次,暂时不能定位是bug还是数据库偶尔异常,继续观察,如果再出现类此现象,可以采取临时关闭该job的方式处理。

发表在 Oracle性能优化 | 评论关闭

Liunx系统中Oracle使用HugePages配置

一.系统当前状态

[root@node1 ora11g]#  cat /proc/meminfo | grep -i hugepages 
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
[root@node1 ora11g]# uname -sr
Linux 2.6.18-238.19.1.el5
cat /proc/meminfo|grep PageT
PageTables:      44748 kB

二.计算nr_hugepages值

#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support 
# http://support.oracle.com

# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments. Before proceeding with the execution please make sure 
that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m

Press Enter to proceed..."

read

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`

# Initialize the counter
NUM_PG=0

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk '{print $5}' | grep "[0-9][0-9]*"`
do
   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
   if [ $MIN_PG -gt 0 ]; then
      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
   fi
done

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`

# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
   echo "***********"
   echo "** ERROR **"
   echo "***********"
   echo "Sorry! There are not enough total of shared memory segments allocated for 
HugePages configuration. HugePages can only be used for shared memory segments 
that you can list by command:

   # ipcs -m

of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running 
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
   exit 1
fi

# Finish with results
case $KERN in
   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
   '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End

1.Configuring the Script

Create a text file named hugepages_settings.sh
Copy the contents below in the file
Run 
	$ chmod +x hugepages_settings.sh

2.Running the Script

Be sure that all applications that are meant to use HugePage / HugeTLB are running at the time the script is to be run. 
This includes the Oracle RDBMS instances and ASM instances in addition to other applications.
Be sure that you have /bin and /usr/bin in $PATH
Run
	$ ./hugepages_settings.sh

三.修改系统配置

1./etc/sysctl.conf
vm.nr_hugepages = 770

2./etc/security/limits.conf
oracle soft memlock 102400
oracle hard memlock 102400

计算公式为:>=HugePages_Total*Hugepagesize

四.重启系统检查配置是否生效

[root@node1 ~]# cat /proc/meminfo | grep -i hugepages 
HugePages_Total:   770
HugePages_Free:    770
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

五.启动数据库并验证

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Wed Jan 4 16:06:48 2012

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2096632 bytes
Variable Size             486539784 bytes
Database Buffers         1107296256 bytes
Redo Buffers               14680064 bytes
Database mounted.
Database opened.
SQL> exit

[oracle@node1 ~]$ grep Huge /proc/meminfo 
HugePages_Total:   770
HugePages_Free:    591
HugePages_Rsvd:    590
Hugepagesize:     2048 kB
[oracle@node1 ~]$ cat /proc/meminfo|grep PageT
PageTables:      13216 kB

HugePages_Free<>HugePages_Total,PageTables变小
证明HugePages配置成功

六.补充说明
1.在数据库内存服务器内存比较大的时候,可以考虑使用该功能,提高系统性能
2.对于Oracl 11g,如果需要使用此功能,需要关闭AMM功能(需要衡量)

发表在 Oracle | 3 条评论