月归档:十一月 2011

ERROR: Error in Log_event::read_log_event(): ‘Found invalid event in binary log’

今天晚上接到现场电话,说mysql数据库中异常丢失数据,我登陆系统使用mysqlbinlog进行分析日志

[mysql@ezgclient mysqldata]$ mysqlbinlog mysqlbin.000149>/tmp/11_25.txt
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 58, event_type: 19

发现这个错误,第一反应是我使用的mysqlbinlog的版本不正确

[mysql@ezgclient mysqldata]$ whereis  mysqlbinlog
mysqlbinlog: /usr/bin/mysqlbinlog
[mysql@ezgclient mysqldata]$ /usr/bin/mysqlbinlog -V
/usr/bin/mysqlbinlog Ver 3.2 for redhat-linux-gnu at x86_64
[mysql@ezgclient mysqldata]$ /opt/mysql/product/5.1/bin/mysqlbinlog -V
/opt/mysql/product/5.1/bin/mysqlbinlog Ver 3.3 for unknown-linux-gnu at x86_64

通过查询果然发现在默认情况下调用的是系统默认安装的mysql中的mysqlbinlog,因为这个mysqlbinlog的版本和当前的bin_log的版本不能对应起来,所以不能处理,使用对应的mysqlbinlog工作正常

[mysql@ezgclient mysqllog]$ /opt/mysql/product/5.1/bin/mysqlbinlog mysqlbin.000150>/tmp/11_25.txt
[mysql@ezgclient mysqllog]$ more /tmp/11_25.txt 
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#111125 16:15:37 server id 2  end_log_pos 106   Start: binlog v 4, server v 5.1.35-log created 111125 16:15:37 at startup
# Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.
ROLLBACK/*!*/;
BINLOG '
……………………………………
发表在 MySQL | 评论关闭

因域名解析导致数据库连接延迟分析

一、现状记录

[oracle@node1 ~]$ /sbin/ifconfig
eth1      Link encap:Ethernet  HWaddr 00:25:90:04:AB:6B  
          inet addr:192.168.9.140  Bcast:192.168.15.255  Mask:255.255.248.0
          inet6 addr: fe80::225:90ff:fe04:ab6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23530402 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10959123 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15308483748 (14.2 GiB)  TX bytes:10087987532 (9.3 GiB)
--IP地址为192.168.9.140

[oracle@node1 ~]$ more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               ecp-db localhost.localdomain localhost
192.168.9.140   node1.srtcloud.com
--域名node1.srtcloud.com对应ip192.168.9.140 

[oracle@node1 ~]$ lsnrctl status
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1.srtcloud.com)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.5.0 - Production
Start Date                04-NOV-2011 09:08:51
Uptime                    21 days 4 hr. 58 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /opt/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1.srtcloud.com)(PORT=1521)))
Services Summary...
Service "ecp" has 2 instance(s).
  Instance "ecp", status UNKNOWN, has 1 handler(s) for this service...
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ecpXDB" has 1 instance(s).
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ecp_XPT" has 1 instance(s).
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ora11g" has 2 instance(s).
  Instance "ora11g", status UNKNOWN, has 1 handler(s) for this service...
  Instance "ora11g", status READY, has 1 handler(s) for this service...
Service "ora11gXDB" has 1 instance(s).
  Instance "ora11g", status READY, has 1 handler(s) for this service...
The command completed successfully
--说明:ora11g是oracle 11g,ecp是oracle 10g
--当前使用域名node1.srtcloud.com监听

[oracle@node1 ~]$ more /opt/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora 
# tnsnames.ora Network Configuration File: /opt/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

ECP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1.srtcloud.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ecp)
    )
  )

ORA11G =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1.srtcloud.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora11g)
    )
  )
--tns也通过域名访问

[oracle@node1 ~]$ more /etc/resolv.conf 
nameserver 211.155.235.201
nameserver 211.155.235.188
--当前有效的dns服务器

[oracle@node1 ~]$ more /etc/nsswitch.conf |grep hosts:
hosts:     files dns
--域名解析顺序

二、数据库正常工作分析
1、tns工作:客户端通过tns访问数据库,tns配置的是域名访问,所以需要解析,因为此刻解析的顺序是先利用/etc/hosts解析,所以读取hosts文件,获取到ip,然后访问对应数据库,和监听接触。
2、监听工作:监听的是域名,其实也是通过hosts解析成ip的
3、这里能够正常的工作,是因为hosts文件解析了域名

三、模拟数据库访问延迟

[oracle@node1 ~]$ more /etc/nsswitch.conf |grep hosts:
hosts:     dns files
--先使用dns服务器解析,再使用hosts文件

[oracle@node1 ~]$ more /etc/resolv.conf 
nameserver 11.1.1.1
--无效的dns服务器

[oracle@node1 ~]$ sqlplus chf/xifenfei@ora11g

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Nov 25 14:44:55 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.
--会在这里一个很长的时间等待

[oracle@node1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 25-NOV-2011 14:48:26

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1.srtcloud.com)(PORT=1521)))
--也会一个长时间的等待

--问题原因:就是因为解析域名的时候,先去访问dns服务器,因为该ip非dns服务器ip,所以会一直等待该ip超时,
--然后访问hosts文件获取ip地址(这个就是为什么我们登录或者查看监听状态的时候,会出现如此长的时间的等待)

其实因为dns延迟的现象有很多种,我这里只是举了一个最简单,比较常见的例子,在处理因dns解析的监听延迟的问题上,可以参考下面几点:
1、如果非特殊情况,尽可能使用ip地址在监听和tns中
2、如果是使用域名,请尽可能使用hosts解析,解析顺序配置为files优先(因为dns服务器有很多不确定,不可控因素)
3、如果一定要使用dns服务器解析,请把稳定的dns服务器配置在第一项,尽可能避免出现dns服务器不可达或者不存在该域名的现象

发表在 Oracle 监听 | 评论关闭

statement suspended, wait error to be cleared

一、现场报告
导入数据到Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX的时候,hang住了,求救

[oracle@TestServer-RHAS-5 dmpdir]$ impdp system/ DIRECTORY=dmpdir DUMPFILE=cscnew.20111123.dmp LOGFILE=cscnew.20111123.log SCHEMAS=CSCNEW remap_schema=CSCNEW:TESTB remap_tablespace=CSC_TAB_1:TESTB table_exists_action=replace
…………
. . imported "TESTB"."TAB_CS_SELF_WORKTIME"                  0 KB       0 rows
. . imported "TESTB"."TAB_CS_SELF_WORKTIME_DETAIL"           0 KB       0 rows
. . imported "TESTB"."TAB_CS_USERMENU"                       0 KB       0 rows
. . imported "TESTB"."TAB_PUB_BANK"                          0 KB       0 rows
. . imported "TESTB"."TAB_PUB_BUSISRVINFO"                   0 KB       0 rows
. . imported "TESTB"."TAB_PUB_CONTACT"                       0 KB       0 rows
Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX

二、处理过程
1、分析是否是impdp是否因为网络等情况终止掉

[oracle@TestServer-RHAS-5 ~]$ ps -ef|grep impdp
oracle    2520  1837  0 09:59 pts/8    00:00:00 grep impdp
oracle   23819 20966  0 09:39 pts/6    00:00:00 impdp         DIRECTORY=dmpdir DUMPFILE=cscnew.20111123.dmp LOGFILE=cscnew.20111123.log SCHEMAS=CSCNEW remap_schema=CSCNEW:TESTB remap_tablespace=CSC_TAB_1:TESTB table_exists_action=replace
[oracle@TestServer-RHAS-5 ~]$ ps -ef|grep LOCAL=YES
oracle    2692  1837  0 10:00 pts/8    00:00:00 grep LOCAL=YES
oracle   10754 10694  0 09:15 ?        00:00:09 oraclemcrm (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   23835 23819  0 09:40 ?        00:00:00 oraclemcrm (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

通过上面的查询,证明impdp进程工作正常

2、查询等待事件

[oracle@TestServer-RHAS-5 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Nov 24 10:00:26 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select event from v$session_wait where wait_class#<>6; 

EVENT
----------------------------------------------------------------
SQL*Net message to client
statement suspended, wait error to be cleared

通过这个查询,发现一个异常等待事件:statement suspended, wait error to be cleared。
查询MOS,确定是表空间不足引起impdp suspended
Statement Suspended, Wait Error To Be Cleared Wait Event [ID 761848.1]

Oracle Database provides a means for suspending, and later resuming, 
the execution of large database operations in the event of space allocation failures. 
This enables you to take corrective action instead of the Oracle Database server returning an error to the user. 
After the error condition is corrected, the suspended operation automatically resumes. 
This feature is called resumable space allocation. The statements that are affected are called resumable statements. 
The time between suspending the execution till correction of the error is reported as 
"statement suspended, wait error to be cleared" wait event.

3、查看alert.log日志文件确认

[oracle@TestServer-RHAS-5 ~]$ cd /opt/oracle/admin/mcrm/bdump/
[oracle@TestServer-RHAS-5 bdump]$ tail -30 alert_mcrm.log 
Thu Nov 24 09:29:20 2011
create tablespace testb
datafile '/opt/oradata/mcrm/testb.dbf'
size 1500M autoextend on next 50M maxsize 2000M
Thu Nov 24 09:29:51 2011
Completed: create tablespace testb
datafile '/opt/oradata/mcrm/testb.dbf'
size 1500M autoextend on next 50M maxsize 2000M
Thu Nov 24 09:40:00 2011
The value (30) of MAXTRANS parameter ignored.
kupprdp: master process DM00 started with pid=111, OS id=23858
         to execute - SYS.KUPM$MCP.MAIN('SYS_IMPORT_SCHEMA_01', 'SYSTEM', 'KUPC$C_1_20111124094000', 'KUPC$S_1_20111124094000', 0);
kupprdp: worker process DW01 started with worker id=1, pid=112, OS id=23870
         to execute - SYS.KUPW$WORKER.MAIN('SYS_IMPORT_SCHEMA_01', 'SYSTEM');
Thu Nov 24 09:43:11 2011
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_01.1' was suspended due to
    ORA-01652: unable to extend temp segment by 128 in tablespace TESTB
Thu Nov 24 10:00:45 2011
Thread 1 advanced to log sequence 4761 (LGWR switch)
  Current log# 3 seq# 4761 mem# 0: /opt/oradata/mcrm/redo03.log

4、查询TESTB表空间使用情况

SQL> select bytes/1024/1024,maxbytes/1024/1024,user_bytes/1024/1024 
  2  from dba_data_files where tablespace_name='TESTB';

BYTES/1024/1024 MAXBYTES/1024/1024 USER_BYTES/1024/1024
--------------- ------------------ --------------------
           2000               2000            1998.9375

5、解决问题

Thu Nov 24 10:04:21 2011
alter tablespace TESTB add datafile '/opt/oradata/mcrm/testb01.dbf' size 100m  autoextend on next 1m maxsize 30g
Thu Nov 24 10:04:25 2011
Completed: alter tablespace TESTB add datafile '/opt/oradata/mcrm/testb01.dbf' size 100m  autoextend on next 1m maxsize 30g
Thu Nov 24 10:04:26 2011
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_01.1' was resumed

通过这个日志可以看出,表空间不足的问题解决后(可以添加数据文件,或者resize数据文件大小),impdp的job又开始运行

发表在 Oracle性能优化 | 评论关闭