Heartbeat Installation and Basic Configuration

1. Create the user and group
[root@node1 ~]# groupadd -g 694 haclient
[root@node1 ~]# useradd -u 694 -g haclient hacluster

2. Install Heartbeat
1) Install libnet
[root@node1 software]# pwd
/tmp/software
[root@node1 software]# ll
total 4200
-rw-r--r-- 1 root root 3267773 08-16 18:51 heartbeat-2.0.8.tar.gz
-rw-r--r-- 1 root root 1021236 08-16 18:51 libnet-1.1.2.1.tar.gz
[root@node1 software]# tar xf libnet-1.1.2.1.tar.gz
[root@node1 software]# cd libnet
[root@node1 libnet]# ./configure
[root@node1 libnet]# make
[root@node1 libnet]# make install
2) Install heartbeat
[root@node1 software]# tar xf heartbeat-2.0.8.tar.gz
[root@node1 software]# cd heartbeat-2.0.8
[root@node1 heartbeat-2.0.8]# ./ConfigureMe configure --disable-swig --disable-snmp-subagent
[root@node1 heartbeat-2.0.8]# make
[root@node1 heartbeat-2.0.8]# make install
[root@node1 heartbeat-2.0.8]# cp doc/ha.cf doc/authkeys doc/haresources /etc/ha.d/
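# Heartbeat's main configuration files are ha.cf, authkeys, and haresources. None of the three exists by default after installation. They can be downloaded from the official site or found in the extracted source tree, so here we simply copy them from the source directory.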
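Since heartbeat needs all three files, a quick check after the copy can catch a missing one. The following is a minimal sketch; the function name missing_ha_files is hypothetical, and the directory defaults to /etc/ha.d as used in the cp command above.

```shell
#!/bin/sh
# Print the name of each Heartbeat config file that is missing from a directory.
# missing_ha_files is a hypothetical helper, not part of heartbeat itself.
missing_ha_files() {
    dir="${1:-/etc/ha.d}"
    for f in ha.cf authkeys haresources; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Calling missing_ha_files with no argument inspects /etc/ha.d; empty output means all three files are in place.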
3. Edit the configuration
1) ha.cf
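#logfacility     local0                 # can be commented out in favor of the log file path below
logfile        /var/log/ha-log          # where heartbeat writes its log
keepalive 2 # heartbeat (monitoring) interval: 2 seconds
warntime 5 # seconds without contact before a warning is issued
deadtime 20 # seconds without contact before the peer is considered dead
initdead 120 # grace period after a reboot (for example, while the network is still coming up the keepalive check is bound to fail, but no failover should happen yet); must be at least twice deadtime
udpport 694 # UDP port used for heartbeat communication; the default is 694
baud 19200        # baud rate for serial communication
bcast eth1 # send heartbeats by Ethernet broadcast on eth1
ucast   eth1 10.10.10.2  # unicast to the peer (choose either bcast or ucast)
auto_failback off # whether resources switch back automatically after the primary recovers; off means no switch back
node node1        # hostname of the primary node, as shown by "uname -n"
node node2     # hostname of the standby node
ping 192.168.0.254 # address used to test network connectivity; any address works as long as it is reachable, typically the gateway
respawn hacluster /usr/lib/heartbeat/ipfail # optional; lists a process that is started and stopped together with heartbeat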
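The rule that initdead should be at least twice deadtime is easy to get wrong when tuning the timers. The following is a minimal sketch of a check, assuming both values are plain integers in seconds as in the file above; ha_value and check_initdead are hypothetical helper names.

```shell
#!/bin/sh
# Print the first value of a directive (e.g. deadtime) from an ha.cf-style file.
# Commented-out directives such as "#logfacility" do not match, because the
# comparison is made against the whole first field.
ha_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeed only if both timers are present and initdead >= 2 * deadtime.
check_initdead() {
    dead=$(ha_value "$1" deadtime)
    init=$(ha_value "$1" initdead)
    [ -n "$dead" ] && [ -n "$init" ] && [ "$init" -ge $((dead * 2)) ]
}
```

For example, check_initdead /etc/ha.d/ha.cf || echo "initdead is too small" would flag a file where deadtime is 20 and initdead is 30.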
2) haresources
node1 IPaddr::192.168.1.100/24/eth0/ Filesystem::/dev/sdc1::/shared::ext3 cups
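The general format of each line is:
node-name resource1 resource2 … resourceN
Here node-name is the name of one node in the cluster and must match the output of uname -n.
Each resource in the group resource1 resource2 … resourceN is a shell script. Scripts are searched for in /etc/init.d/ and /usr/local/etc/ha.d/resource.d (the latter depends on where heartbeat was installed). Heartbeat provides a convenient framework for extending resources: to manage a resource of your own, you only need to write a shell script that accepts the start and stop arguments. The resource scripts heartbeat currently ships can be found in the paths listed above.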
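A custom resource script can be very small. The sketch below uses a hypothetical resource named myapp, with a state file standing in for a real service; the STATE_FILE path and the status branch are assumptions added for illustration, since the text above only requires start and stop.

```shell
#!/bin/sh
# Hypothetical resource "myapp". A state file stands in for a real service,
# so the sketch touches nothing except that one file.
STATE_FILE="${STATE_FILE:-/tmp/myapp.running}"

myapp() {
    case "$1" in
        start)  touch "$STATE_FILE" ;;
        stop)   rm -f "$STATE_FILE" ;;
        status) if [ -f "$STATE_FILE" ]; then echo running; else echo stopped; fi ;;
        *)      echo "usage: myapp {start|stop|status}" >&2; return 1 ;;
    esac
}

# When saved as an executable file in the resource directory, the script
# would end by dispatching on its arguments:
# myapp "$@"
```

Replacing the touch and rm commands with the real start and stop commands of a service turns this into a usable resource script.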
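1) The first column of a resource group is one of the nodes listed in ha.cf, and should be the node that is intended to act as primary;
2) Each line defines one resource group; a line that is too long can be continued with "\";
3) Resources in a group are started from left to right and stopped from right to left;
4) Script arguments are separated and passed with "::";
5) Resources within a group are separated by spaces;
6) Different resource groups are independent of each other;
7) Every resource is a script, located either in /etc/init.d or in /usr/local/etc/ha.d/resource.d. Each script must support being called as "xxx start" and "xxx stop".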
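Rules 3) and 4) can be illustrated with two small helpers. This is only a sketch of how a line could be interpreted, not heartbeat's own parser; split_resource and stop_order are hypothetical names, and arguments are assumed to contain no spaces.

```shell
#!/bin/sh
# Split one entry such as "Filesystem::/dev/sdc1::/shared::ext3" into the
# script name followed by its arguments, separated by spaces.
split_resource() {
    printf '%s\n' "$1" | sed 's/::/ /g'
}

# Print the resources of a group in stop order (right to left), one per line.
stop_order() {
    out=""
    for r in "$@"; do
        out="$r
$out"
    done
    printf '%s' "$out"
}
```

For the sample line above, the start order is IPaddr, Filesystem, cups, and stop_order IPaddr Filesystem cups lists them in the reverse order.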

3) authkeys
[root@node1 ~]# vim /etc/ha.d/authkeys
auth 1
1 crc
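[root@node1 ~]# chmod 600 /etc/ha.d/authkeys
To use the sha1 algorithm instead, change the auth directive in authkeys to 2 (uncommenting it if necessary), uncomment the corresponding "2 sha1" line by removing the #, and replace the key after it with one of your own. The key must be identical on both nodes.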
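A sha1 authkeys file can also be generated rather than edited by hand. The following is a minimal sketch, assuming a Linux system with head and sha1sum available; make_authkeys is a hypothetical helper and the key derivation is just one way to obtain a random string.

```shell
#!/bin/sh
# Write an authkeys file that selects sha1 with a random 40-character key,
# readable by its owner only. The target path is passed as the first argument.
make_authkeys() {
    key=$(head -c 32 /dev/urandom | sha1sum | cut -d' ' -f1)
    ( umask 077; printf 'auth 2\n2 sha1 %s\n' "$key" > "$1" )
    chmod 600 "$1"
}
```

After running make_authkeys /etc/ha.d/authkeys on one node, the same file has to be copied to the other node, because both nodes must share the key.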

Resolving ORA-01578 Block Corruption (2)

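A sequel to Resolving ORA-01578 Block Corruption (1)
If an rman backup was taken before the block became corrupt, the block can be recovered from that backup, so that no data is lost.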

1. Recover with rman
[oracle@ECP-UC-DB1 ~]$ $ORACLE_HOME/bin/rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Sun Aug 14 22:21:13 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: TEST (DBID=2056006906)

RMAN> blockrecover datafile 6 block 1477;

Starting blockrecover at 2011-08-14 22:21:16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /tmp/0fmk0ii5_1_1
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/tmp/0fmk0ii5_1_1 tag=TAG20110814T213357
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:02

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished blockrecover at 2011-08-14 22:21:23

2. Check whether the corrupt block has been recovered
RMAN> backup check logical validate datafile 6;

Starting backup at 2011-08-14 22:22:11
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/opt/oracle/oradata/test/xifenfei01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2011-08-14 22:22:12

RMAN> exit

Recovery Manager complete.

[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Sun Aug 14 22:22:17 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select file#,block#,blocks from v$database_block_corruption;

no rows selected

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@ECP-UC-DB1 ~]$ dbv file =/opt/oracle/oradata/test/xifenfei01.dbf

DBVERIFY: Release 10.2.0.4.0 - Production on Sun Aug 14 22:22:38 2011

Copyright (c) 1982, 2007, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/test/xifenfei01.dbf

DBVERIFY - Verification complete

Total Pages Examined : 2560
Total Pages Processed (Data) : 1372
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 48
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1140
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1256690 (0.1256690)

3. Verify that the data is correct
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Sun Aug 14 22:34:18 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select count(*) from t_rep;

COUNT(*)
----------
49857

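Compared with the simulated environment in Resolving ORA-01578 Block Corruption (1), the row count is back to the original 49857: the data has been recovered correctly and the corrupt block problem is resolved.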


Resolving ORA-01578 Block Corruption (1)

I. Create a test table
SQL> create table t_rep as
2  select * from all_objects;
Table created.
SQL> select count(*) from  t_rep;
COUNT(*)
----------
49857
II. Modify the data block with bbed
III. Error symptoms
1. In the sqlplus session
SQL> select count(*) from  t_rep;
select count(*) from  t_rep
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 1477)
ORA-01110: data file 6: '/opt/oracle/oradata/test/xifenfei01.dbf'
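The file and block numbers in the ORA-01578 message are exactly what an rman blockrecover command needs. The following is a minimal sketch that turns such messages into commands of the form used in part (2); ora01578_to_rman is a hypothetical helper name.

```shell
#!/bin/sh
# Turn ORA-01578 messages read from stdin into rman blockrecover commands.
# Input line:  ORA-01578: ORACLE data block corrupted (file # 6, block # 1477)
# Output line: blockrecover datafile 6 block 1477;
ora01578_to_rman() {
    sed -n 's/^ORA-01578:.*(file # \([0-9][0-9]*\), block # \([0-9][0-9]*\)).*$/blockrecover datafile \1 block \2;/p'
}
```

Lines that are not ORA-01578 messages are ignored, so the whole sqlplus output can be piped through the function.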
2. In the alert.log file
Sun Aug 14 22:01:14 2011
Hex dump of (file 6, block 1477) in trace file /opt/oracle/admin/test/udump/test_ora_10785.trc
Corrupt block relative dba: 0x018005c5 (file 6, block 1477)
Bad check value found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x018005c5
last change scn: 0x0000.001328ef seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x28ef0602
check value in block header: 0x493
computed block checksum: 0x44b9
Reread of rdba: 0x018005c5 (file 6, block 1477) found same corrupted data
Sun Aug 14 22:01:15 2011
Corrupt Block Found
TSN = 6, TSNAME = XFF
RFN = 6, BLK = 1477, RDBA = 25167301
OBJN = 52727, OBJD = 52728, OBJECT = T_REP, SUBOBJECT =
SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
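The alert log gives the same address three ways: as hex (0x018005c5), as decimal (RDBA = 25167301), and as file 6, block 1477. The sketch below shows how they relate, assuming the usual layout for smallfile tablespaces in which the top 10 bits hold the relative file number and the low 22 bits hold the block number; decode_rdba is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a relative data block address into "file block".
# Assumption: 10-bit relative file number, 22-bit block number.
# Accepts decimal or 0x-prefixed hex input.
decode_rdba() {
    rdba=$(($1))
    echo "$((rdba >> 22)) $((rdba & 0x3FFFFF))"
}
```

Both decode_rdba 0x018005c5 and decode_rdba 25167301 print 6 1477, matching the file and block reported above.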
IV. Verify that the block is really corrupt
1. Verify with dbv
[oracle@ECP-UC-DB1 ~]$ dbv file =/opt/oracle/oradata/test/xifenfei01.dbf
DBVERIFY: Release 10.2.0.4.0 - Production on Sun Aug 14 22:08:37 2011
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/test/xifenfei01.dbf
Page 1477 is marked corrupt
Corrupt block relative dba: 0x018005c5 (file 6, block 1477)
Bad check value found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x018005c5
last change scn: 0x0000.001328ef seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x28ef0602
check value in block header: 0x493
computed block checksum: 0x44b9
DBVERIFY - Verification complete
Total Pages Examined         : 2560
Total Pages Processed (Data) : 1371
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 48
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1140
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 1256043 (0.1256043)
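When dbv is run from a script, the one line that matters is "Total Pages Marked Corrupt". The following is a minimal sketch that extracts that count from the report; dbv_corrupt_pages is a hypothetical helper name.

```shell
#!/bin/sh
# Print the "Total Pages Marked Corrupt" count from a dbv report on stdin.
dbv_corrupt_pages() {
    awk -F: '/^Total Pages Marked Corrupt/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

A possible invocation is dbv file=/opt/oracle/oradata/test/xifenfei01.dbf 2>&1 | dbv_corrupt_pages, where 2>&1 is included on the assumption that dbv may write its report to standard error. For the report above the result is 1.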
2. Verify with rman
RMAN> backup check logical validate datafile 6;
Starting backup at 2011-08-14 22:09:51
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/opt/oracle/oradata/test/xifenfei01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2011-08-14 22:09:53
RMAN> exit
Recovery Manager complete.
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Sun Aug 14 22:10:00 2011
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select file#,block#,blocks from v$database_block_corruption;
FILE#     BLOCK#     BLOCKS
---------- ---------- ----------
6       1477          1
V. Skip the corrupt block and read the remaining data
SQL> exec dbms_repair.skip_corrupt_blocks('SYS','T_REP');
PL/SQL procedure successfully completed.
SQL> select skip_corrupt from dba_tables where table_name='T_REP';
SKIP_COR
--------
ENABLED
ENABLED
SQL> select count(*) from t_rep;
COUNT(*)
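----------
49794
Note: data has been lost. The rows stored in block 1477 of file 6 are gone: the count dropped from 49857 to 49794, a loss of 63 rows.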