<Service> heartbeat

Date: March 9, 2017 Category:

heartbeat

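heartbeat can quickly move resources (an IP address, a program, a service, and so on) from a failed machine to another healthy machine so that service continues. This is generally called a high-availability service, and it has a lot in common with keepalived.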

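How heartbeat works

A primary server and a hot-standby server are defined in the configuration. The heartbeat daemon on the hot-standby server listens for the primary's heartbeat. If no heartbeat is detected within the specified time, the standby starts a failover: it takes ownership of the relevant resources from the primary and continues serving in its place without interruption.

That describes primary/standby mode. There is also an active/active mode, in which each server is a primary and at the same time the other's standby. The two servers send heartbeats to each other to confirm each other's state, and if one goes down the other takes over all resources and keeps serving.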
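The detection rule described above can be sketched as a small shell function. This is only an illustration and not heartbeat's actual code: the function name `peer_state`, its arguments, and the exact boundary comparisons are assumptions. The two thresholds play the role of the warntime and deadtime settings in ha.cf.

```shell
# Hypothetical helper: classify a peer by the age of its last heartbeat.
# Arguments: last_seen now warntime deadtime (all in seconds).
peer_state() {
    last_seen=$1; now=$2; warntime=$3; deadtime=$4
    age=$((now - last_seen))
    if [ "$age" -ge "$deadtime" ]; then
        echo dead      # the standby would start a failover here
    elif [ "$age" -ge "$warntime" ]; then
        echo late      # only a warning is logged
    else
        echo alive
    fi
}

peer_state 100 105 10 30
peer_state 100 115 10 30
peer_state 100 131 10 30
```

With warntime 10 and deadtime 30, a peer last seen 5 seconds ago is alive, one last seen 15 seconds ago is late, and one last seen 31 seconds ago is treated as dead.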

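A service switchover can be triggered by

  1. A server going down
  2. A failure of the heartbeat service
  3. A failure of the heartbeat link

The usual recommendation is to connect the two heartbeat NICs directly with a cable. If the link passes through a router or a switch, a fault in that network device can cut the heartbeat.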

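heartbeat split brain

This shows up in active/active mode, and a primary/standby pair is exposed to it as well: when the two servers of a high-availability pair cannot detect each other within the specified time, each of them starts a failover. The same IP then comes up on both servers at once, and any writes land separately on the two sides, leaving the data inconsistent or lost. For plain storage this is tolerable, but for a database recovery is very hard. This situation is called split brain, so high availability comes at a price.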

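Causes of heartbeat split brain

  1. The heartbeat link between the pair fails (network device fault, IP conflict) while both servers are still working normally
  2. A firewall between the two servers blocks the heartbeat traffic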

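Preventing split brain

  1. Connect the pair with both a serial cable and an Ethernet cable
  2. Have another machine forcibly shut down the primary heartbeat node when it detects split brain, then notify a person to step in. This can be done in software or in hardware, for example with a fence device (intelligent power management) through which another machine tells the primary heartbeat node to power off
  3. Monitor for split brain and intervene manually to reduce instability
  4. Decide, based on business requirements, whether such a loss can be tolerated at all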
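The monitoring idea in item 3 can be sketched as a check that the same VIP is never present on both nodes. This is a minimal sketch with assumed names: `has_vip` and `split_brain` are hypothetical helpers, and the sample text stands in for `ip addr` output collected from each node (for example over ssh).

```shell
# has_vip IP_ADDR_OUTPUT VIP
# Succeeds if the given `ip addr` output lists VIP as an inet address.
has_vip() {
    printf '%s\n' "$1" | grep -qF "inet $2/"
}

# split_brain OUTPUT_A OUTPUT_B VIP
# Succeeds when both nodes hold the same VIP at the same time.
split_brain() {
    has_vip "$1" "$3" && has_vip "$2" "$3"
}

# Sample data; in practice something like a="$(ssh heartbeat1 ip addr)".
a='    inet 192.168.0.203/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.211/24 brd 192.168.0.255 scope global secondary eth0'
b='    inet 192.168.0.130/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.211/24 brd 192.168.0.255 scope global secondary eth0'

if split_brain "$a" "$b" 192.168.0.211; then
    echo "split brain: 192.168.0.211 is up on both nodes"
fi
```

Such a check has to run on a third machine that can reach both nodes over a path other than the heartbeat link, otherwise it fails together with the link it is supposed to watch.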

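heartbeat message types

  1. Heartbeat messages
  2. Cluster transition messages
  3. Retransmission requests

Heartbeat messages are packets of about 150 bytes, sent by unicast, broadcast, or multicast. The configuration controls how often they are sent and how long to wait after a failure before failing over.

Cluster transition messages: when the primary server recovers, it sends an ip-request message asking the standby to release the resources it acquired while the primary was down. After releasing them, the standby replies with an ip-request-resp message to notify the primary.

Retransmission requests ask for heartbeat messages to be sent again.

Heartbeats are always sent over UDP to the port specified in /etc/ha.d/ha.cf.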

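heartbeat IP address takeover and failover

Heartbeat fails over by taking over the IP address and broadcasting ARP. As soon as the new primary has acquired the resources, it forces every client to update its ARP table, flushing the locally cached entry that maps the VIP to the failed server's MAC address, so that clients talk to the new primary.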

Installing heartbeat

Machines and network environment

Hostname    Internal IP     VIP             Heartbeat link IP
heartbeat1  192.168.0.203   192.168.0.211   192.168.3.203
heartbeat2  192.168.0.130   192.168.0.212   192.168.3.130

[root@heartbeat1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:F5:3A:84  
          inet addr:192.168.0.203  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fef5:3a84/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:435460 errors:0 dropped:0 overruns:0 frame:0
          TX packets:123674 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:147440208 (140.6 MiB)  TX bytes:34501012 (32.9 MiB)
          Interrupt:19 Base address:0x2000 

eth1      Link encap:Ethernet  HWaddr 00:0C:29:F5:3A:8E  
          inet addr:192.168.3.203  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fef5:3a8e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:252089 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:21968092 (20.9 MiB)  TX bytes:746 (746.0 b)
          Interrupt:16 Base address:0x2080 
[root@heartbeat2 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D1:C4:C3  
          inet addr:192.168.0.130  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed1:c4c3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:437 errors:0 dropped:0 overruns:0 frame:0
          TX packets:240 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:38794 (37.8 KiB)  TX bytes:48662 (47.5 KiB)
          Interrupt:19 Base address:0x2000 

eth1      Link encap:Ethernet  HWaddr 00:0C:29:D1:C4:CD  
          inet addr:192.168.3.130  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed1:c4cd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:114 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:8377 (8.1 KiB)  TX bytes:1152 (1.1 KiB)
          Interrupt:16 Base address:0x2080 

192.168.0.0/24 is the internal network and 192.168.3.0/24 is the heartbeat network.

Configure hosts on both machines

[root@heartbeat1 ~]# vi /etc/hosts
[root@heartbeat1 ~]# tail -2 /etc/hosts
192.168.0.203 heartbeat1
192.168.0.130 heartbeat2

Add a route for the heartbeat link on both machines

[root@heartbeat1 ~]# route add -host 192.168.3.130 dev eth1
[root@heartbeat1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.3.130   0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.3.0     0.0.0.0         255.255.255.0   U     1      0        0 eth1
192.168.0.0     0.0.0.0         255.255.255.0   U     1      0        0 eth0
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0
[root@heartbeat2 ~]# route add -host 192.168.3.203 dev eth1

Check that the link between the two heartbeat IPs is up

[root@heartbeat1 ~]# ping 192.168.3.130
PING 192.168.3.130 (192.168.3.130) 56(84) bytes of data.
64 bytes from 192.168.3.130: icmp_seq=1 ttl=64 time=3.30 ms
64 bytes from 192.168.3.130: icmp_seq=2 ttl=64 time=1.78 ms
64 bytes from 192.168.3.130: icmp_seq=3 ttl=64 time=0.557 ms
64 bytes from 192.168.3.130: icmp_seq=4 ttl=64 time=0.526 ms
^C
--- 192.168.3.130 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3030ms
rtt min/avg/max/mdev = 0.526/1.543/3.306/1.137 ms

[root@heartbeat2 ~]# ping 192.168.3.203
PING 192.168.3.203 (192.168.3.203) 56(84) bytes of data.
64 bytes from 192.168.3.203: icmp_seq=1 ttl=64 time=5.38 ms
64 bytes from 192.168.3.203: icmp_seq=2 ttl=64 time=0.874 ms
^C
--- 192.168.3.203 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1811ms
rtt min/avg/max/mdev = 0.874/3.128/5.383/2.255 ms

Remember to stop the firewall, otherwise split brain will occur

[root@heartbeat1 ~]# service iptables stop
[root@heartbeat2 ~]# service iptables stop
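Stopping iptables is the simplest option. As an alternative, the firewall can stay on with the heartbeat traffic explicitly allowed. The sketch below only prints a rule for review and does not change anything. It assumes the heartbeat traffic is UDP port 694 on eth1, as in the ha.cf used in this article, and `heartbeat_fw_rules` is a hypothetical helper name.

```shell
# heartbeat_fw_rules IFACE PORT
# Prints (does not run) an iptables command that accepts heartbeat UDP
# traffic arriving on the dedicated heartbeat interface.
heartbeat_fw_rules() {
    echo "iptables -I INPUT -i $1 -p udp --dport $2 -j ACCEPT"
}

heartbeat_fw_rules eth1 694
```

After checking the printed rule, it would be run as root on both nodes. Whether further rules are needed depends on the existing rule set, so treat this as a starting point.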

Install heartbeat

[root@heartbeat1 ~]# rpm -ivh http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
Retrieving http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
warning: /var/tmp/rpm-tmp.WqMcPL: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]
[root@heartbeat1 ~]# yum install -y heartbeat*

Copy the configuration files

[root@heartbeat1 ~]# ll /etc/ha.d/
total 20
-rw-r--r-- 1 root root  692 Dec  3  2013 README.config
-rwxr-xr-x 1 root root  745 Dec  3  2013 harc
drwxr-xr-x 2 root root 4096 Mar  7 00:03 rc.d
drwxr-xr-x 2 root root 4096 Mar  7 00:03 resource.d
-rw-r--r-- 1 root root 2082 May 11  2016 shellfuncs
[root@heartbeat1 ~]# ll /usr/share/doc/heartbeat-3.0.4/ 
total 144
-rw-r--r-- 1 root root  3701 Dec  3  2013 AUTHORS
-rw-r--r-- 1 root root 17989 Dec  3  2013 COPYING
-rw-r--r-- 1 root root 26532 Dec  3  2013 COPYING.LGPL
-rw-r--r-- 1 root root 58752 Dec  3  2013 ChangeLog
-rw-r--r-- 1 root root  2935 Dec  3  2013 README
-rw-r--r-- 1 root root  1873 Dec  3  2013 apphbd.cf
-rw-r--r-- 1 root root   645 Dec  3  2013 authkeys
-rw-r--r-- 1 root root 10502 Dec  3  2013 ha.cf
-rw-r--r-- 1 root root  5905 Dec  3  2013 haresources
[root@heartbeat1 ~]# cd /usr/share/doc/heartbeat-3.0.4/
[root@heartbeat1 heartbeat-3.0.4]# cp ha.cf haresources authkeys /etc/ha.d/
  • ha.cf: heartbeat parameter configuration
  • authkeys: heartbeat authentication file
  • haresources: heartbeat resource configuration file

ha.cf explained

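debugfile /var/log/ha-debug     #heartbeat debug log
logfile /var/log/ha-log         #heartbeat regular log
logfacility     local0          #syslog facility used for logging (local0)
keepalive 2                     #interval between heartbeats, in seconds
deadtime 30                     #seconds without a heartbeat before the peer is declared dead
warntime 10                     #seconds without a heartbeat before a warning is written to the log
initdead 120                    #dead time used at startup; should be at least twice deadtime
#bcast eth1                     #send heartbeats by broadcast on eth1
mcast eth1 225.0.0.203 694 1 0  #send heartbeats by multicast on eth1: group 225.0.0.203, UDP port 694, ttl 1, loop 0; both hosts must use the same group
auto_failback on                #whether resources move back to the primary once it recovers
node heartbeat1                 #node name; must match the output of uname -n
node heartbeat2
crm no                          #whether to enable the cluster resource manager (CRM)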

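authkeys explained

The file must have mode 600. Three authentication methods are available: crc, sha1, and md5. crc is the least secure of the three.

auth 1
1 sha1 47e9336850f1db6fa58bc470bc9b7810eb397f04     #any string will do, as long as both sides use the same one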
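Since the key string is arbitrary, one way to obtain it is to hash some random bytes. The function below is a sketch that prints an authkeys body in the format shown above. `make_authkeys` is a hypothetical name, and the sketch assumes `sha1sum` and `/dev/urandom` are available.

```shell
# make_authkeys: print an authkeys file body with a random sha1 key.
make_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

make_authkeys
```

The output would be written to /etc/ha.d/authkeys on one node and then copied unchanged to the other, because both sides must hold the same key.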

Set authkeys to mode 600

[root@heartbeat1 ha.d]# chmod 600 authkeys

If the mode is not 600, heartbeat refuses to start with the following errors

heartbeat[3688]: 2017/03/07_22:24:33 info: Pacemaker support: no
heartbeat[3688]: 2017/03/07_22:24:33 ERROR: Bad permissions on keyfile [/etc/ha.d//authkeys], 600 recommended.
heartbeat[3688]: 2017/03/07_22:24:33 ERROR: Authentication configuration error.
heartbeat[3688]: 2017/03/07_22:24:33 ERROR: Configuration error, heartbeat not started.

haresources

heartbeat1 IPaddr::192.168.0.211/24/eth0
heartbeat2 IPaddr::192.168.0.212/24/eth0

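IPaddr refers to the script /etc/ha.d/resource.d/IPaddr

By default heartbeat handles only the line for its own host. It runs the other host's line only after it finds that the other host is down.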
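To make the "own line first, peer's line only on failure" rule concrete, the sketch below lists the resources attached to a given node in a haresources file. `resources_for` is a hypothetical helper, and the temporary file simply mirrors the two lines above.

```shell
# resources_for NODE FILE
# Print the resource list of every haresources line whose first field is NODE.
resources_for() {
    awk -v n="$1" '$1 == n { $1 = ""; sub(/^ +/, ""); print }' "$2"
}

# Sample file mirroring the two haresources lines above.
tmp=$(mktemp)
printf '%s\n' \
    'heartbeat1 IPaddr::192.168.0.211/24/eth0' \
    'heartbeat2 IPaddr::192.168.0.212/24/eth0' > "$tmp"

resources_for heartbeat1 "$tmp"   # what heartbeat1 starts for itself
resources_for heartbeat2 "$tmp"   # what heartbeat1 takes over if heartbeat2 is down
```

Seen from heartbeat1, the first call lists what it starts normally and the second lists what it would acquire only after heartbeat2 is declared dead.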

Always check the configured host names against the output of uname -n

[root@heartbeat1 ~]# uname -n
heartbeat1
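This check can be scripted. The sketch below tests whether a name appears on a node line of an ha.cf file. `node_declared` is a hypothetical helper, and the temporary file stands in for /etc/ha.d/ha.cf. On a real node the arguments would be /etc/ha.d/ha.cf and "$(uname -n)".

```shell
# node_declared HA_CF NAME
# Succeeds if NAME appears on a "node" line of the given ha.cf file.
node_declared() {
    awk -v n="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$1"
}

# Sample ha.cf fragment with the two node lines used in this article.
cf=$(mktemp)
printf 'node heartbeat1\nnode heartbeat2\n' > "$cf"

for name in heartbeat1 heartbeat3; do
    if node_declared "$cf" "$name"; then
        echo "$name: declared"
    else
        echo "$name: missing from ha.cf"
    fi
done
```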

Testing heartbeat

Start heartbeat on heartbeat2

[root@heartbeat2 ha.d]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
INFO:  Resource is stopped
Done.
[root@heartbeat2 ha.d]# ip addr | grep 192.168.0
    inet 192.168.0.130/24 brd 192.168.0.255 scope global eth0
[root@heartbeat2 ha.d]# ip addr | grep 192.168.0
    inet 192.168.0.130/24 brd 192.168.0.255 scope global eth0

No VIP is up yet. With initdead 120, the node waits 120s for the other heartbeat node before deciding that it is dead. The log at this point:

[root@heartbeat2 ha.d]# tail -f /var/log/ha-debug
Mar 07 23:00:09 heartbeat2 heartbeat: [10453]: info: Pacemaker support: no                  #caused by crm no
Mar 07 23:00:09 heartbeat2 heartbeat: [10453]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Mar 07 23:00:09 heartbeat2 heartbeat: [10453]: info: **************************
Mar 07 23:00:09 heartbeat2 heartbeat: [10453]: info: Configuration validated. Starting heartbeat 3.0.4
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: heartbeat: version 3.0.4
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: Heartbeat generation: 1488896763
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: glib: UDP multicast heartbeat started for group 225.0.0.203 port 694 interface eth1 (ttl=1 loop=0) #multicast to 225.0.0.203 over UDP on interface eth1
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Mar 07 23:00:09 heartbeat2 heartbeat: [10454]: info: Local status now set to: 'up'

After 120s the VIPs are visible

[root@heartbeat2 ha.d]# ip addr | grep 192.168.0
    inet 192.168.0.130/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.211/24 brd 192.168.0.255 scope global secondary eth0
    inet 192.168.0.212/24 brd 192.168.0.255 scope global secondary eth0

The log lines added at this point show that heartbeat2 judged heartbeat1 to be dead and ran operations such as /etc/ha.d/resource.d/IPaddr 192.168.0.212/24/eth0 start

Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: WARN: node heartbeat1: is dead
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: info: Comm_now_up(): updating status to active
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: info: Local status now set to: 'active'
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: WARN: No STONITH device configured.
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: WARN: Shared disks are not protected.
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: info: Resources being acquired from heartbeat1.
Mar 07 23:02:10 heartbeat2 heartbeat: [10487]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[10487]:   2017/03/07_23:02:10 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[10517]:  2017/03/07_23:02:10 info: Taking over resource group IPaddr::192.168.0.211/24/eth0
ResourceManager(default)[10582]:    2017/03/07_23:02:10 info: Acquiring resource group: heartbeat1 IPaddr::192.168.0.211/24/eth0
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.212)[10555]: 2017/03/07_23:02:10 INFO:  Resource is stopped
Mar 07 23:02:10 heartbeat2 heartbeat: [10488]: info: Local Resource acquisition completed.
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: debug: StartNextRemoteRscReq(): child count 2
Mar 07 23:02:10 heartbeat2 heartbeat: [10454]: debug: StartNextRemoteRscReq(): child count 1
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.211)[10642]: 2017/03/07_23:02:11 INFO:  Resource is stopped
ResourceManager(default)[10582]:    2017/03/07_23:02:11 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.211/24/eth0 start
IPaddr(IPaddr_192.168.0.211)[10803]:    2017/03/07_23:02:11 INFO: Adding inet address 192.168.0.211/24 with broadcast address 192.168.0.255 to device eth0
IPaddr(IPaddr_192.168.0.211)[10803]:    2017/03/07_23:02:11 INFO: Bringing device eth0 up
IPaddr(IPaddr_192.168.0.211)[10803]:    2017/03/07_23:02:11 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.211 eth0 192.168.0.211 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.211)[10777]: 2017/03/07_23:02:11 INFO:  Success
INFO:  Success
mach_down(default)[10517]:  2017/03/07_23:02:11 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[10517]:  2017/03/07_23:02:11 info: mach_down takeover complete for node heartbeat1.
Mar 07 23:02:11 heartbeat2 heartbeat: [10454]: info: mach_down takeover complete.
Mar 07 23:02:11 heartbeat2 heartbeat: [10454]: info: Initial resource acquisition complete (mach_down)
Mar 07 23:02:11 heartbeat2 heartbeat: [10899]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[10899]:   2017/03/07_23:02:11 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[10899]:    2017/03/07_23:02:11 received ip-request-resp IPaddr::192.168.0.212/24/eth0 OK yes
ResourceManager(default)[10922]:    2017/03/07_23:02:11 info: Acquiring resource group: heartbeat2 IPaddr::192.168.0.212/24/eth0
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.212)[10950]: 2017/03/07_23:02:12 INFO:  Resource is stopped
ResourceManager(default)[10922]:    2017/03/07_23:02:12 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.212/24/eth0 start
IPaddr(IPaddr_192.168.0.212)[11075]:    2017/03/07_23:02:12 INFO: Adding inet address 192.168.0.212/24 with broadcast address 192.168.0.255 to device eth0
IPaddr(IPaddr_192.168.0.212)[11075]:    2017/03/07_23:02:12 INFO: Bringing device eth0 up
IPaddr(IPaddr_192.168.0.212)[11075]:    2017/03/07_23:02:12 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.212 eth0 192.168.0.212 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.212)[11049]: 2017/03/07_23:02:12 INFO:  Success
INFO:  Success
ARPING 192.168.0.211 from 192.168.0.211 eth0
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
ARPING 192.168.0.212 from 192.168.0.212 eth0
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Mar 07 23:02:21 heartbeat2 heartbeat: [10454]: info: Local Resource acquisition completed. (none)
Mar 07 23:02:21 heartbeat2 heartbeat: [10454]: info: local resource transition completed.

Start heartbeat on heartbeat1

[root@heartbeat1 ha.d]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
INFO:  Resource is stopped
Done.

After startup, the VIP 192.168.0.211 that belongs to heartbeat1 is present on it

[root@heartbeat1 ha.d]# ip addr | grep 192.168.0
    inet 192.168.0.203/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.211/24 brd 192.168.0.255 scope global secondary eth0

and heartbeat2 holds only its own VIP, 192.168.0.212

[root@heartbeat2 ha.d]# ip addr | grep 192.168.0
    inet 192.168.0.130/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.212/24 brd 192.168.0.255 scope global secondary eth0

The heartbeat2 log shows that it received heartbeats from heartbeat1 again, detected that heartbeat1 had started, and automatically ran /etc/ha.d/resource.d/IPaddr 192.168.0.211/24/eth0 stop

Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: info: Link heartbeat1:eth1 up.
Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: info: Status update for node heartbeat1: status init
Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: info: Status update for node heartbeat1: status up
Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: debug: StartNextRemoteRscReq(): child count 1
Mar 07 23:08:40 heartbeat2 heartbeat: [11171]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[11171]:   2017/03/07_23:08:40 info: Running /etc/ha.d//rc.d/status status
Mar 07 23:08:40 heartbeat2 heartbeat: [11189]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[11189]:   2017/03/07_23:08:40 info: Running /etc/ha.d//rc.d/status status
Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: debug: get_delnodelist: delnodelist= 
Mar 07 23:08:40 heartbeat2 heartbeat: [10454]: info: all clients are now paused
Mar 07 23:08:41 heartbeat2 heartbeat: [10454]: info: Status update for node heartbeat1: status active
Mar 07 23:08:41 heartbeat2 heartbeat: [11206]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[11206]:   2017/03/07_23:08:41 info: Running /etc/ha.d//rc.d/status status
Mar 07 23:08:41 heartbeat2 heartbeat: [10454]: info: remote resource transition completed.
Mar 07 23:08:41 heartbeat2 heartbeat: [10454]: info: heartbeat2 wants to go standby [foreign]
Mar 07 23:08:42 heartbeat2 heartbeat: [10454]: info: standby: heartbeat1 can take our foreign resources
Mar 07 23:08:42 heartbeat2 heartbeat: [11223]: info: give up foreign HA resources (standby).
ResourceManager(default)[11236]:    2017/03/07_23:08:42 info: Releasing resource group: heartbeat1 IPaddr::192.168.0.211/24/eth0
ResourceManager(default)[11236]:    2017/03/07_23:08:42 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.211/24/eth0 stop
IPaddr(IPaddr_192.168.0.211)[11299]:    2017/03/07_23:08:42 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.211)[11273]: 2017/03/07_23:08:42 INFO:  Success
INFO:  Success
Mar 07 23:08:42 heartbeat2 heartbeat: [11223]: info: foreign HA resource release completed (standby).
Mar 07 23:08:42 heartbeat2 heartbeat: [10454]: info: Local standby process completed [foreign].
Mar 07 23:08:42 heartbeat2 heartbeat: [10454]: info: all clients are now resumed
Mar 07 23:08:43 heartbeat2 heartbeat: [10454]: WARN: 1 lost packet(s) for [heartbeat1] [10:12]
Mar 07 23:08:43 heartbeat2 heartbeat: [10454]: info: remote resource transition completed.
Mar 07 23:08:43 heartbeat2 heartbeat: [10454]: info: No pkts missing from heartbeat1!
Mar 07 23:08:43 heartbeat2 heartbeat: [10454]: info: Other node completed standby takeover of foreign resources.

Configuring high availability with heartbeat

Install httpd

[root@heartbeat1 ~]# yum install -y httpd

Write a different home page on each host and start httpd

[root@heartbeat1 ~]# echo '211' > /var/www/html/index.html
[root@heartbeat1 ~]# service httpd start
[root@heartbeat2 ~]# echo '212' > /var/www/html/index.html

Verify

[root@heartbeat1 ~]# curl 192.168.0.211
211
[root@heartbeat1 ~]# /etc/init.d/heartbeat stop 
Stopping High-Availability services: Done.

[root@heartbeat1 ~]# curl 192.168.0.211
212

heartbeat commands to release all resources and to take over all resources

/usr/share/heartbeat/hb_standby
/usr/share/heartbeat/hb_takeover

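Let heartbeat start and stop the httpd service

[root@heartbeat1 ~]# cp /etc/init.d/httpd /etc/ha.d/resource.d/     #system service scripts also work without being copied here, but custom scripts must be placed in this directory and be executable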
[root@heartbeat1 ~]# vi /etc/ha.d/haresources 
[root@heartbeat1 ~]# tail /etc/ha.d/haresources 
heartbeat1 IPaddr::192.168.0.211/24/eth0 httpd
[root@heartbeat1 ~]# service httpd stop

Do the same on heartbeat2

Start heartbeat

[root@heartbeat1 ~]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
INFO:  Resource is stopped
Done.

Check port 80 at this point

[root@heartbeat1 ~]# lsof -i:80
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
clock-app 23135 root   21u  IPv4 122063      0t0  TCP heartbeat1:45128->a184-50-87-96.deploy.static.akamaitechnologies.com:http (CLOSE_WAIT)

After starting the second machine

[root@heartbeat2 html]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
INFO:  Resource is stopped
Done.

nothing is listening on port 80 on the second machine

[root@heartbeat2 html]# lsof -i:80

Stop heartbeat on the first machine

[root@heartbeat1 ~]# service heartbeat stop 
Stopping High-Availability services: Done.

Port 80 on the first machine during the switchover

[root@heartbeat1 ~]# lsof -i:80
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
clock-app 23135 root   21u  IPv4 125282      0t0  TCP heartbeat1:45133->a184-50-87-96.deploy.static.akamaitechnologies.com:http (CLOSE_WAIT)
[root@heartbeat1 ~]# lsof -i:80
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
clock-app 23135 root   21u  IPv4 128683      0t0  TCP heartbeat1:45134->a184-50-87-96.deploy.static.akamaitechnologies.com:http (ESTABLISHED)

while on the second machine the httpd service has been started

[root@heartbeat2 ~]# lsof -i:80
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
httpd   13629   root    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13634 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13636 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13638 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13640 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13641 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13645 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13646 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)
httpd   13649 apache    8u  IPv6  28168      0t0  TCP *:http (LISTEN)

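heartbeat can handle VIP migration, and it can also start and stop services. For databases and storage it should handle both the VIP migration and the service startup. For web services, handling the VIP migration alone is enough. For an LVS load balancer, keepalived is the better choice.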