
Two-node cluster - Fencing misbehavior?

Hello all,

I've been working with Proxmox VE for almost two years now and decided to go a little further with HA, so we acquired two Dell PowerEdge servers and prepared them as a two-node cluster.

Everything had gone great so far until I applied the fencing rules. I've followed the how-to at http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster even though it seems to be for the beta version. I have fencing configured through the iDRAC6 network cards with the 'reboot' action, and the problem comes when I try to test the environment:

1. Two nodes running. I switch off the node that has no machines on it; so far so good, the second node detects that the node is down.
2. When the failed node boots up, it automatically reboots the node that was working, leaving all the running machines off... good thing it is not in production yet.
3. The newly booted node keeps rebooting the node with the machines forever, so it is impossible to reach it.

I would like the surviving node to wait until the other node boots. Am I supposed to delete and re-add the node after a failure? I'm sure I am missing something here.

This is my configuration:

Storage


RAID 1 80 GB hard disk on which the OS is installed.
RAID 5 1024 GB hard disk configured with DRBD http://pve.proxmox.com/wiki/DRBD
I've had no problems with DRBD synchronization and every split-brain has been recovered quite well. I've set the sync rate parameter to 110M so it synchronizes faster (the nodes sync over a dedicated, directly connected GbE network card); see the sketch after this list.
(all hardware RAID)
All disks are local.
NFS 5 TB shared storage for backups.
NFS 1024 GB shared storage for ISOs.
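
For reference, a minimal sketch of how that 110M rate is expressed in a DRBD 8.3 resource file; the resource name, backing disk and the peer's address below are assumptions for illustration, not my actual file:
Code:

# /etc/drbd.d/r0.res - illustrative only; resource name, disk and peer address are assumed
resource r0 {
    protocol C;
    net {
        allow-two-primaries;            # both nodes primary, as usual for Proxmox/DRBD
    }
    syncer {
        rate 110M;                      # the resync bandwidth cap mentioned above
    }
    on hypvdell1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;            # assumed RAID 5 backing device
        address   10.0.0.22:7788;       # assumed other end of the dedicated GbE link
        meta-disk internal;
    }
    on hypvdell02 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.23:7788;       # eth5 from the interfaces file below
        meta-disk internal;
    }
}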

I have also set up a quorum disk, but it is not included in cluster.conf yet; I mention it just in case this has something to do with the problem.
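
If I do add it, my understanding is it would look roughly like the lines below in cluster.conf (the label and timing values are placeholders, not something I have tested here):
Code:

  <!-- illustrative only: with a quorum disk the two_node special case is dropped
       and the qdisk adds one vote (2 node votes + 1 qdisk vote = 3) -->
  <cman expected_votes="3"/>
  <quorumd interval="1" label="dellHA_qdisk" tko="10" votes="1"/>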

Network

We have 6 GbE NICs plus the dedicated iDRAC NIC on each node.
The routes are there so I can SSH into them through a VPN.
Code:

root@hypvdell02:~# cat /etc/network/interfaces
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual

iface eth4 inet static
    address  192.168.0.27
    netmask  255.255.255.0

auto eth5
iface eth5 inet static
    address  10.0.0.23
    netmask  255.255.255.0

auto bond0
iface bond0 inet static
    address  192.168.0.25
    netmask  255.255.255.0
    slaves eth0 eth1 eth2
    bond_miimon 100
    bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
    address  192.168.0.23
    netmask  255.255.255.0
    gateway  192.168.0.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    up route add -net 10.12.0.0 netmask 255.255.255.0 gw 192.168.0.111
    down route del -net 10.12.0.0 netmask 255.255.255.0 gw 192.168.0.111

Cluster config
Code:

root@hypvdell02:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="24" name="dellHA">
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;" ipaddr="192.168.0.20" login="root" name="fencenode1" passwd="5SVbXsVi58S0w7YEbWOJ" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;" ipaddr="192.168.0.21" login="root" name="fencenode2" passwd="BPITqVvrZLmK8c1=-gT8" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="hypvdell1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fencenode1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hypvdell02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fencenode2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="501"/>
  </rm>
</cluster>



Additional Info

Code:

root@hypvdell02:~# tail /var/log/cluster/fenced.log
Mar 20 14:40:00 fenced fenced 1352871249 started
Mar 20 14:40:52 fenced fencing node hypvdell1
Mar 20 14:41:02 fenced fence hypvdell1 dev 0.0 agent fence_drac5 result: error from agent
Mar 20 14:41:02 fenced fence hypvdell1 failed
Mar 20 14:41:05 fenced fencing node hypvdell1
Mar 20 14:41:15 fenced fence hypvdell1 dev 0.0 agent fence_drac5 result: error from agent
Mar 20 14:41:15 fenced fence hypvdell1 failed
Mar 20 14:41:18 fenced fencing node hypvdell1
Mar 20 14:41:26 fenced fence hypvdell1 dev 0.0 agent fence_drac5 result: error from agent
Mar 20 14:41:26 fenced fence hypvdell1 failed
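
In case it helps, this is roughly how I understand the agent can be tested by hand from the surviving node, assuming the usual fence_drac5 flags (-a address, -l login, -p password, -x for SSH since secure="1", -c command prompt, -o action); the values are taken from the fencenode1 entry above:
Code:

# illustrative manual test of the fence agent against node 1's iDRAC
root@hypvdell02:~# fence_drac5 -a 192.168.0.20 -l root -p '5SVbXsVi58S0w7YEbWOJ' -x -c 'admin1->' -o status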

Code:

root@hypvdell02:~# tail /var/log/cluster/corosync.log
Mar 20 14:39:56 corosync [CLM  ] Members Left:
Mar 20 14:39:56 corosync [CLM  ] Members Joined:
Mar 20 14:39:56 corosync [CLM  ]    r(0) ip(192.168.0.23)
Mar 20 14:39:56 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 20 14:39:56 corosync [CMAN  ] quorum regained, resuming activity
Mar 20 14:39:56 corosync [QUORUM] This node is within the primary component and will provide service.
Mar 20 14:39:56 corosync [QUORUM] Members[1]: 2
Mar 20 14:39:56 corosync [QUORUM] Members[1]: 2
Mar 20 14:39:56 corosync [CPG  ] chosen downlist: sender r(0) ip(192.168.0.23) ; members(old:0 left:0)
Mar 20 14:39:56 corosync [MAIN  ] Completed service synchronization, ready to provide service.
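
For completeness, the membership state those lines describe can also be checked interactively; as far as I know these are the standard commands on Proxmox VE 2.x with cman:
Code:

root@hypvdell02:~# pvecm status      # quorum, expected votes and member list
root@hypvdell02:~# fence_tool ls     # fence domain membership and victim state
root@hypvdell02:~# clustat           # rgmanager's view of both nodes and the HA VMs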

Please, I'm desperate here. Let me know if you need any additional information.
