HA migration on node failure restarts VMs

Hello,

I am trying to set up Two-Node HA ( https://pve.proxmox.com/wiki/Two-Nod...bility_Cluster ).
I have two identical machines (Dell R720 with iDRAC7) and have built a PVE cluster from them, plus a quorum disk on an iSCSI target exported by a third machine.
Everything seems to work fine: live migration runs with no packet loss, and if I manually fence a node or crash it on purpose
(the exact commands I use are shown after the log), the VM running on the "broken" node does get moved to the surviving one. However, I see this in the logs:
Code:

Dec 22 02:01:11 rgmanager State change: proxmox2 DOWN
Dec 22 02:01:34 rgmanager Marking service:gfs2-2 as stopped: Restricted domain unavailable
Dec 22 02:01:34 rgmanager Starting stopped service pvevm:101
Dec 22 02:01:34 rgmanager [pvevm] VM 100 is running
Dec 22 02:01:35 rgmanager [pvevm] Move config for VM 101 to local node
Dec 22 02:01:36 rgmanager Service pvevm:101 started
==
Dec 22 02:01:11 fenced fencing node proxmox2
Dec 22 02:01:33 fenced fence proxmox2 success
==
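
In case it matters, this is roughly how I prepared the quorum disk and how I trigger the failures for these tests (the /dev/sdc device name is just a placeholder for my iSCSI LUN):
Code:

mkqdisk -c /dev/sdc -l cluster_qdisk    # one-time qdisk setup; the label matches cluster.conf
fence_node proxmox2                     # manual fence test, run from the other node
echo c > /proc/sysrq-trigger            # or crash the node on purpose from its own console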

I ran these tests with two VMs (100 and 101), both CentOS 6 (101 is actually a clone of 100).
The shared storage is a DRBD resource between the two nodes with GFS2 on top (no LVM involved); I sketch how I prepared it after the config below. I had a hard time
getting this resource mounted at startup, and my cluster.conf now looks like this:
Code:

<?xml version="1.0"?>
<cluster config_version="39" name="Cluster">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <quorumd allow_kill="0" interval="1" label="cluster_qdisk" tko="10" votes="1"/>
  <totem token="1000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.162.90" login="fence" name="proxmox1-drac" passwd="123456" secure="1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.162.91" login="fence" name="proxmox2-drac" passwd="123456" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="proxmox1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="proxmox2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="node1" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="proxmox1"/>
      </failoverdomain>
      <failoverdomain name="node2" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="proxmox2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <clusterfs name="gfs2" mountpoint="/gfs2" device="/dev/drbd0" fstype="gfs2" force_unmount="1" options="noatime,nodiratime,noquota"/>
    </resources>
    <service autostart="1" name="gfs2-1" domain="node1" exclusive="0">
      <clusterfs ref="gfs2"/>
    </service>
    <service autostart="1" name="gfs2-2" domain="node2" exclusive="0">
      <clusterfs ref="gfs2"/>
    </service>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>

(The somewhat messy failoverdomains plus the two gfs2-1/gfs2-2 services were the only way I found to have the cluster mount /dev/drbd0 on both nodes at startup.)
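
For completeness, the DRBD/GFS2 layer underneath was prepared roughly like this (the resource name r0 and the gfs2 lock-table name are just what I picked; device paths omitted):
Code:

drbdadm up r0                      # r0 has allow-two-primaries in its net section
drbdadm primary r0                 # run on both nodes once the initial sync is done (dual-primary for GFS2)
mkfs.gfs2 -p lock_dlm -t Cluster:gfs2 -j 2 /dev/drbd0   # one-time: 2 journals, cluster name taken from cluster.conf
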
So the question is: why does the VM get restarted? As I read rgmanager.log, the service is reported as stopped and then started again on the surviving node.
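
For reference, this is roughly how I check the state after a fence, and how I can tell it was a restart:
Code:

clustat          # rgmanager shows pvevm:101 as started on the surviving node
qm status 101    # Proxmox reports the VM as running
# inside the guest, uptime is only a couple of minutes, so it was rebooted rather than kept running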

Thank you,

Teodor
