HA migration on node failure restarts VMs

Hello,

I am trying to set up Two-Node HA ( https://pve.proxmox.com/wiki/Two-Nod...bility_Cluster ).
I have two identical machines (Dell R720 with iDRAC7) and have built a PVE cluster from them, plus a quorum disk on an iSCSI target exported by a third machine.
Everything seems to work fine: live migration runs with no packet loss, and if I manually fence a node or crash it on purpose
(the exact commands I use are shown after the log), the VM running on the "broken" node does get moved to the surviving one. However, I see this in the logs:
Code:

Dec 22 02:01:11 rgmanager State change: proxmox2 DOWN
Dec 22 02:01:34 rgmanager Marking service:gfs2-2 as stopped: Restricted domain unavailable
Dec 22 02:01:34 rgmanager Starting stopped service pvevm:101
Dec 22 02:01:34 rgmanager [pvevm] VM 100 is running
Dec 22 02:01:35 rgmanager [pvevm] Move config for VM 101 to local node
Dec 22 02:01:36 rgmanager Service pvevm:101 started
==
Dec 22 02:01:11 fenced fencing node proxmox2
Dec 22 02:01:33 fenced fence proxmox2 success
==
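
In case it matters, this is roughly how I prepared the quorum disk and how I trigger the failures for these tests (the /dev/sdc device name is just a placeholder for my iSCSI LUN):
Code:

mkqdisk -c /dev/sdc -l cluster_qdisk    # one-time qdisk setup; the label matches cluster.conf
fence_node proxmox2                     # manual fence test, run from the other node
echo c > /proc/sysrq-trigger            # or crash the node on purpose from its own console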

I ran these tests with two VMs (100 and 101), both CentOS 6 (101 is actually a clone of 100).
The shared storage is a DRBD resource between the two nodes with GFS2 on top (no LVM involved); I sketch how I prepared it after the config below. I had a hard time
getting this resource mounted at startup, and my cluster.conf now looks like this:
Code:

<?xml version="1.0"?>
<cluster config_version="39" name="Cluster">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <quorumd allow_kill="0" interval="1" label="cluster_qdisk" tko="10" votes="1"/>
  <totem token="1000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.162.90" login="fence" name="proxmox1-drac" passwd="123456" secure="1"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.162.91" login="fence" name="proxmox2-drac" passwd="123456" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="proxmox1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="proxmox2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="node1" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="proxmox1"/>
      </failoverdomain>
      <failoverdomain name="node2" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="proxmox2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <clusterfs name="gfs2" mountpoint="/gfs2" device="/dev/drbd0" fstype="gfs2" force_unmount="1" options="noatime,nodiratime,noquota"/>
    </resources>
    <service autostart="1" name="gfs2-1" domain="node1" exclusive="0">
      <clusterfs ref="gfs2"/>
    </service>
    <service autostart="1" name="gfs2-2" domain="node2" exclusive="0">
      <clusterfs ref="gfs2"/>
    </service>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>

(The somewhat messy failoverdomains plus the two gfs2-1/gfs2-2 services were the only way I found to have the cluster mount /dev/drbd0 on both nodes at startup.)
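
For completeness, the DRBD/GFS2 layer underneath was prepared roughly like this (the resource name r0 and the gfs2 lock-table name are just what I picked; device paths omitted):
Code:

drbdadm up r0                      # r0 has allow-two-primaries in its net section
drbdadm primary r0                 # run on both nodes once the initial sync is done (dual-primary for GFS2)
mkfs.gfs2 -p lock_dlm -t Cluster:gfs2 -j 2 /dev/drbd0   # one-time: 2 journals, cluster name taken from cluster.conf
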
So the question is: why does the VM get restarted? As I read rgmanager.log, the service is reported as stopped and then started again on the surviving node.
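
For reference, this is roughly how I check the state after a fence, and how I can tell it was a restart:
Code:

clustat          # rgmanager shows pvevm:101 as started on the surviving node
qm status 101    # Proxmox reports the VM as running
# inside the guest, uptime is only a couple of minutes, so it was rebooted rather than kept running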

Thank you,

Teodor
