
Concurrent migration fails because port is in use

Hello everybody,


We have a two-node cluster running Proxmox 2.3-13 with DRBD. Failover works, but after the failed node is back online and unfenced, the second migration back to the original node fails:


Code:

task started by HA resource agent
May 07 13:53:21 starting migration of VM 103 to node 'vhost2' (10.0.0.102)
May 07 13:53:21 copying disk images
May 07 13:53:21 starting VM 103 on remote node 'vhost2'
May 07 13:53:23 starting migration tunnel
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 60000
Could not request local forwarding.
May 07 13:53:24 starting online/live migration on port 60000
May 07 13:53:24 migrate_set_speed: 8589934592
May 07 13:53:24 migrate_set_downtime: 0.1
May 07 13:53:26 ERROR: online migrate failure - aborting
May 07 13:53:26 aborting phase 2 - cleanup resources
May 07 13:53:26 migrate_cancel
May 07 14:03:30 ERROR: migration finished with problems (duration 00:10:10)
TASK ERROR: migration problems

The first migration completes without problems, but there seems to be a race condition when two migrations run concurrently: the tunnel port is not incremented, so the second tunnel also tries to bind port 60000 and fails. Is there an option to delay the second migration by a few seconds, or is another workaround available? Any help is appreciated, because we really need to re-balance the VMs after a node comes back online. In the meantime we are considering the crude stop-gap sketched below.
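The idea is simply to wait until nothing is bound to the tunnel port on the source node before triggering the next migration by hand. This is only a sketch and not part of Proxmox's own tooling; the port number, the polling interval and the bind test are assumptions for illustration:

Code:

#!/usr/bin/env python
# Sketch of a pre-migration check: wait until local TCP port 60000 (the port
# the failed tunnel tried to bind in the log above) is free, then proceed.
# Port number and 60s timeout are assumptions, not Proxmox defaults.
import socket
import time

MIGRATION_PORT = 60000

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to the port on localhost."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

# Poll once per second for up to 60 seconds before giving up.
for _ in range(60):
    if port_is_free(MIGRATION_PORT):
        print("port %d free, starting migration now" % MIGRATION_PORT)
        break
    time.sleep(1)
else:
    raise SystemExit("port %d still in use after 60s" % MIGRATION_PORT)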

Kind regards,
Chris


This is our cluster.conf:

Code:

<?xml version="1.0"?>
<cluster config_version="9" name="testcluster">
  <cman two_node="1" expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.5" name="switch_a1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.6" name="switch_a2" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.7" name="switch_b1" snmp_version="2c"/>
    <fencedevice agent="fence_ifmib" community="public" ipaddr="10.0.0.8" name="switch_b2" snmp_version="2c"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="vhost1" nodeid="1" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_b1" port="35"/>
          <device action="off" name="switch_b2" port="38"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vhost2" nodeid="2" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="switch_a1" port="37"/>
          <device action="off" name="switch_a2" port="42"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="vhost1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="vhost2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="102" domain="domain1" />
    <pvevm autostart="1" vmid="103" domain="domain2"/>
    <pvevm autostart="1" vmid="106" domain="domain1" />
    <pvevm autostart="1" vmid="107" domain="domain2" />
  </rm>
</cluster>

