Channel: Proxmox Support Forum

"Quorum dissolved" after short network outage due to STP topology change

Three Proxmox 3.4 nodes (all enterprise updates applied) in an HA setup.

All have three ethernet ports:
port 1 is connected to the WAN gateway
port 2 is connected to a Netgear GS724T managed switch ("master")
port 3 is connected to another Netgear switch (same model, "slave")

The Netgear switches are also connected together directly using a patch cable.

On each Proxmox node, ports 2 and 3 are members of one Linux bridge. The Netgear switches have a low STP priority value (and thus usually the first Netgear switch becomes root).
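To make the setup concrete, the bridge on each node looks roughly like the sketch below. This is only an illustration: the names vmbr1, eth1 and eth2 and the address are placeholders I am assuming here, not copied from the actual nodes. The bridge_ports and bridge_stp options are the standard bridge-utils settings for Debian's /etc/network/interfaces.

```
# /etc/network/interfaces (excerpt)
# Assumed names: eth1 = port 2 (to "master" switch), eth2 = port 3 (to "slave" switch)
auto vmbr1
iface vmbr1 inet static
    # placeholder address, adjust per node
    address 192.168.10.11
    netmask 255.255.255.0
    # both LAN ports are members of the same bridge
    bridge_ports eth1 eth2
    # the Linux bridge takes part in spanning tree (classic STP only)
    bridge_stp on
```

With both ports in one bridge and STP on, the node blocks one of its two uplinks while both switches are up and only opens it after a topology change.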

In theory, this gives me a fully redundant LAN mesh that configures itself automatically.

Problem: When the root switch goes offline (tested with a reboot), it takes a few seconds until the slave Netgear switch becomes the new root, so the whole network is down for about 10 seconds.

Proxmox notices this, reports "Quorum dissolved", and shuts down the VMs (strangely, no fencing of nodes is started, though).

This leaves the cluster in an unacceptable state, meaning that the switch is effectively a single point of failure.

Even worse, the nodes can't do a normal reboot because of all kinds of problems with rgmanager, so a hard reset is needed.

What can I do to avoid such a situation? Is there a way to extend the heartbeat timeouts, and is it wise to do so?


PS: The Netgear switches support RSTP and have it enabled; however, the Linux bridges can only do STP.
