Channel: Proxmox Support Forum

Cluster constantly losing quorum after node removal

Hi,

Yesterday I removed a node from my cluster, shrinking it from six nodes to five. First I shut down the node to be removed (it had no VMs configured on it), then I ran 'pvecm delnode proxmox1-2'.

Since I removed that node, my cluster keeps falling apart. Restarting all cluster-related services works for a while, but after a few minutes one node loses quorum, and the rest follow. There is no pattern to which node loses quorum first.
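For reference, this is the vote arithmetic I am working from: a minimal sketch assuming every node carries exactly one vote and quorum is a strict majority of the total. I have not verified the vote counts on my nodes, and the function name is just for illustration.

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1

# Assumption: one vote per node, before and after the removal.
for nodes in (6, 5):
    print(f"{nodes} nodes -> quorum at {quorum_threshold(nodes)} votes")
```

Under that assumption the five-node cluster should stay quorate as long as three nodes can see each other.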

Nothing changed network-wise, so multicast issues seem unlikely. Not all nodes are running the exact same version of Proxmox, but that wasn't an issue before the node removal, so I don't expect it to be an issue now either.

What I do see that is odd are the following messages:
Jul 25 12:50:39 proxmox1-99 rrdcached[26823]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-storage/proxmox1-99/zstore-proxmox1) failed with status -1. (/var/lib/rrdcached/db/pve2-storage/proxmox1-99/zstore-proxmox1: illegal attempt to update using time 1406285127 when last update time is 1406285227 (minimum one second step))

Those messages show up on all nodes, for different RRDs. Restarting rrdcached doesn't help.
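To make the two timestamps in that message easier to compare, here is a minimal sketch that extracts them and prints the difference. The regular expression and the function name are my own, written against the one message quoted above; they are not part of rrdcached.

```python
import re

# Hypothetical pattern, matched against the message text quoted above.
PATTERN = re.compile(
    r"illegal attempt to update using time (\d+) "
    r"when last update time is (\d+)"
)

def update_time_delta(log_line):
    """Return (attempted, last, attempted - last), or None if no match."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    attempted, last = (int(group) for group in match.groups())
    return attempted, last, attempted - last

message = (
    "illegal attempt to update using time 1406285127 "
    "when last update time is 1406285227 (minimum one second step)"
)
print(update_time_delta(message))  # (1406285127, 1406285227, -100)
```

So the rejected update carries a timestamp 100 seconds older than the last one already stored in that RRD. What produces the older timestamp is not something the message itself tells me.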

I'm kinda lost, so any hit with the cluebat is much appreciated.
