Hello,
I've set up a two-node HA cluster with qdisk (3-vote quorum), using a managed switch as the fence device through the fence_ifmib agent. Everything works great so far except for one small inconvenience: any time a node gets fenced, it reboots (once) about a minute later.
The bad part is that my qdisk (an iSCSI target) is attached automatically at startup, which of course cannot happen while the node is fenced. At startup the node gets a "quorum dissolved" message and, strangely, rgmanager then stops. I cannot start it manually without the qdisk: it crashes immediately.
So, to revive a fenced node I need to:
1) ssh to the alive node and execute
Code:
fence_node -U node2
2) ssh to the dead node and manually connect the iSCSI device with iscsiadm (or reboot the machine so that this happens at startup)
3) on the second node execute
Code:
/etc/init.d/cman reload
After that rgmanager starts successfully.
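For reference, the three manual steps could be scripted roughly as follows. This is only a sketch under stated assumptions: the host names, the iSCSI target IQN and the portal address are placeholders, "the second node" is taken to mean the fenced node, and the helper names run and recover_fenced_node are made up for illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual recovery steps for a fenced node.
# All values below are placeholders (assumptions), not taken from a real setup.
ALIVE_NODE="${ALIVE_NODE:-node1}"
FENCED_NODE="${FENCED_NODE:-node2}"
ISCSI_TARGET="${ISCSI_TARGET:-iqn.2000-01.com.example:qdisk}"
ISCSI_PORTAL="${ISCSI_PORTAL:-192.0.2.10:3260}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recover_fenced_node() {
    # 1) Unfence the node from the surviving cluster member.
    run ssh "$ALIVE_NODE" fence_node -U "$FENCED_NODE"
    # 2) Log in to the iSCSI target that backs the qdisk.
    run ssh "$FENCED_NODE" iscsiadm -m node -T "$ISCSI_TARGET" -p "$ISCSI_PORTAL" --login
    # 3) Reload cman on the node that was fenced (assumed to be "the second node").
    run ssh "$FENCED_NODE" /etc/init.d/cman reload
}

recover_fenced_node
```

Running it with DRY_RUN=0 would actually execute the commands over ssh; whether rgmanager then needs an explicit start depends on the setup.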
That's not hard to do, but I'm still surprised that it happens at all. Why can't rgmanager run without qdisk? (I've only got my pvevm entries in the <rm> section of cluster.conf.) What if my qdisk device suddenly dies: will rgmanager crash on both nodes simultaneously? And why is there a reboot at all? Can I disable it and let a fenced node stay up, waiting for examination and fence_node -U node2?
I thought the problem might be allow_kill="1" in the qdisk section of cluster.conf, but I tried both 1 and 0 and the outcome seems to be the same.
Thanks.