Hi,
I have an almost idle test cluster with 12 nodes (pve-7.4). After about a year of uptime without issues, I noticed last week that the web GUI showed information only for the local node. I logged in and saw on three randomly picked nodes that corosync was at 100% CPU, apparently busy-looping over some network error; the log showed many "Retransmit List:" entries. Today I planned to take a look. The load is gone, as the corosync processes have been killed by the OOM killer, and the nodes are "Flags...