Channel: Proxmox Support Forum

Node 2 cman crashes and will not come back without a reboot

This started within the last week.

root@proxmox2:~# service cman stop
Stopping cluster:
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... Timed-out waiting for cluster
[FAILED]

The only way to get cman working again is to reboot the server.


#########

root@proxmox2:~# clustat
Cluster Status for FL-Cluster @ Mon Feb 18 14:57:28 2013
Member Status: Quorate


Member Name    ID   Status
------ ----    ---- ------
proxmox11       1   Online
proxmox2        2   Online, Local
proxmox3a       3   Online
proxmox4        4   Online
poxmox5         5   Online
proxmox6        6   Offline
proxmox7        7   Online
proxmox8        8   Online
proxmox9        9   Online
Proxmox10      10   Online
proxmox1a      11   Online


clustat shows the node as quorate and part of the cluster, but the web interface does not show the same.


#########


Restarting the following services does not solve the issue:

  • pvestatd
  • pvedaemon
  • cman (fails; it will not stop)
  • pve-cluster
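
A minimal sketch of that restart attempt, assuming the services are managed through the `service` command as in the cman output above. The helper name `restart_all` is hypothetical and not part of Proxmox. It restarts each named service in the order given and prints the ones whose restart returned a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: restart each named service in order and report
# every one whose restart command exits with a non-zero status.
restart_all() {
    failed=""
    for svc in "$@"; do
        # Output is discarded here; only the exit status is checked.
        if ! service "$svc" restart >/dev/null 2>&1; then
            failed="$failed $svc"
        fi
    done
    echo "failed:$failed"
    # Return success only if nothing failed.
    [ -z "$failed" ]
}
```

It could be called as `restart_all pvestatd pvedaemon cman pve-cluster`, which matches the order listed above.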


#########


root@proxmox2:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1


#########

I do not believe this service is dying on its own. We get hit by DDoS attacks often enough that I suspect this may be caused by the node being hit by one. What I can say is that when this happens, there is no way to stop the cman service and restart it.

I have tried using the top command to find the corosync process and killing it with signal 15 (SIGTERM), but this does not stop or terminate the corosync process.
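
A minimal sketch of that signal 15 attempt, assuming a Linux /proc filesystem. The helper names `proc_state` and `term_and_report` are hypothetical and not part of Proxmox or corosync. It sends SIGTERM to a PID, waits a few seconds, and if the process is still there prints its state letter. A state of D (uninterruptible sleep) would mean the process is blocked inside the kernel, in which case no signal, including signal 9, takes effect until the blocking call returns.

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter process state from
# /proc/PID/status, or nothing if the PID no longer exists.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: send SIGTERM to a PID, poll once per second for up
# to wait_secs seconds, and report whether the process went away.
term_and_report() {
    pid="$1"
    wait_secs="${2:-10}"
    kill -TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        sleep 1
        state=$(proc_state "$pid")
        case "$state" in
            # No entry, or a zombie waiting to be reaped: the process ended.
            ""|Z) echo "exited"; return 0 ;;
        esac
        i=$((i + 1))
    done
    echo "still running, state=$state"
    return 1
}
```

On the affected node this might be run as `term_and_report "$(pidof corosync)" 10`, assuming `pidof` returns a single PID.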

#########

If anyone has any helpful knowledge, please let me know.

thanks
