Hi,
In the middle of cluster activity I received these messages (the cluster has 3 nodes with SAN ... GFS2 filesystem):
Log messages on USBack-prox2:
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 21 13:06:41 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 DOWN
Feb 21 13:06:41 USBack-prox2 kernel: dlm: closing connection to node 1
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 1
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 1
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] chosen downlist from node r(0) ip(--.--.--.22)
Feb 21 13:06:41 USBack-prox2 corosync[3911]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 21 13:06:41 USBack-prox2 kernel: GFS2: fsid=USBackCluster:VMStorage1.0: jid=1: Trying to acquire journal lock...
Feb 21 13:06:41 USBack-prox2 kernel: GFS2: fsid=USBackCluster:VMStorage2.0: jid=1: Trying to acquire journal lock...
Feb 21 13:06:51 USBack-prox2 fenced[3957]: fencing node USBack-prox1
Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 dev 0.0 agent fence_ipmilan result: error from agent
Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 failed
Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
Feb 21 13:06:55 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 UP
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 2
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 0
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 0
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] chosen downlist from node r(0) ip(--.--.--.21)
Feb 21 13:06:55 USBack-prox2 corosync[3911]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 3a95f87400000000 protocol
Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 1e7ff52100000001 start
Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 22221a7000000002 start
Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 419ac24100000003 start
Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 3804823e00000004 start
-------------------------------------------------
After that, GFS2 generated error logs and all activity was blocked.
Logs from the Cisco switch (time is UTC):
Feb 21 09:37:02.375: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to down
Feb 21 09:37:02.459: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/4, changed state to down
Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down
Feb 21 09:37:03.541: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to down
Feb 21 09:37:07.283: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up
Feb 21 09:37:07.350: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to up
Feb 21 09:37:08.289: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to up
Feb 21 09:37:09.472: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/4, changed state to up
Feb 21 09:40:20.045: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to down
Feb 21 09:40:21.043: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down
Feb 21 09:40:23.401: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up