Hi all,
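I have a problem with my Proxmox cluster and the DRBD replication between its two nodes. After the initial setup everything was fine: the cluster worked perfectly, including migration and backups of all VMs. My storage is LVM on top of DRBD, and each DRBD volume group (drbdvg0 and drbdvg1) is 1 TB. After some time, though, a problem appeared with DRBD: resource r0 is now reported as UpToDate/Diskless, while r1 is UpToDate/UpToDate.
In dmesg I see that drbd0 fails to open its backing device: open("/dev/sdb1") failed with -16.
The DRBD status output, the full dmesg log, and my DRBD configuration (global_common.conf and the resource file r0.res) are in the code blocks at the end of this post.
I have tried a lot of things, but I am out of ideas now. What happened, and why? Do you have any idea? Could a disk be failing in one of the servers?
I would be very grateful for any help.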
Best,
Rafal
Code:
0:r0 Connected Primary/Primary UpToDate/Diskless C r----- lvm-pv: drbdvg0 931.29g 861.00g
1:r1 Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: drbdvg1 931.29g 0g
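As a reading aid for the two status lines above, here is a minimal sketch that picks out the resources whose disk states are not UpToDate on both sides. It assumes the lines follow the usual DRBD layout of minor:resource, connection state, roles, then disk states (conventionally printed as local/peer). The regular expression and the function name `degraded` are illustrative and not part of any DRBD tool.

```python
import re

# The two status lines from the post, copied verbatim.
STATUS = """\
0:r0 Connected Primary/Primary UpToDate/Diskless C r----- lvm-pv: drbdvg0 931.29g 861.00g
1:r1 Connected Primary/Primary UpToDate/UpToDate C r----- lvm-pv: drbdvg1 931.29g 0g
"""

# Assumed field order: minor:resource, connection state, roles, disk states.
LINE = re.compile(
    r"^(?P<minor>\d+):(?P<res>\S+)\s+(?P<conn>\S+)\s+"
    r"(?P<role_a>\w+)/(?P<role_b>\w+)\s+"
    r"(?P<disk_a>\w+)/(?P<disk_b>\w+)"
)

def degraded(text):
    """Return (resource, disk_a, disk_b) for every resource whose two
    disk states are not both UpToDate."""
    found = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like a status line
        states = (m["disk_a"], m["disk_b"])
        if states != ("UpToDate", "UpToDate"):
            found.append((m["res"], *states))
    return found

print(degraded(STATUS))
```

With the two lines above, only r0 is flagged, because one side of it has no backing disk attached.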
Code:
block drbd0: Starting worker thread (from cqueue [2626])
block drbd0: open("/dev/sdb1") failed with -16
block drbd0: drbd_bm_resize called with capacity == 0
block drbd0: worker terminated
block drbd0: Terminating worker thread
block drbd1: Starting worker thread (from cqueue [2626])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (70 active extents) in activity log.
block drbd1: Method to ensure write ordering: barrier
block drbd1: max BIO size = 131072
block drbd1: drbd_bm_resize called with capacity == 1953064672
block drbd1: resync bitmap: bits=244133084 words=3814580 pages=7451
block drbd1: size = 931 GB (976532336 KB)
block drbd1: bitmap READ of 7451 pages took 37 jiffies
block drbd1: recounting of set bits took additional 36 jiffies
block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: attached to UUIDs 70A8363B4F73C19E:0000000000000000:43AC9F762F8AF4F7:43AB9F762F8AF4F7
block drbd0: Starting worker thread (from cqueue [2626])
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [2661])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [2649])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 96
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [2670])
block drbd0: data-integrity-alg: <not-used>
block drbd0: max BIO size = 4096
block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
block drbd1: Handshake successful: Agreed network protocol version 96
block drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [2674])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self 70A8363B4F73C19E:0000000000000000:43AC9F762F8AF4F7:43AB9F762F8AF4F7 bits:0 flags:0
block drbd1: peer 7D727C5A8840067D:70A8363B4F73C19F:43AC9F762F8AF4F7:43AB9F762F8AF4F7 bits:0 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
block drbd0: role( Secondary -> Primary )
block drbd1: role( Secondary -> Primary )
DLM (built Oct 14 2013 08:10:28) installed
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: updated sync uuid 70A9363B4F73C19F:0000000000000000:43AC9F762F8AF4F7:43AB9F762F8AF4F7
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: updated UUIDs 7D727C5A8840067D:0000000000000000:70A9363B4F73C19F:70A8363B4F73C19F
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: bitmap WRITE of 7451 pages took 20 jiffies
block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
ip_tables: (C) 2000-2006 Netfilter Core Team
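Two details in the log above can be checked mechanically. The sketch below decodes the -16 returned by open("/dev/sdb1") using the Linux errno table, and verifies that the size and bitmap numbers logged for drbd1 are self-consistent. The constants are assumptions rather than values taken from the post: 512-byte sectors, one bitmap bit per 4 KiB of storage, 64-bit bitmap words, and 4 KiB pages. The function names are illustrative.

```python
import errno
import os

def decode_kernel_errno(code):
    """Map a negative kernel return code such as -16 to (symbol, message)."""
    n = abs(code)
    return errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)

# Assumed constants (not stated in the post).
SECTOR_BYTES = 512     # block-layer sector size
BYTES_PER_BIT = 4096   # storage covered by one bitmap bit
WORD_BITS = 64         # bits per bitmap word
PAGE_BYTES = 4096      # memory page size

def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def bitmap_geometry(capacity_sectors):
    """Recompute the numbers DRBD logs after drbd_bm_resize."""
    total_bytes = capacity_sectors * SECTOR_BYTES
    bits = ceil_div(total_bytes, BYTES_PER_BIT)
    words = ceil_div(bits, WORD_BITS)
    pages = ceil_div(words * (WORD_BITS // 8), PAGE_BYTES)
    size_kb = total_bytes // 1024
    return {
        "size_kb": size_kb,
        "size_gb": size_kb // (1024 * 1024),
        "bits": bits,
        "words": words,
        "pages": pages,
    }

print(decode_kernel_errno(-16))
print(bitmap_geometry(1953064672))
```

On Linux, errno 16 is EBUSY ("Device or resource busy"), which suggests that /dev/sdb1 was already held open by something else when DRBD tried to attach it, rather than pointing directly at a failed disk. The geometry computed from capacity 1953064672 matches the logged values for drbd1 (976532336 KB, 244133084 bits, 3814580 words, 7451 pages).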
- global_common.conf
Code:
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;
    handlers {
        # The following 3 handlers were disabled due to #576511.
        # Please check the DRBD manual and enable them, if they make sense in your setup.
        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        wfc-timeout 15;
        degr-wfc-timeout 15;
        become-primary-on both;
    }
    disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }
    net {
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        cram-hmac-alg sha1;
        shared-secret "my-secret";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        rate 1000M;
    }
}
- r0.res
Code:
# This is the resource used for the shared GFS2 partition.
resource r0 {
    # This is the block device path.
    device /dev/drbd0;
    # We'll use the normal internal metadisk (takes about 32MB/TB)
    meta-disk internal;
    # This is the `uname -n` of the first node
    on node1 {
        # The 'address' has to be the IP, not a hostname. This is the
        # node's SN (bond1) IP. The port number must be unique among
        # resources.
        address 10.0.0.12:7788;
        # This is the block device backing this resource on this node.
        disk /dev/sdb1;
    }
    # Now the same information again for the second node.
    on node2 {
        address 10.0.0.13:7788;
        disk /dev/sdb1;
    }
}
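The comment in r0.res quotes a rule of thumb of about 32MB of internal metadata per TB. As a rough sketch only, the snippet below applies that rule to the 931.29g device size shown in the status output. Reading the rule as MiB per TiB and the size as GiB is an assumption, and the function name is illustrative; this is not DRBD's exact metadata formula.

```python
# Rule of thumb quoted in the r0.res comment (interpreted as MiB per TiB).
MIB_PER_TIB = 32

def estimate_internal_metadata_mib(device_gib, mib_per_tib=MIB_PER_TIB):
    """Approximate internal metadata size for a device of device_gib GiB."""
    return device_gib / 1024 * mib_per_tib

# 931.29 is the device size from the status output, assumed to be GiB.
print(round(estimate_internal_metadata_mib(931.29), 2))
```

Under these assumptions the estimate comes to roughly 29 MiB per resource, which is negligible next to the 931 GB backing device.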