Channel: Proxmox Support Forum

KVM on top of DRBD and out of sync: long term investigation results

A long time ago, when I found a guide on how to configure a two-node Proxmox cluster with DRBD, I was really happy. We only needed two nodes for online migration and quick recovery after a hardware failure. I saw it as a solution that could be widely used. And it worked for a while.

Then I noticed that online migration sometimes failed for no visible reason. While reading the DRBD documentation I found a recommendation to check DRBD synchronization consistency at least once a month, and I started doing so. Surprisingly, I found new out-of-sync sectors every week.
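As a rough illustration of that periodic check: after an online verify (`drbdadm verify <resource>`) finishes, DRBD 8.x reports the amount of out-of-sync data in the `oos:` field of `/proc/drbd`. The sketch below parses that field per device. The function name `out_of_sync_kib` and the sample text are made up for illustration, and the parsing assumes the DRBD 8.x `/proc/drbd` layout, so treat it as a starting point rather than a finished monitoring script.

```python
import re


def out_of_sync_kib(proc_drbd_text):
    """Return {minor: out-of-sync KiB} parsed from /proc/drbd-style text.

    Assumes the DRBD 8.x layout: a header line such as " 0: cs:Connected ..."
    followed by a statistics line that ends with "oos:<KiB>".
    """
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        header = re.match(r"\s*(\d+):\s+cs:", line)
        if header:
            minor = int(header.group(1))
            continue
        oos = re.search(r"\boos:(\d+)", line)
        if oos and minor is not None:
            result[minor] = int(oos.group(1))
    return result


# Illustrative sample only, not output captured from a real cluster.
SAMPLE = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:1024 dr:2048 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:4096 nr:0 dw:4096 dr:8192 al:9 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:128
"""

if __name__ == "__main__":
    for minor, kib in sorted(out_of_sync_kib(SAMPLE).items()):
        status = "OK" if kib == 0 else "OUT OF SYNC"
        print(f"drbd{minor}: oos={kib} KiB {status}")
```

On a real node you would pass in the contents of `/proc/drbd` instead of `SAMPLE`, for example from a weekly cron job that runs after the verify has completed.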

I dug deeper and found that:
- most of the time, out-of-sync sectors appear in the swap space of Linux VMs (not critical in primary/secondary mode, but critical in primary/primary mode, as it can cause memory corruption)
- sometimes (quite rarely) out-of-sync sectors appear for Windows VMs
- out-of-sync sectors never appear on the ext4 volumes of VMs

At first I thought it was a hardware issue. We tried disabling every kind of offload, disabling rr-bonding, and we even asked Dell to replace the hard drives (we assumed there was a firmware issue). However, nothing helped.

Finally we found (thanks to Lars Ellenberg) that KVM can change buffers while the data is in flight if the write cache is enabled for a particular virtual hard drive. Switching the write cache off solves (or works around) the issue.

So far I have the following recommendations for KVM on top of DRBD:
- use writethrough or directsync for all drives of all VMs on DRBD (this means no write cache)
- use hardware RAID with write cache and a BBU (this is essential, because the write cache is now disabled for the VMs)
- you can enable the write cache (modes other than writethrough or directsync) for virtual drives that have reliable barrier support, for example a drive that holds only an ext4 partition with barriers enabled and no swap
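To audit existing VMs against these recommendations, something like the following sketch could help. It assumes Proxmox-style VM configuration text (normally found under `/etc/pve/qemu-server/`), where each disk line looks like `virtio0: storage:volume,cache=writethrough`. The function name `unsafe_disks`, the storage name `drbd0`, and the sample config are hypothetical. A disk with no `cache=` option is reported as `unset` and flagged, on the assumption that the default is not one of the two recommended modes.

```python
import re

# Cache modes recommended in the post for drives that live on DRBD.
SAFE_CACHE_MODES = {"writethrough", "directsync"}

# Assumed disk keys in a Proxmox VM config: ide0, sata1, scsi2, virtio0, ...
DISK_KEY = re.compile(r"^(ide|sata|scsi|virtio)\d+$")


def unsafe_disks(config_text, safe_modes=SAFE_CACHE_MODES):
    """Return [(disk_key, cache_mode)] for disks whose mode is not in safe_modes."""
    flagged = []
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key):
            continue  # not a disk line
        options = [opt.strip() for opt in value.split(",")]
        if "media=cdrom" in options:
            continue  # CD-ROM drives are not relevant here
        cache = "unset"
        for opt in options:
            if opt.startswith("cache="):
                cache = opt[len("cache="):]
        if cache not in safe_modes:
            flagged.append((key, cache))
    return flagged


# Hypothetical config for illustration only.
SAMPLE_CONF = """\
name: example-vm
memory: 2048
ide2: none,media=cdrom
virtio0: drbd0:vm-101-disk-1,cache=writethrough
virtio1: drbd0:vm-101-disk-2,cache=writeback
virtio2: drbd0:vm-101-disk-3
"""

if __name__ == "__main__":
    for disk, mode in unsafe_disks(SAMPLE_CONF):
        print(f"{disk}: cache={mode} (review before relying on DRBD primary/primary)")
```

The flagged list is only a set of candidates for review: per the last recommendation, a drive that holds only ext4 with barriers enabled and no swap may be an acceptable exception.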

Any more ideas and suggestions are welcome.

More information here: http://www.gossamer-threads.com/lists/drbd/users/25227
