Windows 7 x64 VMs crashing randomly during process termination

We have a cluster of 5 Proxmox nodes hosting a mix of Linux and Windows 7 x64 VMs, which we use as our automated software build and test environment. When build and test jobs are started from our Jenkins server, those running on the Windows 7 VMs fail regularly because the VM crashes at a random point during the job. This is quite annoying.

The Windows VMs use virtio drivers for both disk and network. The disks are stored on the local drive with writeback caching. I've tried two versions of the virtio drivers (0.1-74 and 0.1-59) but haven't seen any difference. I've disabled memory ballooning on all VMs.

pveversion -v produces:

Code:

proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: not correctly installed
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

I've used WinDbg to examine a number of the memory.dmp files produced during the VM crashes. Running '!analyze -v' on them gives similar output each time, like the analysis shown further below: the general pattern is that a PAGE_FAULT_IN_NONPAGED_AREA bug check occurs while a process is being terminated. Across the dump files, I've seen this happen to various executables that are used in our build and test jobs.

If some kind of race condition is involved, that would explain why our build and test jobs are good candidates for triggering it: each job starts and terminates a huge number of processes.
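One way to test the process-churn hypothesis without running full build jobs is to start and reap many short-lived processes inside a Windows 7 VM. The following is a minimal sketch only: the child command, the function names (`run_once`, `churn`) and the default counts are assumptions chosen for illustration, not part of the build jobs described above, and there is no guarantee that it reproduces the bug check.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical child command: a Python interpreter that exits immediately.
# Any short-lived executable from the real jobs could be substituted here.
CHILD_CMD = [sys.executable, "-c", "pass"]


def run_once(_index):
    """Start one short-lived child process, wait for it, return its exit code."""
    return subprocess.call(CHILD_CMD)


def churn(total=50, workers=8):
    """Start and reap `total` child processes, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_once, range(total)))


if __name__ == "__main__":
    # Optional first argument: number of processes to start (assumed default 50).
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 50
    codes = churn(total=count)
    failed = sum(1 for code in codes if code != 0)
    print(f"{len(codes)} processes exited, {failed} with a non-zero exit code")
```

If a loop like this crashes the VM on its own, that would point at process creation and teardown in general rather than at a specific build tool.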

Code:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffff680003f7db8, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff800026fcdbc, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000002, (reserved)

Debugging Details:
------------------

READ_ADDRESS:  fffff680003f7db8

FAULTING_IP:
nt!MiDeletePageTableHierarchy+9c
fffff800`026fcdbc 498b06          mov     rax,qword ptr [r14]

MM_INTERNAL_CODE:  2

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x50

PROCESS_NAME:  grep.exe

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre

TRAP_FRAME:  fffff88005378f00 -- (.trap 0xfffff88005378f00)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=000000fdf6e00000 rbx=0000000000000000 rcx=0000000fffffffff
rdx=0000058000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff800026fcdbc rsp=fffff88005379090 rbp=fffffa80058b1200
 r8=0000007ffffffff8  r9=0000098000000000 r10=fffffa8003601b90
r11=fffff88005379170 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po cy
nt!MiDeletePageTableHierarchy+0x9c:
fffff800`026fcdbc 498b06          mov     rax,qword ptr [r14] ds:00000000`00000000=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800027465e4 to fffff800026c9bc0

STACK_TEXT:
fffff880`05378d98 fffff800`027465e4 : 00000000`00000050 fffff680`003f7db8 00000000`00000000 fffff880`05378f00 : nt!KeBugCheckEx
fffff880`05378da0 fffff800`026c7cee : 00000000`00000000 fffff680`003f7db8 00000000`0008ed00 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x43836
fffff880`05378f00 fffff800`026fcdbc : fffffa80`0299e6b0 00000000`00000001 fffffa80`0302aa80 fffff6fb`40001000 : nt!KiPageFault+0x16e
fffff880`05379090 fffff800`026998b6 : fffff700`01080510 fffffa80`058b1598 fffff700`01080000 fffff8a0`004028e8 : nt!MiDeletePageTableHierarchy+0x9c
fffff880`053791a0 fffff800`0269a892 : fffffa80`058b1200 fffffa80`00000000 fffff8a0`00000025 00000000`00000000 : nt!MiDeleteAddressesInWorkingSet+0x3fb
fffff880`05379a50 fffff800`0299e15a : fffff8a0`0b6cea90 00000000`00000001 00000000`00000000 fffffa80`05621a00 : nt!MmCleanProcessAddressSpace+0x96
fffff880`05379aa0 fffff800`029826b8 : 00000000`c0000005 00000000`00000001 00000000`7efdb000 00000000`00000000 : nt!PspExitThread+0x56a
fffff880`05379ba0 fffff800`026c8e53 : fffffa80`058b1200 00000000`c0000005 fffffa80`05621a00 00000000`7efdf000 : nt!NtTerminateProcess+0x138
fffff880`05379c20 00000000`76ee157a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`0008f758 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76ee157a

STACK_COMMAND:  kb

FOLLOWUP_IP:
nt!MiDeletePageTableHierarchy+9c
fffff800`026fcdbc 498b06          mov     rax,qword ptr [r14]

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  nt!MiDeletePageTableHierarchy+9c

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP:  521ea035

IMAGE_VERSION:  6.1.7601.18247

IMAGE_NAME:  memory_corruption

FAILURE_BUCKET_ID:  X64_0x50_nt!MiDeletePageTableHierarchy+9c

BUCKET_ID:  X64_0x50_nt!MiDeletePageTableHierarchy+9c

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:x64_0x50_nt!mideletepagetablehierarchy+9c

FAILURE_ID_HASH:  {a5101511-63a3-65ce-1b12-16e97aca479e}

Followup: MachineOwner
---------
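To compare many dumps at once, the '!analyze -v' output of each dump can be saved to a text file and the key fields tallied. The sketch below is an illustration under assumptions: it assumes one saved analysis per text file, and the names `FIELD_RE`, `summarize` and `tally` are made up for this example. It relies only on the `PROCESS_NAME:` and `FAILURE_BUCKET_ID:` lines that appear in the output above.

```python
import re
import sys
from collections import Counter

# Matches "NAME:  value" lines as they appear in the '!analyze -v' output.
FIELD_RE = re.compile(
    r"^(PROCESS_NAME|FAILURE_BUCKET_ID):[ \t]*(\S+)", re.MULTILINE
)


def summarize(analysis_text):
    """Return the fields of interest found in one saved analysis as a dict."""
    return dict(FIELD_RE.findall(analysis_text))


def tally(analyses):
    """Count failure buckets and process names over several analyses."""
    buckets = Counter()
    processes = Counter()
    for text in analyses:
        fields = summarize(text)
        buckets[fields.get("FAILURE_BUCKET_ID", "unknown")] += 1
        processes[fields.get("PROCESS_NAME", "unknown")] += 1
    return buckets, processes


if __name__ == "__main__":
    # Usage (file names are hypothetical): python tally.py crash1.txt crash2.txt
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    bucket_counts, process_counts = tally(texts)
    for name, count in bucket_counts.most_common():
        print(f"{count:4d}  {name}")
    for name, count in process_counts.most_common():
        print(f"{count:4d}  {name}")
```

A single dominant failure bucket spread over many different process names would be consistent with the pattern described above, where the crash follows process termination rather than one particular executable.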

I would be most grateful if anyone could shed some light on these annoying crashes, or suggest a configuration change that helps prevent them.

Cheers,
Marcel Roelofs
