Intermittent Hyper-V Network Hiccups


Castaway Kid

I have a failover cluster with 4 nodes running Server 2019. Within that cluster are 19 VMs running a range of Server 2012 R2, 2016 and 2019. I just upgraded the cluster to 2019 from 2016 last week following the rolling upgrade plan. I did an in-place upgrade of each node instead of wiping the drives.

At this point I have three or four VMs which are experiencing intermittent network issues. The most visible symptoms are pings which randomly drop and my PRTG probes failing to connect to the central server every five to ten minutes. I also experience occasional Remote Desktop reconnecting screens to those VMs. Those VMs are running all three of the aforementioned OS versions.

The host nodes are Dell PowerEdge R440s. I have the virtual switches assigned to an Intel I350 NIC to avoid the potential problems with the on-board Broadcom NICs. I've set the BelowTenGigVmqEnabled registry key on all four host nodes. I've also set nearly all of the VMs to static MACs, including the aforementioned troublesome VMs. The virtual switches are configured with SoftwareRscEnabled=false. The troublesome VMs are not all on the same node. I tried disabling the TCP Offload setting with the DisableTaskOffload registry key on one of them, but it hasn't made a difference.

I feel like I've checked about every suggestion out there. I know the static MAC option did solve problems for several of the VMs. The BelowTenGigVmqEnabled solved problems for a couple of the other VMs. I feel like I'm hitting a brick wall when it comes to these last three or four.

I was not seeing any of these problems while the hosts were all 2016. Of course, matters need to be complicated by the fact that I upgraded to 2019 and then almost immediately hit the monthly patch cycle, so the VMs didn't get much of a chance to run on the newly upgraded hosts before they all went through patching.

Continue reading...
Top Bottom