KVM Network Performance: TSO and GSO – Turn it off

Over the last few weeks I’ve been looking into TSO and GSO, mostly the various issues encountered with offloading in virtual environments such as KVM and Xen.

The problems seemingly span all hypervisors.

In addition, in my past work we had to turn sg (scatter-gather) offloading off to get internal networking on Xen to perform properly at all. Otherwise, on the same dom0, domU <-> domU transfers would crawl along at 20 KB/s.
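If you need to do the same, scatter-gather offload can be toggled with ethtool as well. A minimal sketch, assuming the dom0-side interface is eth0 (substitute your own bridge or vif name):

# ethtool -k eth0 | grep scatter-gather
# ethtool -K eth0 sg off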

My favorite blog, ‘Lessons from the trenches‘, has even encountered the ‘death packet’ issues which broke networking on CentOS guests with TSO and GSO on. They’ve since shut it off at the host level, and suggest users do so as well to avoid issues.

Finally, Red Hat suggests that if you’re encountering any type of performance issue on virtualized guests running VirtIO, you disable TSO and GSO on the host node as a best practice:

Network Performance Issues
If you experience low performance with the virtio network drivers, verify the setting for the GSO and TSO features on the host system. The virtio network drivers require that the GSO and TSO options are disabled for optimal performance.
 
To verify the status of the GSO and TSO settings, use the following command on the host (replacing interface with the network interface used by the guest):
# ethtool -k interface

Disable the GSO and TSO options with the following commands on the host:
# ethtool -K interface gso off
# ethtool -K interface tso off
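
Once those are set, a quick sanity check is to grep the offload lines and confirm both report “off”; a sketch, assuming the guest traffic rides on br0:

# ethtool -k br0 | grep segmentation-offload

Depending on your ethtool version, the relevant lines are tcp-segmentation-offload and generic-segmentation-offload.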

I’ve gotten feedback that GSO isn’t the issue, and yes, the checksums only show up as incorrect because they’re calculated later. But this is all extra strain on a system that isn’t necessary: once you scale past 1 Gbps of networking, those deferred checksums add up to poor performance. Add internal networking and an ethernet adapter or two, and you’re suffering from performance issues, at the very least.
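You can watch that deferred-checksum behavior yourself; a quick sketch, assuming eth0 is the transmitting interface:

# tcpdump -i eth0 -vv -c 5 tcp

With checksum offload enabled, outgoing packets show up as “cksum … (incorrect)” because tcpdump captures the frame before the NIC fills the checksum in.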

Worse, if you have any equipment that does not handle MSS or MTU fragmentation properly, then with GSO enabled you will see erratic packet sizes larger than your configured MSS or MTU (such as the 2900 seen in this article), and network issues will follow.
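If you suspect that’s happening, you can watch for IPv4 packets that exceed your MTU right on the bridge; a rough sketch, assuming br0 and a 1500-byte MTU:

# tcpdump -i br0 -nn 'ip and ip[2:2] > 1500'

Note that seeing oversized segments at the capture point is expected while GSO is on (segmentation happens later); the trouble starts when gear downstream can’t cope with them.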

In short, turn GSO and TSO off at the host-node level, especially on br0. It’s best practice, and given the bug reports of TSO and GSO causing instability on hypervisors, along with other offloading such as sg, you should stick with Red Hat’s advice and everyone else’s findings: disable them on the host-node interfaces, then troubleshoot if you’re still having trouble. At the very least, a known feature that causes issues and offers no advantage to guests will be out of the picture.
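In practice that’s a couple of commands on the host node; a minimal sketch, assuming br0 is the guest-facing bridge and eth0 the physical uplink (ethtool settings don’t survive a reboot, so re-apply them from rc.local or your distribution’s interface configuration):

# ethtool -K br0 gso off tso off
# ethtool -K eth0 gso off tso off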

Also, it’s not just about hiding the checksum errors from UDP; offloading is likely the cause of those hard-to-duplicate network issues across your host node when you’re stumped, be it UDP performance or a dying internal network.

Simple Way to Keep an Eye on Top VM Usage: virt-top

Sort by top memory:      virt-top -o mem
Sort by top net out:     virt-top -o nettx
Sort by top net in:      virt-top -o netrx
Sort by top CPU usage:   virt-top
You can also just run virt-top and use the number keys for fast access: 0 (CPU usage), 2 (network usage), and 3 (disk usage).
Example of flags passed to view disk usage and network usage in bytes:

virt-top -3 --block-in-bytes    (who's abusing the disk?)

virt-top -o nettx --block-in-bytes    (who's the outbound DoS source?)

Output 1 minute of the top memory users as CSV for use in graphing (ONLY MEMORY):
[~]# virt-top --end-time +00:01:00 -o mem --script --no-csv-cpu --no-csv-block --no-csv-net --csv x.txt
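The exact CSV column layout varies between virt-top versions, so before graphing it helps to enumerate the header and find the per-domain memory columns; a small sketch against the x.txt file produced above:

[~]# head -1 x.txt | tr ',' '\n' | nl

Once you know the column number, something like awk -F, '{print $12}' x.txt (with 12 being whatever column nl reported) pulls just that series out for your graphing tool.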

CSV output of 3 minutes of ONLY the top network inbound usage, written out for graphing (Only NET IN):
[~]# virt-top --end-time +00:03:00 -o netrx --script --no-csv-cpu --no-csv-mem --no-csv-block --csv top-inbound.txt

Advanced: CSV output of the top block device usage (by read and write bytes for easier reading, instead of request counts), collected over a 3 minute query (ONLY BLK USAGE):
[~]# virt-top -3 --block-in-bytes --end-time +00:03:00 --script --no-csv-cpu --no-csv-mem --no-csv-net --csv drive-io.txt
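
If you want to collect these block I/O snapshots on a schedule (from cron, say), a small wrapper works; this is just a sketch, and the output directory and 3 minute window are assumptions:

#!/bin/bash
# Hypothetical collector: 3 minutes of per-domain block I/O, saved as a timestamped CSV.
OUTDIR=/var/log/virt-top
mkdir -p "$OUTDIR"
virt-top -3 --block-in-bytes --end-time +00:03:00 --script \
    --no-csv-cpu --no-csv-mem --no-csv-net \
    --csv "$OUTDIR/drive-io-$(date +%Y%m%d-%H%M).csv"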