Dell performance profiles and Linux

A few weeks ago we discovered that the performance of our Windows 2012 guest on KVM hypervisors was very bad, and that it was caused by the "Performance per Watt (DAPC)" profile in the BIOS of our Dell M620 servers. After that I got curious and started to test other systems: Linux guests and bare metal. According to the Dell BIOS Performance and Power Tuning documentation, the difference between the Performance per Watt (DAPC) and Performance per Watt (OS) profiles should be only a few percent:
"The DAPC System Profile represents the best performance-per-watt combination for this benchmark across all workload intervals, using the least amount of power while offering maximum performance levels within 2% of that provided by the Performance System Profile." But we already found bigger differences before, so you never know...

We started with a Euler calculation, because we had a similar benchmark on the Windows VMs. The test was done in a VM which we migrated between two hypervisors with different system profiles. We also ran the test bare metal on the hypervisor. The difference between the two profiles was not very impressive, something like 5% for a calculation of about 90 seconds. But I also noticed that for shorter calculations, the difference was more like 30-50%. Since we mostly run web services that handle requests with rather short CPU spikes, I thought it was worth trying to switch the system profiles on our other hypervisors too.
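For reference, the actual benchmark belonged to the Windows VM comparison, but a minimal CPU-bound sketch of the same idea in Python, with hypothetical term counts and run durations, could look like the following. The point is that a short run, comparable to a single web request, is hit much harder by slow frequency ramp-up than a long calculation:

```python
import time

def euler_e(terms: int) -> float:
    """Approximate e as the sum of 1/n! over the given number of terms."""
    e, fact = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            fact *= n
        e += 1.0 / fact
    return e

def throughput(seconds: float) -> float:
    """Keep one core busy for roughly `seconds`, return iterations per second."""
    iterations = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        euler_e(50)
        iterations += 1
    return iterations / (time.perf_counter() - start)

if __name__ == "__main__":
    # Compare a short CPU spike (like one web request) with longer calculations;
    # run this once per system profile and compare the numbers.
    for duration in (0.2, 5.0, 90.0):
        print(f"{duration:>5.1f}s run: {throughput(duration):,.0f} iterations/s")
```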

The results are impressive. It took a few days to change the profiles on all our hypervisors, but in the end there was about a 30% performance increase across the platform. I attached the results for two CPU-intensive J2EE web services. My customer collects response times of the services with an ELK stack. The results were aggregated and compared before, during and after the changes (marked on the graph). I also attached a Munin CPU graph and a Kibana response time graph of a VM running an old standalone web service (CGI client + daemon written in C) which serves mostly short requests. During the last two hours on the graph the service is running on a hypervisor with the Performance per Watt (OS) profile, and the result is again at least a 30% improvement.

The downside? Your servers will use more power. The exact amount depends on the existing load, but we noticed a 20-30 watt increase per socket for Xeon E5 v2 CPUs. Then again, if it saves you an extra server for every four servers you have, you can probably live with that. And if you're running enterprise software on the servers, like my customer does, the license cost is usually higher than the hardware cost, so the "win" is even bigger.
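A quick back-of-envelope calculation puts that in perspective. The wattage is what we measured; the socket count and energy price below are example assumptions, not figures from the measurements:

```python
# Back-of-envelope estimate of the extra power bill per server.
# The 30 W/socket figure is our measured upper bound; the socket count
# and energy price are assumed example values.
extra_watt_per_socket = 30
sockets = 2                      # dual-socket blade (assumption)
price_per_kwh = 0.20             # assumed EUR per kWh
hours_per_year = 24 * 365

extra_kwh_per_year = extra_watt_per_socket * sockets * hours_per_year / 1000
print(f"~{extra_kwh_per_year:.0f} kWh/year, "
      f"~EUR {extra_kwh_per_year * price_per_kwh:.0f}/year per server")
```

Under those assumptions that is on the order of a hundred euro per server per year, which is small next to buying, powering and licensing one extra server for every four you run.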

Why? I'm not a kernel developer or a Dell firmware specialist, but my guess is that with the Dell DAPC BIOS profile the operating system is somehow not able to scale up the CPU frequency quickly enough to get maximum performance. When the kernel manages the CPU frequency it will probably keep the clock speed higher, which explains the higher power usage and the fact that CPU usage seems to drop in the Munin graphs (it doesn't really drop, but the maximum is higher). The benchmark software Dell used to compare the profiles probably runs on Windows, and I guess it also uses long-running tests with the CPU frequency at its maximum from the start. Other explanations are always welcome. I also sent this information to Red Hat and Dell, but so far I haven't received a useful explanation from either side.
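A quick way to see who is actually driving frequency scaling on a host is the cpufreq interface in sysfs. A small sketch, assuming a kernel that exposes these files (with the DAPC profile the BIOS rather than the kernel is supposed to manage power, so the driver or governor may be missing or report very little):

```python
from pathlib import Path

def read(path: Path) -> str:
    """Return the file contents, or 'n/a' when the kernel does not expose it."""
    return path.read_text().strip() if path.exists() else "n/a"

# Print the cpufreq driver, governor and current/maximum frequency per CPU.
cpus = sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"),
              key=lambda p: int(p.name[3:]))
for cpu in cpus:
    freq = cpu / "cpufreq"
    print(cpu.name,
          "driver:", read(freq / "scaling_driver"),
          "governor:", read(freq / "scaling_governor"),
          "cur:", read(freq / "scaling_cur_freq"),
          "max:", read(freq / "scaling_max_freq"))
```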