You experience higher-than-expected CPU or memory usage on recently deployed Microsoft Azure virtual machines (VMs) that run on computers that use Intel Skylake processors. According to Intel, a change in these processors affects VM performance and overall workload or application execution.
The issue is caused by an increase in the delay of the PAUSE instruction on Intel Skylake processors. You may notice this issue particularly in Microsoft .NET Framework applications, because the Pause Latency change affects the long and very long spin-wait loops that are common in .NET Framework.
In the Skylake microarchitecture, Intel increased the Pause Latency to as many as 140 cycles. In earlier-generation microarchitectures, the Pause Latency is about 10 cycles. According to Intel, this change was made to improve resource sharing.
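To see why a longer PAUSE matters for spin-wait loops, consider a loop that executes a fixed number of PAUSE instructions. The following Python sketch estimates how long such a loop spins. The 1,000-iteration loop and the 2.0 GHz clock rate are assumptions for illustration, and the estimate ignores the cost of the loop's other instructions.

```python
def spin_time_us(pause_iterations: int, cycles_per_pause: int, clock_ghz: float) -> float:
    """Approximate time, in microseconds, that a spin loop spends executing
    `pause_iterations` PAUSE instructions (other loop instructions ignored)."""
    total_cycles = pause_iterations * cycles_per_pause
    # 1 GHz is 1,000 cycles per microsecond.
    return total_cycles / (clock_ghz * 1_000)


# Hypothetical spin loop of 1,000 PAUSE iterations on an assumed 2.0 GHz core.
before = spin_time_us(1_000, 10, 2.0)   # about 10 cycles per PAUSE (earlier processors)
after = spin_time_us(1_000, 140, 2.0)   # up to 140 cycles per PAUSE (Skylake)
print(f"{before:.1f} us -> {after:.1f} us ({after / before:.0f}x longer)")
# prints: 5.0 us -> 70.0 us (14x longer)
```

The same loop that spun for about 5 microseconds now spins for about 70, which is why code that tuned its spin counts for the old latency can burn far more CPU time on Skylake.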
For more information about the change and its effects, see section 8.4.7 of the following Intel PDF document:
Follow the steps in this section carefully. Serious problems might occur if you modify the registry incorrectly. Before you modify it, back up the registry for restoration in case problems occur.
Fix for this issue
To fix this issue, install the .NET Framework October 2018 Security and Quality Rollup.
Note: In .NET Framework 4.8, the fix is enabled by default. In .NET Framework 4.6.x and 4.7.x, the fix is disabled by default and must be enabled manually.
To enable the fix for the pause delay on Skylake processors, start Registry Editor, and then add a DWORD value named Thread_NormalizeSpinWait to the following subkey:
Value Name: Thread_NormalizeSpinWait
Value data: 1
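If you have to apply this setting on many VMs, a .reg file can be easier to distribute than manual edits in Registry Editor. The following Python sketch builds such a file by using the standard .reg text format. The helper name `reg_dword_file` is hypothetical, and `SUBKEY` is only a placeholder: pass in the exact subkey that is listed in this article. The same backup precautions apply.

```python
def reg_dword_file(subkey: str, name: str, value: int) -> str:
    """Return the text of a .reg file that sets a single REG_DWORD value.

    `subkey` must be the full registry path listed in this article; this
    helper does not choose or validate it.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD value must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{subkey}]",
        # .reg files write DWORD data as eight hexadecimal digits after "dword:".
        f'"{name}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Placeholder only: replace with the subkey given in this article.
SUBKEY = "<subkey from this article>"
print(reg_dword_file(SUBKEY, "Thread_NormalizeSpinWait", 1))
```

The value line in the output is `"Thread_NormalizeSpinWait"=dword:00000001`, which corresponds to the value name and value data shown above.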
Note: Some applications may also be affected by lock contention in .NET Framework timers, and the timer fix is not enabled by default in any version of .NET Framework. If workload performance is still affected after you apply the Pause Latency fix, consider whether timers are a significant source of lock contention. If they are, go to the "Fix for the timers" section.
Fix for the timers
To enable the timer fix manually, add a String value named Switch.System.Threading.UseNetCoreTimer to the following subkey:
Value Name: Switch.System.Threading.UseNetCoreTimer
Value data: true
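Unlike Thread_NormalizeSpinWait, this setting is a String value whose data is the text `true`, not a number. In a .reg file, a String value is written as a quoted name and quoted data under the subkey header. The following Python sketch formats that line; the helper name is hypothetical, and the subkey header is not filled in here.

```python
def reg_string_line(name: str, data: str) -> str:
    """Format one REG_SZ (String) value as a line of a .reg file."""
    def escape(text: str) -> str:
        # .reg syntax escapes backslashes and double quotes inside quoted strings.
        return text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escape(name)}"="{escape(data)}"'


line = reg_string_line("Switch.System.Threading.UseNetCoreTimer", "true")
print(line)  # "Switch.System.Threading.UseNetCoreTimer"="true"
```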
For more information about AppContext switches such as this one, see the "AppContext for library consumers" section of the following Microsoft Docs article:
Frequently asked questions
Q1: Is there any harm in also enabling UseNetCoreTimer on all kinds of hardware?
A1: The timer fix is not currently enabled by default in any version of .NET Framework. We do not recommend that you change the default setting at the local level.
Q2: Are there any other known issues caused by the Pause Latency change in Skylake?
A2: The fix measures Pause Latency at startup, and this measurement consumes additional CPU time, typically about 10 ms. This longer measurement is considered necessary to get reliable results and make the fix effective. However, some .NET Framework applications are short-running tools, and running such tools frequently may cause higher total CPU usage than before the fix was applied. This was considered an acceptable tradeoff in order to fix a larger problem and to enable the fix by default in .NET Framework 4.8.
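Whether the startup cost matters depends mostly on how often processes start. The following sketch uses the typical 10 ms figure from this answer to estimate the extra CPU time; the launch count is a hypothetical example.

```python
STARTUP_MEASUREMENT_MS = 10  # typical per-startup cost mentioned in A2


def extra_cpu_seconds(process_starts: int, per_start_ms: float = STARTUP_MEASUREMENT_MS) -> float:
    """Estimate the total extra CPU time added by the startup Pause Latency measurement."""
    return process_starts * per_start_ms / 1000


# Hypothetical build server that launches a short .NET Framework tool 50,000 times a day.
print(extra_cpu_seconds(50_000))  # 500.0 (seconds of CPU time per day)
```

For a long-running service that starts once, the same cost is about 0.01 seconds in total, which is why the tradeoff mainly concerns frequently launched tools.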
Q3: Is the Skylake Pause Latency fix guaranteed to solve my issue?
A3: No, the fix is not guaranteed to solve your issue. Other factors that are unrelated to this issue may also affect the performance of a specific workload. The effectiveness of the fix also depends on the quality of the Pause Latency measurement. Bounds are applied to make sure that .NET Framework does not overscale spin counts. However, bad measurements can occur when the VM is heavily loaded, and this can prevent the fix from being effective. In the worst case (apart from the startup tradeoff that is mentioned in A2), the result would be similar to not applying the fix.
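The general idea of normalizing spin waits, and the role of the bounds mentioned in this answer, can be illustrated with a short sketch. This is not the .NET Framework implementation: the function name, the reference latency, the choice of taking the fastest of several samples, and the clamping rule are all assumptions made for illustration. The sketch scales a loop's iteration count so that total spin time stays roughly constant when each PAUSE takes longer, and it clamps the result so that a bad measurement cannot inflate the count.

```python
from typing import Optional, Sequence


def normalized_spin_count(base_iterations: int,
                          pause_samples_ns: Sequence[float],
                          reference_ns_per_pause: float,
                          min_iterations: int = 1,
                          max_iterations: Optional[int] = None) -> int:
    """Scale a spin-loop iteration count by measured PAUSE latency (illustrative only)."""
    if not pause_samples_ns:
        return base_iterations  # no measurement: behave as if the fix were not applied
    # Use the fastest sample, because interference on a busy VM only makes samples slower.
    measured = min(pause_samples_ns)
    if measured <= 0:
        return base_iterations  # unusable measurement
    scaled = round(base_iterations * reference_ns_per_pause / measured)
    # Upper bound: by default, never spin more than the unnormalized loop would.
    upper = base_iterations if max_iterations is None else max_iterations
    return max(min_iterations, min(scaled, upper))


# Hypothetical numbers: 5 ns per PAUSE as the reference (about 10 cycles at an
# assumed 2 GHz) and three noisy samples from a Skylake VM.
print(normalized_spin_count(1_000, [70.0, 72.0, 90.0], reference_ns_per_pause=5.0))  # 71
# A bogus, too-fast sample would scale the count up fivefold; the upper bound stops that.
print(normalized_spin_count(1_000, [1.0], reference_ns_per_pause=5.0))  # 1000
```

In the second call, the clamped result equals the original iteration count, which mirrors the worst case described above: the loop behaves as if no normalization had been applied.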
Q4: Is there guidance for support engineers on how to determine whether a perceived performance issue is caused by this change?
A4: You can determine this by profiling the affected application. If you compare profiles that were taken under similar load on a pre-Skylake VM and on a Skylake VM, the Skylake profile may show a much larger share of time spent in clr!AwareLock::Contention. That would indicate that the pause delay fix is likely to help on the Skylake VM.
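The comparison described in this answer can be reduced to one number per profile: the share of sampled call stacks that include clr!AwareLock::Contention. The following Python sketch assumes that you have already exported sampled stacks from a profiler such as PerfView into lists of frame names (the export step is not shown). The sample data and the app! frame names are hypothetical.

```python
from typing import Sequence

CONTENTION_FRAME = "clr!AwareLock::Contention"


def contention_share(samples: Sequence[Sequence[str]]) -> float:
    """Fraction of sampled call stacks that include the CLR lock-contention frame."""
    if not samples:
        return 0.0
    # startswith() also matches frames that carry an offset, such as "...+0x1a".
    hits = sum(1 for stack in samples
               if any(frame.startswith(CONTENTION_FRAME) for frame in stack))
    return hits / len(samples)


pre_skylake = [
    ["app!Worker.Run"],
    ["clr!AwareLock::Contention", "app!Cache.Get"],
    ["app!Worker.Run"],
    ["app!Worker.Run"],
]
skylake = [
    ["clr!AwareLock::Contention", "app!Cache.Get"],
    ["clr!AwareLock::Contention+0x1a", "app!Cache.Get"],
    ["app!Worker.Run"],
    ["app!Worker.Run"],
]
print(contention_share(pre_skylake), contention_share(skylake))  # 0.25 0.5
```

A much higher share on the Skylake VM under comparable load is the pattern that this answer describes.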
For the timer fix, look for call stacks in which clr!AwareLock::Contention is called by mscorlib.ni!System.Threading.TimerQueueTimer.Fire(). If Fire() or other TimerQueueTimer methods are the primary source of contention, the timer fix is likely to help.
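To check whether timers are the primary source of contention, you can count how many contended stacks have a TimerQueueTimer method below the contention frame. The following sketch assumes leaf-first stacks (the innermost frame first) in the same hypothetical exported format as the previous example; the sample data is made up.

```python
from typing import Sequence

CONTENTION_FRAME = "clr!AwareLock::Contention"
TIMER_PREFIX = "mscorlib.ni!System.Threading.TimerQueueTimer."


def timer_contention_share(samples: Sequence[Sequence[str]]) -> float:
    """Among stacks that show lock contention, the fraction reached from TimerQueueTimer."""
    contended = 0
    from_timer = 0
    for stack in samples:
        # Locate the contention frame; stacks are leaf-first, so callers follow it.
        idx = next((i for i, frame in enumerate(stack)
                    if frame.startswith(CONTENTION_FRAME)), None)
        if idx is None:
            continue
        contended += 1
        if any(frame.startswith(TIMER_PREFIX) for frame in stack[idx + 1:]):
            from_timer += 1
    return from_timer / contended if contended else 0.0


timer_stack = [CONTENTION_FRAME, TIMER_PREFIX + "Fire"]
samples = [timer_stack, timer_stack, timer_stack,
           [CONTENTION_FRAME, "app!Cache.Get"],
           ["app!Worker.Run"]]
print(timer_contention_share(samples))  # 0.75
```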
You can also monitor lock contention rates by using Performance Monitor. For more information, see the "Contention Rate / Sec" and "Total # of Contentions" counters for .NET CLR LocksAndThreads in the "Lock and thread performance counters" section of the following Microsoft Docs article:
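"Contention Rate / Sec" reports a rate, whereas "Total # of Contentions" is a running total. If you collect only the total (for example, in a periodic log), you can approximate the rate from successive readings, as in the following sketch. The readings are hypothetical, and counter resets (for example, after the process restarts) are not handled.

```python
from typing import List, Sequence, Tuple


def contention_rates(samples: Sequence[Tuple[float, int]]) -> List[float]:
    """Per-interval contentions per second from (timestamp_seconds, total_contentions) readings."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must increase")
        rates.append((c1 - c0) / (t1 - t0))
    return rates


# Hypothetical readings of "Total # of Contentions" taken every 5 seconds.
readings = [(0.0, 1200), (5.0, 1700), (10.0, 4200)]
print(contention_rates(readings))  # [100.0, 500.0]
```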