FIX: "Non-yielding Scheduler" error and SQL Server appears unresponsive in SQL Server 2014, 2016 and 2017

Applies to: SQL Server 2016 DeveloperSQL Server 2016 EnterpriseSQL Server 2016 Enterprise Core

Symptoms


Assume that you have Microsoft SQL Server 2014, 2016 or 2017 installed.  You may experience one or more of the following issues:

  • The SQL Server instance appears unresponsive and a "Non-yielding Scheduler" error occurs. You may have to restart the server to recover.
  • Rollback of a transaction may take a long time to complete. In most cases restarting the instance will allow the database to recover much faster than the rollback. Note that there are many reasons a rollback may take a long time to complete, see the "More Information" section below for details on monitoring rollbacks before attempting to restart.
  • You may see high waits on spinlocks such as SOS_OBJECT_STORE.

Resolution


This problem is fixed in the following cumulative updates for SQL Server:

       Cumulative Update 9 for SQL Server 2017

       Cumulative Update 2 for SQL Server 2016 SP2

Service pack information for SQL Server

This update is fixed in the following service pack for SQL Server:

Service Pack 3 for SQL Server 2014

More Information


There are many reasons why a rollback can take a long time such as a long-running transaction, a large number of VLFs in the transaction log file, slow I/O etc.  In order to verify that the issue described in this article is the root cause of a slow rollback, we suggest that the following techniques be used to monitor for the progress of the rollback operation:

  • From sys.dm_exec_requests, identify the session_id whose command is set to  "KILLED/ROLLBACK" and ensure that the session is accumulating both IO and CPU time indicating progress. If IO is not changing, then it may be an indication that you are encountering the issue described in this article.
  • Query sys.dm_tran_database_transactions to identify the current state of the rollback using a query like the following:​
​ ​

SELECT getdate() as CurrentTime, database_transaction_next_undo_lsn,database_transaction_begin_lsn,t.transaction_id,database_transaction_begin_time,database_transaction_log_record_count,db_name(t.database_id)

FROM sys.dm_tran_database_transactions t

JOIN sys.dm_exec_requests s
   ON t.transaction_id=s.transaction_id

WHERE t.database_id=db_id('<Database Name') and s.session_id=<Session_id performing the rollback operation>

Note:

In the above query,

database_transaction_next_undo_lsn is the LSN of the next record to undo. database_transaction_begin_lsn is the LSN of the begin record for the transaction in the transaction log.

database_transaction_next_undo_lsn should be decreasing with each snapshot of this query. Rollback will complete successfully when the database_transaction_next_undo_lsn reaches database_transaction_begin_lsn.

The goal here is to take a few snapshots of the previous query within a predetermined interval and then use the delta of the LSNs processed in database_transaction_next_undo_lsn within that interval and extrapolate the time taken in order to estimate the time it will take for the database_transaction_next_undo_lsn to reach the database_transaction_begin_lsn.

If the rollback is progressing at a decent rate between each snapshot, we suggest that the rollback be allowed to complete on its own without restarting the SQL Server instance.

See the following articles for more information on long-running recovery:

Status


Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

References


Learn about the terminology that Microsoft uses to describe software updates.