Assume that you have Microsoft SQL Server 2014, 2016 or 2017 installed. You may experience one or more of the following issues:
- The SQL Server instance appears unresponsive and a "Non-yielding Scheduler" error occurs. You may have to restart the server to recover.
- Rollback of a transaction may take a long time to complete. In most cases restarting the instance will allow the database to recover much faster than the rollback. Note that there are many reasons a rollback may take a long time to complete, see the "More Information" section below for details on monitoring rollbacks before attempting to restart.
- You may see high waits on spinlocks such as SOS_OBJECT_STORE.
There are many reasons why a rollback can take a long time such as a long-running transaction, a large number of VLFs in the transaction log file, slow I/O etc. In order to verify that the issue described in this article is the root cause of a slow rollback, we suggest that the following techniques be used to monitor for the progress of the rollback operation:
- From sys.dm_exec_requests, identify the session_id whose command is set to "KILLED/ROLLBACK" and ensure that the session is accumulating both IO and CPU time indicating progress. If IO is not changing, then it may be an indication that you are encountering the issue described in this article.
- Query sys.dm_tran_database_transactions to identify the current state of the rollback using a query like the following:
SELECT getdate() as CurrentTime, database_transaction_next_undo_lsn,database_transaction_begin_lsn,t.transaction_id,database_transaction_begin_time,database_transaction_log_record_count,db_name(t.database_id)
FROM sys.dm_tran_database_transactions t
JOIN sys.dm_exec_requests s
WHERE t.database_id=db_id('<Database Name') and s.session_id=<Session_id performing the rollback operation>
In the above query,
database_transaction_next_undo_lsn is the LSN of the next record to undo. database_transaction_begin_lsn is the LSN of the begin record for the transaction in the transaction log.
database_transaction_next_undo_lsn should be decreasing with each snapshot of this query. Rollback will complete successfully when the database_transaction_next_undo_lsn reaches database_transaction_begin_lsn.
The goal here is to take a few snapshots of the previous query within a predetermined interval and then use the delta of the LSNs processed in database_transaction_next_undo_lsn within that interval and extrapolate the time taken in order to estimate the time it will take for the database_transaction_next_undo_lsn to reach the database_transaction_begin_lsn.
If the rollback is progressing at a decent rate between each snapshot, we suggest that the rollback be allowed to complete on its own without restarting the SQL Server instance.
See the following articles for more information on long-running recovery:
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.