Improvements for SQL Server AlwaysOn Lease Timeout supportability in SQL Server 2012 and 2014

Symptoms
This update includes the following improvements for Microsoft SQL Server AlwaysOn Lease Timeout supportability:
  • The Lease Timeout message now displays and logs the current time and expected renewal time.
  • A new error message was added for lease workers that clearly indicates the reason for Lease Timeout.
  • A new extended event and a new ring buffer for lease workers were added. These clearly indicate the lease stages.
Resolution
These improvements were first included in the following service packs for SQL Server.
For more information about SQL Server 2012 Service Pack 3 (SP3), see bugs that are fixed in SQL Server 2012 Service Pack 3.

About Service packs for SQL Server

Service packs are cumulative. Each new service pack contains all the fixes that are in previous service packs, together with any new fixes. We recommend that you apply the latest service pack and the latest cumulative update for that service pack. You do not have to install a previous service pack before you install the latest service pack. For more information about the latest service pack and the latest cumulative update, see Table 1 in the following article:

How to determine the version, edition and update level of SQL Server and its components
More information
To provide additional insight, new error messages have been added to SQL Server. The following table lists and explains each of them.

| Error | Error message | Cause | Corrective action |
| --- | --- | --- | --- |
| 19419 | The renewal of the lease between availability group '%.*ls' and the Windows Server Failover Cluster failed because the existing lease is no longer valid. | The lease worker on the SQL Server side did not get scheduled in time to process the event signal from the cluster. | Check the CPU utilization on the server, because the SQL Server lease worker seems to be starving. |
| 19420 | The availability group '%.*ls' is explicitly asked to stop the lease renewal. | The lease renewal is stopping as part of bringing the availability group offline. | This is informational only. |
| 19421 | The renewal of the lease between availability group '%.*ls' and the Windows Server Failover Cluster failed because renewal didn't happen within lease interval. | The lease helper on the cluster side did not signal the SQL Server lease worker in time. | Check the corresponding availability group resource in the WSFC cluster to see whether it reported any error. |
| 19422 | The renewal of the lease between availability group '%.*ls' and the Windows Server Failover Cluster failed because of a windows error with Error code ('%d'). | The lease worker on the SQL Server side failed to renew the lease because of a Windows error. | Check the Windows error code and take the corresponding corrective action. |
| 19423 | The lease of availability group '%.*ls' lease is no longer valid to start the lease renewal process. | When the lease worker started processing the excess lease time provided by the online call, the lease had already expired. This might have happened because of scheduling issues. | Check the CPU utilization on the server, because the SQL Server lease worker seems to be starving. |
| 19424 | The lease worker of availability group '%.*ls' is now sleeping the excess lease time (%u ms) supplied during online. This is an informational message only. No user action is required. | Informational. Extra online time is allotted to starting the lease renewal thread, as part of the availability group online routine. | No user action is required. |
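
For triage scripts, the table above can be condensed into a lookup keyed by error number. The following Python sketch is illustrative only: the names LEASE_ERRORS and corrective_action are hypothetical and are not part of SQL Server, and the cause and action strings are shortened paraphrases of the table rows.

```python
# Illustrative sketch only: LEASE_ERRORS and corrective_action are
# hypothetical names, not part of SQL Server or any Microsoft tool.
# Each entry is (cause, corrective action), condensed from the table above.
LEASE_ERRORS = {
    19419: ("Lease worker on the SQL Server side was not scheduled in time.",
            "Check CPU utilization on the server."),
    19420: ("Lease renewal is stopping because the availability group is going offline.",
            "Informational only."),
    19421: ("Lease helper on the cluster side did not signal the lease worker in time.",
            "Check the availability group resource in the WSFC cluster for errors."),
    19422: ("Lease renewal failed because of a Windows error.",
            "Check the Windows error code."),
    19423: ("Lease had already expired when the excess lease time was processed.",
            "Check CPU utilization on the server."),
    19424: ("Lease worker is sleeping the excess lease time supplied during online.",
            "Informational only."),
}


def corrective_action(error_number):
    """Return the suggested action for a lease error, or None if unknown."""
    entry = LEASE_ERRORS.get(error_number)
    return entry[1] if entry else None
```

A call such as corrective_action(19421) returns the cluster-side check, and an error number that the table does not cover returns None.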

Example of error 19419: If you use a debugger to attach to SQL Server, it interrupts any servicing of threads in the SQL Server process until you resume the SQL Server process. When you resume SQL Server, the following is reported in the SQL Server error log:

<Date Time> Server Error: 19419, Severity: 16, State: 1.
<Date Time> Server Windows Server Failover Cluster did not receive a process event signal from SQL Server hosting availability group 'ag' within the lease timeout period.
<Date Time> Server Error: 19407, Severity: 16, State: 1.
<Date Time> Server The lease between availability group 'ag' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.
<Date Time> Server AlwaysOn: The local replica of availability group 'ag' is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.
<Date Time> Server The state of the local availability replica in availability group 'ag' has changed from 'PRIMARY_NORMAL' to 'RESOLVING_NORMAL'. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error. For more information, see the availability group dashboard, SQL Server error log, Windows Server Failover Cluster management console or Windows Server Failover Cluster log.

The 19419 error is returned because SQL Server did not respond to the cluster service. You may also receive a lease timeout error message (19407) along with the 19419 error.
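
Because 19419 and 19407 can appear together, it can help to scan an error log for both. The following Python sketch assumes that the log has already been read as lines of text in the format shown above. The names ERROR_LINE, LEASE_ERROR_NUMBERS, and find_lease_errors are hypothetical, and the sample lines are copied from the example rather than from a live system.

```python
import re

# Matches lines such as:
#   <Date Time> Server Error: 19419, Severity: 16, State: 1.
ERROR_LINE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)\.")

# Error numbers discussed in this article (19407 is the lease timeout error).
LEASE_ERROR_NUMBERS = {19407, 19419, 19420, 19421, 19422, 19423, 19424}


def find_lease_errors(log_lines):
    """Yield (error, severity, state) for each lease-related error line."""
    for line in log_lines:
        match = ERROR_LINE.search(line)
        if match and int(match.group(1)) in LEASE_ERROR_NUMBERS:
            yield tuple(int(group) for group in match.groups())


sample = [
    "<Date Time> Server Error: 19419, Severity: 16, State: 1.",
    "<Date Time> Server AlwaysOn: The local replica of availability group 'ag' "
    "is going offline because either the lease expired or lease renewal failed.",
    "<Date Time> Server Error: 19407, Severity: 16, State: 1.",
]

print(list(find_lease_errors(sample)))  # [(19419, 16, 1), (19407, 16, 1)]
```

Lines that carry only message text are skipped, so the result lists just the numbered lease errors in the order in which they were logged.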

Example of error 19424: The following excess lease time message is reported just before the availability group transitions to the PRIMARY role:

<Date Time> Server The lease worker of availability group 'ag' is now sleeping the excess lease time (164766 ms) supplied during online. This is an informational message only. No user action is required.
<Date Time> Server The state of the local availability replica in availability group 'ag' has changed from 'PRIMARY_PENDING' to 'PRIMARY_NORMAL'. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error. For more information, see the availability group dashboard, SQL Server error log, Windows Server Failover Cluster management console or Windows Server Failover Cluster log.

The availability_group_lease_expired and hadr_ag_lease_renewal XEvents have been improved, with the addition of data points that provide more information about the condition of the lease. The following table describes the improvements to these XEvents:

| XEvent | New column | Description |
| --- | --- | --- |
| availability_group_lease_expired | current_time | Time at which the lease expired. |
| availability_group_lease_expired | new_timeout | Timeout time. When availability_group_lease_expired is raised, current_time is greater than new_timeout. |
| availability_group_lease_expired | state | Lease stages: see the Lease Stages table below. |
| hadr_ag_lease_renewal | state | Lease stages: see the Lease Stages table below. |
| hadr_ag_lease_renewal | error_code | If state is HadrLeaseRenewal_FailedWithWindowsError, error_code is the Windows error code that is associated with the failure. |
Lease stages and definitions

The following table lists the possible lease stages and explains their functions:

| Stage name | Description |
| --- | --- |
| HadrLeaseRenewal_LeaseWorkerStarted | Lease worker thread started. |
| HadrLeaseRenewal_StartedExcessLeaseSleep | Starting excess lease. Excess lease stages document the starting of the lease thread during the online phase of the availability group. |
| HadrLeaseRenewal_FailedExcessSleepInvalidOnlineLease | The excess lease fails if the lease has already expired. |
| HadrLeaseRenewal_SkipExcessSleep | The excess lease is skipped if the duration available to sleep is less than the lease interval. There is no need to go through the excess lease, so the handshake process starts directly. |
| HadrLeaseRenewal_ExcessSleepSucceeded | Excess lease succeeded. |
| HadrLeaseRenewal_RenewSucceeded | This stage should appear with every renewal. |
| HadrLeaseRenewal_LeaseNotValid | Equivalent to error 19419: Windows Server Failover Cluster did not receive a process event signal from SQL Server hosting availability group '%.*ls' within the lease timeout period. |
| HadrLeaseRenewal_StopLeaseRenewal | This stage should appear during a failover event. |
| HadrLeaseRenewal_LeaseExpired | Equivalent to error 19421: SQL Server hosting availability group '%.*ls' did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period. |
| HadrLeaseRenewal_FailedWithWindowsError | Lease renewal failed because of a Windows error. |
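
When you review a captured sequence of lease stages, it can be useful to find the first stage that indicates a failure. The following Python sketch is one possible way to do this. The grouping of stages into FAILURE_STAGES is an assumption drawn from the descriptions above, not an official classification, and the names FAILURE_STAGES, STAGE_TO_ERROR, and first_failure are hypothetical.

```python
# Assumption: these stages indicate a failed or expired lease, judging
# only by their descriptions in the table above.
FAILURE_STAGES = {
    "HadrLeaseRenewal_FailedExcessSleepInvalidOnlineLease",
    "HadrLeaseRenewal_LeaseNotValid",
    "HadrLeaseRenewal_LeaseExpired",
    "HadrLeaseRenewal_FailedWithWindowsError",
}

# Stages that the table describes as equivalent to a specific error number.
STAGE_TO_ERROR = {
    "HadrLeaseRenewal_LeaseNotValid": 19419,
    "HadrLeaseRenewal_LeaseExpired": 19421,
}


def first_failure(stages):
    """Return (stage, error number or None) for the first failure stage.

    Returns None when the sequence contains no failure stage.
    """
    for stage in stages:
        if stage in FAILURE_STAGES:
            return stage, STAGE_TO_ERROR.get(stage)
    return None


observed = [
    "HadrLeaseRenewal_LeaseWorkerStarted",
    "HadrLeaseRenewal_RenewSucceeded",
    "HadrLeaseRenewal_LeaseExpired",
]
print(first_failure(observed))  # ('HadrLeaseRenewal_LeaseExpired', 19421)
```

A sequence that contains only successful renewals returns None, which matches the expectation that HadrLeaseRenewal_RenewSucceeded appears with every renewal.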

For more information, see Improved AlwaysOn Availability Group Lease Timeout Diagnostics.

For more information about Lease Timeout, see How It Works: SQL Server AlwaysOn Lease Timeout.
Status
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
Properties

Article ID: 3112363 - Last Review: 07/11/2016 16:56:00 - Revision: 4.0

Microsoft SQL Server 2012 Developer, Microsoft SQL Server 2012 Enterprise, Microsoft SQL Server 2012 Standard, Microsoft SQL Server 2014 Developer, Microsoft SQL Server 2014 Enterprise, Microsoft SQL Server 2014 Express, Microsoft SQL Server 2014 Enterprise Core, Microsoft SQL Server 2014 Standard
