Symptoms

Assume that you configure an Always On availability group (AG) by using Pacemaker for SQL Server 2017 or SQL Server 2019 on Linux. When you connect to SQL Server, you notice intermittent availability group failovers that occur because the AG helper connection times out.

Status

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

Resolution

This issue is fixed in the following cumulative updates for SQL Server:

About cumulative updates for SQL Server:

Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. Check out the latest cumulative updates for SQL Server:

More Information

Assume that you have configured an availability group (AG) by using Pacemaker for SQL Server 2017 or 2019 on Linux, and that the Pacemaker AG helper resource agent uses the following cluster configuration. For its health check, the AG helper uses a connection interval of 10 seconds (interval="10" on the monitor operation for the Master role), a connection timeout of 30 seconds (the timeout meta attribute of the resource), and a monitor timeout of 90 seconds (timeout="90" on the same monitor operation).


<master id="ha_cluster-master">
    <primitive class="ocf" id="ha_cluster" provider="mssql" type="ag">
        <instance_attributes id="ha_cluster-instance_attributes">
            <nvpair id="ha_cluster-instance_attributes-ha_name" name="ha_name" value="TEST_AG"/>
            <nvpair id="ha_cluster-instance_attributes-trace_ra" name="trace_ra" value="1"/>
        </instance_attributes>
        <operations>
            <op id="ha_cluster-demote-interval-0s" interval="0s" name="demote" timeout="300"/>
            <op id="ha_cluster-monitor-interval-60s" interval="60s" name="monitor" timeout="100"/>
            <op id="ha_cluster-monitor-interval-11" interval="10" name="monitor" role="Master" timeout="90"/>
            <op id="ha_cluster-monitor-interval-12" interval="12" name="monitor" role="Slave" timeout="60"/>
            <op id="ha_cluster-notify-interval-0s" interval="0s" name="notify" timeout="60"/>
            <op id="ha_cluster-promote-interval-0s" interval="0s" name="promote" timeout="60"/>
            <op id="ha_cluster-start-interval-0s" interval="0s" name="start" timeout="60"/>
            <op id="ha_cluster-stop-interval-0s" interval="0s" name="stop" timeout="300"/>
        </operations>
        <meta_attributes id="ha_cluster-meta_attributes">
            <nvpair id="ha_cluster-meta_attributes-timeout" name="timeout" value="30s"/>
            <nvpair id="ha_cluster-meta_attributes-failure-timeout" name="failure-timeout" value="60s"/>
        </meta_attributes>
    </primitive>
    <meta_attributes id="ha_cluster-master-meta_attributes">
        <nvpair id="ha_cluster-master-meta_attributes-notify" name="notify" value="true"/>
        <nvpair id="ha_cluster-master-meta_attributes-trace_ra" name="trace_ra" value="1"/>
    </meta_attributes>
</master>
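
To make the three health check values easier to spot, the following minimal Python sketch reads them out of a trimmed copy of the configuration above. It is an illustration only: the function names read_health_check_settings and to_seconds are hypothetical, the fragment keeps only the elements that are read, and the mapping of each setting to an XML attribute is an assumption based on the values quoted in this article.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration above; only the elements read below are kept.
CIB_FRAGMENT = """
<master id="ha_cluster-master">
  <primitive class="ocf" id="ha_cluster" provider="mssql" type="ag">
    <operations>
      <op id="ha_cluster-monitor-interval-11" interval="10" name="monitor" role="Master" timeout="90"/>
      <op id="ha_cluster-monitor-interval-12" interval="12" name="monitor" role="Slave" timeout="60"/>
    </operations>
    <meta_attributes id="ha_cluster-meta_attributes">
      <nvpair id="ha_cluster-meta_attributes-timeout" name="timeout" value="30s"/>
    </meta_attributes>
  </primitive>
</master>
"""


def to_seconds(value):
    """Convert values such as '30s' or '90' to whole seconds.

    Only these two forms are handled; other time units are out of scope here.
    """
    return int(value.rstrip("s"))


def read_health_check_settings(xml_text):
    """Return the health check settings for the Master role (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    # Monitor operation that applies to the primary (Master) replica.
    monitor = root.find(".//op[@name='monitor'][@role='Master']")
    # Resource-level timeout meta attribute (assumed to be the connection timeout).
    timeout = root.find("./primitive/meta_attributes/nvpair[@name='timeout']")
    return {
        "connection_interval": to_seconds(monitor.get("interval")),
        "monitor_timeout": to_seconds(monitor.get("timeout")),
        "connection_timeout": to_seconds(timeout.get("value")),
    }


settings = read_health_check_settings(CIB_FRAGMENT)
print(settings)
```

With the values in this article, the monitor timeout (90 seconds) is three times the connection timeout (30 seconds).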


Before Cumulative Update 21 (CU21) for SQL Server 2017, if the AG health check connection timed out while connecting to SQL Server, a demote action was initiated, which caused the AG to fail over to the secondary node.

Starting in CU21, if a connection timeout occurs, the AG helper resource agent honors the monitor timeout of 90 seconds and attempts two more connections. If all three connection attempts fail, the AG helper resource agent declares SQL Server unresponsive and starts the demote action, which causes the availability group to fail over to the secondary node.
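
The retry behavior described above can be sketched in Python as follows. This is not the resource agent's actual code. The function run_health_check and the try_connect callable are hypothetical, and the sketch assumes that each failed attempt consumes the full connection timeout, so that a 90-second monitor timeout with a 30-second connection timeout allows three attempts in total.

```python
def run_health_check(try_connect, connection_timeout=30, monitor_timeout=90):
    """Return (healthy, attempts_used) for one monitor operation.

    try_connect is a hypothetical callable that returns True when the
    connection succeeds and False when the attempt times out.
    """
    # Assumption: every failed attempt uses up the whole connection timeout,
    # so the monitor timeout budget allows 90 // 30 = 3 attempts.
    max_attempts = monitor_timeout // connection_timeout
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            # Any successful connection means no demote action is needed.
            return True, attempt
    # All attempts failed: the instance is treated as unresponsive,
    # which is the point at which the demote action would start.
    return False, max_attempts


# Two timeouts followed by a success: no failover.
outcomes = iter([False, False, True])
print(run_health_check(lambda: next(outcomes)))  # (True, 3)

# Three timeouts in a row: the demote action would follow.
print(run_health_check(lambda: False))  # (False, 3)
```

Under the pre-CU21 behavior described above, the equivalent sketch would stop after the first failed attempt.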

