Symptoms
Assume that you have a Distributed Availability Group (DAG) with many databases (more than 15) in Microsoft SQL Server 2016 or 2017. Occasionally, a connection time-out occurs between the Global Primary and the DAG Forwarder instances of SQL Server. After the connection is reestablished, data movement for the databases does not resume automatically. Additionally, you may notice the following:
- The database replica is in the NOT_HEALTHY and NOT_SYNCHRONIZING states.
- The last_commit_time value at the DAG Forwarder coincides with the time of the connection time-out.
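The following minimal Python sketch shows how the symptom pattern above could be checked against replica-state rows you have already collected, for example from the `sys.dm_hadr_database_replica_states` view on the DAG Forwarder. The `ReplicaState` record (including its `database_name` field), the `stalled_after_timeout` helper, and the 30-second tolerance are hypothetical and chosen only for illustration; how the rows are fetched is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ReplicaState:
    # Hypothetical record. The last three fields mirror columns of
    # sys.dm_hadr_database_replica_states; database_name is an assumption.
    database_name: str
    synchronization_state_desc: str
    synchronization_health_desc: str
    last_commit_time: datetime


def stalled_after_timeout(states, timeout_at, tolerance=timedelta(seconds=30)):
    """Return the names of databases that match the symptom pattern:
    NOT_SYNCHRONIZING, NOT_HEALTHY, and a last_commit_time within
    `tolerance` of the connection time-out (tolerance is an assumed value)."""
    stalled = []
    for s in states:
        if (s.synchronization_state_desc == "NOT_SYNCHRONIZING"
                and s.synchronization_health_desc == "NOT_HEALTHY"
                and abs(s.last_commit_time - timeout_at) <= tolerance):
            stalled.append(s.database_name)
    return stalled


if __name__ == "__main__":
    # Illustrative sample rows, not real data.
    t = datetime(2020, 1, 1, 12, 0, 0)
    rows = [
        ReplicaState("db1", "NOT_SYNCHRONIZING", "NOT_HEALTHY", t + timedelta(seconds=2)),
        ReplicaState("db2", "SYNCHRONIZING", "HEALTHY", t + timedelta(hours=1)),
    ]
    print(stalled_after_timeout(rows, t))  # ['db1']
```

Matching on all three conditions, rather than on the state alone, helps separate databases that stalled at the time-out from databases that are not synchronizing for an unrelated reason.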
You may also see the following messages in the SQL Server error log on the primary replica of the primary availability group:
DateTime spid329s A connection time-out has occurred on a previously established connection to availability replica 'ReplicaName' with id [ReplicaID]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
DateTime spid1538s Always On Availability Groups connection with secondary database terminated for primary database 'DatabaseName' on the availability replica 'ReplicaName' with Replica ID: {ReplicaID}. This is an informational message only. No user action is required.
The second message is logged for most, if not all, databases in the distributed availability group.
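To gauge how widely the problem spread, a minimal Python sketch like the one below could scan error log lines for the two messages above and report which replicas timed out and which databases lost their connection. The helper names are hypothetical, and the patterns key only on the message text as quoted in this article, so the timestamp and spid columns are ignored. Accepting both "time-out" and "timeout" is an assumption made for robustness.

```python
import re

# Patterns key on the message text only; the leading DateTime/spid columns are ignored.
TIMEOUT = re.compile(
    r"connection time-?out has occurred on a previously established connection "
    r"to availability replica '([^']+)'"
)
TERMINATED = re.compile(
    r"connection with secondary database terminated for primary database '([^']+)'"
)


def summarize_errorlog(lines):
    """Return (replicas that timed out, databases whose connection was terminated)."""
    replicas, databases = set(), set()
    for line in lines:
        m = TIMEOUT.search(line)
        if m:
            replicas.add(m.group(1))
        m = TERMINATED.search(line)
        if m:
            databases.add(m.group(1))
    return replicas, databases


def affected_fraction(terminated, dag_databases):
    """Fraction of the DAG's databases that logged the termination message."""
    dag = set(dag_databases)
    return len(set(terminated) & dag) / len(dag) if dag else 0.0
```

A fraction close to 1.0 matches the pattern described above, where most or all databases in the distributed availability group are affected after a single time-out.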
Status
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
Resolution
This issue is fixed in the following cumulative updates for SQL Server:
About cumulative updates for SQL Server:
Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. Check out the latest cumulative updates for SQL Server:
References
Learn about the terminology that Microsoft uses to describe software updates.