Symptoms
When you use SQL Server Always On Availability Groups, the Always On secondary replica may enter a disconnected state. Additionally, the following error message is logged in the SQL Server error log:
A connection timeout has occurred while attempting to establish a connection to availability replica availability_replica_name with id availability_replica_id. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
When you try to reestablish the connection, you may receive the following error message:
This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.
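If you need to check many servers for these symptoms, a small script can scan exported text for the two messages quoted above. The following is a minimal sketch, assuming that the SQL Server error log (or captured client output) has been exported to plain text with one entry per line. The function name `find_ag_disconnect_symptoms` and the replica name `SQLNODE2` in the sample are hypothetical. The script matches on stable message text so that it does not depend on replica names or IDs.

```python
from typing import Iterable, List, Tuple

# Stable prefix of the connection timeout message quoted in this article.
TIMEOUT_PREFIX = (
    "A connection timeout has occurred while attempting to establish a "
    "connection to availability replica"
)

# Message returned when you try to reestablish the connection.
DISCONNECTED_TEXT = (
    "This secondary replica is not connected to the primary replica. "
    "The connected state is DISCONNECTED."
)


def find_ag_disconnect_symptoms(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, kind) pairs for lines that match a symptom.

    kind is "timeout" or "disconnected"; line numbers are 1-based.
    Assumes each message appears on a single line of the exported text.
    """
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if TIMEOUT_PREFIX in line:
            hits.append((number, "timeout"))
        elif DISCONNECTED_TEXT in line:
            hits.append((number, "disconnected"))
    return hits


if __name__ == "__main__":
    # Hypothetical sample entries; real error log lines will differ.
    sample = [
        "2017-01-01 10:00:00.00 spid35s  A connection timeout has occurred "
        "while attempting to establish a connection to availability replica "
        "'SQLNODE2' with id [...].",
        "2017-01-01 10:00:05.00 spid52   Some unrelated message",
    ]
    print(find_ag_disconnect_symptoms(sample))  # [(1, 'timeout')]
```

Because the second message may be returned to the client rather than written to the error log, you may have to scan both sources.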
When this behavior occurs, the issue persists until you restart the SQL Server service on the secondary replica. In rare cases, you may also have to restart the SQL Server service on the primary replica to resume Always On data movement.

Note: This problem may occur only on very powerful computers, such as servers with many processor cores, and only when SQL Server is very busy. For example, in one reported case, the problem occurred on a very busy system with 24 cores.
Cause
The problem occurs because of an internal race condition.
Resolution
This issue is fixed in the following cumulative updates for SQL Server:
Latest cumulative update for SQL Server 2016
Latest cumulative update for SQL Server 2014
Latest cumulative update for SQL Server 2012 SP3
Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. We recommend that you install the latest cumulative update for your version of SQL Server.
Status
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
References
Learn about the terminology Microsoft uses to describe software updates.