A Windows Server 2008 R2 failover cluster loses quorum when an asymmetric communication failure occurs

Article ID: 2552040

Symptoms

Consider the following scenario:
  • You create a Windows Server 2008 R2 failover cluster that has three or more nodes. 
  • An asymmetric communication failure occurs in the cluster. For example, two nodes cannot communicate with one another. However, the two nodes may be able to communicate with other nodes in the cluster.
In this scenario, the cluster may lose quorum. This causes the cluster to stop running, and all resources that are hosted by the cluster become unavailable for some time.

Note The expected behavior is that as few nodes as possible are removed from the cluster so that the remaining nodes can all communicate with one another. After nodes are removed, the failover cluster continues to run if it still has quorum.
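The expected behavior in the note can be illustrated with a small sketch. It is not the Cluster service's regroup algorithm. It is a brute-force search for the largest set of nodes in which every pair can still communicate, followed by a plain node-majority check. The function names, the tie-breaking rule (lowest node numbers win), and the node-majority quorum model are all assumptions made for illustration.

```python
from itertools import combinations

def largest_connected_subset(nodes, broken_links):
    """Return the largest set of nodes in which every pair can communicate.

    Hypothetical helper: tries the biggest candidate sets first so that
    as few nodes as possible are removed.
    """
    broken = {frozenset(link) for link in broken_links}
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(sorted(nodes), size):
            pairs = combinations(candidate, 2)
            if all(frozenset(pair) not in broken for pair in pairs):
                return set(candidate)
    return set()

def has_node_majority(surviving, total_nodes):
    """Assumed quorum model: more than half of all nodes must remain."""
    return len(surviving) > total_nodes // 2

# Three-node cluster in which node 2 and node 3 cannot reach each other.
nodes = {1, 2, 3}
survivors = largest_connected_subset(nodes, broken_links=[(2, 3)])
print(survivors, has_node_majority(survivors, len(nodes)))
```

In this example only one node has to be removed, and the two remaining nodes still form a majority of the three-node cluster, so the cluster would be expected to keep running.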

The following is a sample cluster log when a communication failure occurs between node 2 and node 3:


[NODE] Node 2: channel (write) to node 3 is broken. Reason GracefulClose(1226)' because of 'channel to remote endpoint <IP address>:~3343~ is closed'
[NODE] Node 2: Connection to Node 3 is broken. Reason Closed(1236)' because of 'channel to remote endpoint <IP address>:~3343~ has failed with status ERROR_SUCCESS(0)'
[NODE] Node 2: Connection to Node 3 is broken. Reason Closed(1236)' because of 'channel to remote endpoint <IP address>:~3343~ has failed with status ERROR_SUCCESS(0)'
[RGP] Node 2: I was trimmed out during regroup by other nodes (2), self isolate
[QUORUM] Node 2: Lost quorum (2)
lost quorum (status = 5925)
lost quorum (status = 5925), executing OnStop
FatalError is Calling Exit Process.
[DM] Delete hive C:\Windows\Cluster\CLUSDB.bak failed: ERROR_FILE_NOT_FOUND(2)
[DM] Delete logger files for hive at C:\Windows\Cluster\CLUSDB.bak failed: ERROR_FILE_NOT_FOUND(2)
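When you review a cluster log for this pattern, a short script can help pick out the relevant entries. The following is a minimal sketch that assumes the log lines look like the sample above. The regular expressions are derived only from that sample, and the function name is hypothetical. Real cluster logs carry additional prefixes, such as timestamps, which is why the patterns search within each line instead of matching from the start.

```python
import re

# Patterns based only on the sample log lines in this article.
BROKEN = re.compile(r"\[NODE\] Node (\d+): Connection to Node (\d+) is broken")
LOST_QUORUM = re.compile(r"\[QUORUM\] Node (\d+): Lost quorum")

def summarize(log_lines):
    """Collect (reporting node, peer node) broken links and nodes that lost quorum."""
    broken_links = set()
    lost_quorum_nodes = set()
    for line in log_lines:
        match = BROKEN.search(line)
        if match:
            broken_links.add((int(match.group(1)), int(match.group(2))))
            continue
        match = LOST_QUORUM.search(line)
        if match:
            lost_quorum_nodes.add(int(match.group(1)))
    return broken_links, lost_quorum_nodes

sample = [
    "[NODE] Node 2: Connection to Node 3 is broken. Reason Closed(1236)' "
    "because of 'channel to remote endpoint <IP address>:~3343~ has failed "
    "with status ERROR_SUCCESS(0)'",
    "[RGP] Node 2: I was trimmed out during regroup by other nodes (2), self isolate",
    "[QUORUM] Node 2: Lost quorum (2)",
]
print(summarize(sample))
```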

Cause

The issue occurs because the Cluster service does not handle regroup messages correctly. When an asymmetric communication failure occurs, each node sends multicast regroup messages to all the other nodes in the cluster. If a node receives the regroup messages in an unexpected order, the cluster may lose quorum.

Resolution

To resolve the issue, apply the following hotfix on all the nodes in the failover cluster. 

Hotfix information

A supported hotfix is available from Microsoft. However, this hotfix is intended to correct only the problem that is described in this article. Apply this hotfix only to systems that are experiencing the problem that is described in this article. This hotfix might receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next software update that contains this hotfix.

If the hotfix is available for download, there is a "Hotfix download available" section at the top of this Knowledge Base article. If this section does not exist, contact Microsoft Customer Service and Support to obtain the hotfix.

Note If additional issues occur or if any troubleshooting is required, you might have to create a separate service request. The usual support costs will apply to additional support questions and issues that do not qualify for this specific hotfix. For a complete list of Microsoft Customer Service and Support telephone numbers or to create a separate service request, visit the following Microsoft website:
http://support.microsoft.com/contactus/?ws=support
Note The "Hotfix download available" form displays the languages for which the hotfix is available. If you do not see your language, it is because a hotfix is not available for that language.

Prerequisites

To apply this hotfix, you must be running Windows Server 2008 R2 or Windows Server 2008 R2 Service Pack 1 (SP1). For more information about how to obtain a Windows 7 service pack or Windows Server 2008 R2 service pack, click the following article number to view the article in the Microsoft Knowledge Base:
976932 Information about Service Pack 1 for Windows 7 and for Windows Server 2008 R2

Registry information

To use the hotfix in this package, you do not have to make any changes to the registry.

Restart requirement

You do not have to restart the computer after you apply this hotfix if you stop the Cluster service before you apply the hotfix.

Hotfix replacement information

This hotfix does not replace a previously released hotfix.

File information

The English (United States) version of this hotfix installs files that have the attributes that are listed in the following tables. The dates and the times for these files are listed in Coordinated Universal Time (UTC). The dates and the times for these files on your local computer are displayed in your local time together with your current daylight saving time (DST) bias. Additionally, the dates and the times may change when you perform certain operations on the files.
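To compare a listed UTC timestamp with what a local computer displays, the conversion can be sketched as follows. This is an illustration only. The function name is hypothetical, the offset of -7 hours is an arbitrary example that already includes a DST bias, and parsing the month abbreviation with `%b` assumes an English locale.

```python
from datetime import datetime, timedelta, timezone

def listed_utc_to_offset(date_text, time_text, utc_offset_hours):
    """Convert a UTC date and time from the file tables to a fixed UTC offset."""
    utc_time = datetime.strptime(
        f"{date_text} {time_text}", "%d-%b-%Y %H:%M"
    ).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))

# Example: a file listed as 24-May-2011 11:27 (UTC), viewed at UTC-7.
local = listed_utc_to_offset("24-May-2011", "11:27", -7)
print(local.strftime("%d-%b-%Y %H:%M"))
```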
Windows Server 2008 R2 file information notes
Important Windows 7 hotfixes and Windows Server 2008 R2 hotfixes are included in the same packages. However, hotfixes on the Hotfix Request page are listed under both operating systems. To request the hotfix package that applies to one or both operating systems, select the hotfix that is listed under "Windows 7/Windows Server 2008 R2" on the page. Always refer to the "Applies To" section in articles to determine the actual operating system to which each hotfix applies.
  • The files that apply to a specific product, SR_Level (RTM, SPn), and service branch (LDR, GDR) can be identified by examining the file version numbers as shown in the following table.
    Version         Product                 SR_Level  Service branch
    6.1.7600.20xxx  Windows Server 2008 R2  RTM       LDR
    6.1.7601.21xxx  Windows Server 2008 R2  SP1       LDR
  • The MANIFEST files (.manifest) and the MUM files (.mum) that are installed for each environment are listed separately in the "Additional file information for Windows Server 2008 R2" section. MUM and MANIFEST files, and the associated security catalog (.cat) files, are very important to maintaining the state of the updated components. The security catalog files, for which the attributes are not listed, are signed with a Microsoft digital signature.
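The version-number rule in the first note can be expressed as a small lookup. This is a sketch that covers only the two rows in the version table above. The function name is hypothetical, and any version that does not match one of those two patterns is reported as unknown instead of being guessed.

```python
def classify_file_version(version):
    """Map a file version to (product, SR_Level, service branch), or None.

    Covers only the patterns 6.1.7600.20xxx and 6.1.7601.21xxx.
    """
    parts = version.split(".")
    if len(parts) != 4 or parts[:2] != ["6", "1"]:
        return None
    build, revision = parts[2], parts[3]
    if len(revision) != 5 or not revision.isdigit():
        return None
    known = {
        ("7600", "20"): ("Windows Server 2008 R2", "RTM", "LDR"),
        ("7601", "21"): ("Windows Server 2008 R2", "SP1", "LDR"),
    }
    # The first two digits of the revision identify the service branch range.
    return known.get((build, revision[:2]))

print(classify_file_version("6.1.7600.20972"))
print(classify_file_version("6.1.7601.21733"))
```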
For all supported x64-based versions of Windows Server 2008 R2
File name             File version    File size  Date         Time   Platform
Clussvc.exe           6.1.7600.20972  4,601,344  24-May-2011  11:27  x64
Clussvc.exe           6.1.7601.21733  4,602,368  24-May-2011  11:16  x64
Cluswmi.dll           6.1.7600.20972  540,160    24-May-2011  11:29  x64
Cluswmi.mof           Not applicable  76,752     24-May-2011  05:01  Not applicable
Cluswmiuninstall.mof  Not applicable  176        24-May-2011  05:01  Not applicable
Cluswmi.dll           6.1.7601.21733  542,208    24-May-2011  11:18  x64
For all supported IA-64-based versions of Windows Server 2008 R2
File name             File version    File size  Date         Time   Platform
Clussvc.exe           6.1.7600.20972  7,740,416  24-May-2011  10:06  IA-64
Clussvc.exe           6.1.7601.21733  7,741,952  24-May-2011  10:07  IA-64
Cluswmi.dll           6.1.7600.20972  884,224    24-May-2011  10:08  IA-64
Cluswmi.mof           Not applicable  76,752     24-May-2011  04:49  Not applicable
Cluswmiuninstall.mof  Not applicable  176        24-May-2011  04:49  Not applicable
Cluswmi.dll           6.1.7601.21733  886,784    24-May-2011  10:09  IA-64

Status

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

More information

For more information about software update terminology, click the following article number to view the article in the Microsoft Knowledge Base:
824684 Description of the standard terminology that is used to describe Microsoft software updates

Additional file information

Additional file information for Windows Server 2008 R2

Additional files for all supported x64-based versions of Windows Server 2008 R2
File name     Amd64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7600.20972_none_1648104bd454fdde.manifest
File version  Not applicable
File size     7,438
Date (UTC)    24-May-2011
Time (UTC)    12:10
Platform      Not applicable

File name     Amd64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7601.21733_none_185aad4bd15a135d.manifest
File version  Not applicable
File size     7,438
Date (UTC)    24-May-2011
Time (UTC)    13:29
Platform      Not applicable

File name     Amd64_microsoft-windows-f..overcluster-cluswmi_31bf3856ad364e35_6.1.7600.20972_none_18f6098fd29a97bb.manifest
File version  Not applicable
File size     6,668
Date (UTC)    24-May-2011
Time (UTC)    12:08
Platform      Not applicable

File name     Amd64_microsoft-windows-f..overcluster-cluswmi_31bf3856ad364e35_6.1.7601.21733_none_1b08a68fcf9fad3a.manifest
File version  Not applicable
File size     6,668
Date (UTC)    24-May-2011
Time (UTC)    13:27
Platform      Not applicable

File name     Update.mum
File version  Not applicable
File size     33,328
Date (UTC)    24-May-2011
Time (UTC)    23:03
Platform      Not applicable

File name     Wow64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7600.20972_none_209cba9e08b5bfd9.manifest
File version  Not applicable
File size     4,604
Date (UTC)    24-May-2011
Time (UTC)    10:51
Platform      Not applicable

File name     Wow64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7601.21733_none_22af579e05bad558.manifest
File version  Not applicable
File size     4,604
Date (UTC)    24-May-2011
Time (UTC)    12:46
Platform      Not applicable

Additional files for all supported IA-64-based versions of Windows Server 2008 R2
File name     Ia64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7600.20972_none_ba2b18be1bf595a4.manifest
File version  Not applicable
File size     7,436
Date (UTC)    24-May-2011
Time (UTC)    12:03
Platform      Not applicable

File name     Ia64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7601.21733_none_bc3db5be18faab23.manifest
File version  Not applicable
File size     7,436
Date (UTC)    24-May-2011
Time (UTC)    13:22
Platform      Not applicable

File name     Ia64_microsoft-windows-f..overcluster-cluswmi_31bf3856ad364e35_6.1.7600.20972_none_bcd912021a3b2f81.manifest
File version  Not applicable
File size     6,666
Date (UTC)    24-May-2011
Time (UTC)    12:01
Platform      Not applicable

File name     Ia64_microsoft-windows-f..overcluster-cluswmi_31bf3856ad364e35_6.1.7601.21733_none_beebaf0217404500.manifest
File version  Not applicable
File size     6,666
Date (UTC)    24-May-2011
Time (UTC)    13:20
Platform      Not applicable

File name     Update.mum
File version  Not applicable
File size     7,944
Date (UTC)    24-May-2011
Time (UTC)    23:03
Platform      Not applicable

File name     Wow64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7600.20972_none_209cba9e08b5bfd9.manifest
File version  Not applicable
File size     4,604
Date (UTC)    24-May-2011
Time (UTC)    10:51
Platform      Not applicable

File name     Wow64_microsoft-windows-f..overcluster-clussvc_31bf3856ad364e35_6.1.7601.21733_none_22af579e05bad558.manifest
File version  Not applicable
File size     4,604
Date (UTC)    24-May-2011
Time (UTC)    12:46
Platform      Not applicable

Properties

Article ID: 2552040 - Last Review: August 17, 2012 - Revision: 2.0
Applies to
  • Windows Server 2012 Standard
  • Windows Server 2012 Essentials
  • Windows Server 2008 R2 Datacenter
  • Windows Server 2008 R2 Enterprise
  • Windows Server 2008 R2 for Itanium-Based Systems
Keywords: 
kbautohotfix kbqfe kbhotfixserver kbfix kbsurveynew kbexpertiseadvanced KB2552040
