- Internal cluster communications only
This type is also known as a private network.
- All communications
This type is also known as a mixed network. A mixed network is a combination of a private network and a public network.
For intra-cluster communications, cluster nodes communicate over User Datagram Protocol (UDP) port 3343. Each node in the cluster periodically exchanges sequenced, unicast UDP datagrams with every other node in the cluster. The purpose of this exchange is to determine whether all nodes are running correctly and to monitor the health of network links.
The cluster network driver (Clusnet.sys) manages cluster communications. The cluster network driver performs the following functions:
- Provides a uniform interface for cluster node communications that are independent of the network infrastructure.
- Monitors the status of all communication paths in the cluster.
- Routes intra-cluster messages over the optimal paths.
- Detects node failures by using periodic messages that are known as "heartbeats".
- Detects failures in network and TCP/IP communications.
Event messages that are similar to the following are logged:
The heartbeat process
The exchange of UDP datagrams between nodes in a cluster is known as the "heartbeat process". By default, heartbeats are sent every 1.2 seconds from each network interface for each node to each network interface for every other node that is in the cluster. In Windows Server 2003, multicast datagrams can be used to reduce the amount of heartbeat traffic that occurs between cluster nodes. By default, Windows Server 2003 uses multicast datagrams when three or more nodes are configured in a cluster.
Event IDs 1123 and 1122
Event ID 1123 indicates that node A in the cluster did not receive a heartbeat from node B in the cluster for two heartbeat intervals over a specified network interface. That means that node A did not receive a heartbeat from node B for 2.4 seconds.
Event ID 1122 indicates that node A received a heartbeat from node B. This communication update is received after 2.4 seconds but before 4.8 seconds. Event ID 1122 is logged if communications are re-established over a network interface that was previously shut down. For example, event ID 1122 occurs when a node that was shut down rejoins the cluster.
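The timing rules above reduce to simple arithmetic on the 1.2-second heartbeat interval. The following Python sketch illustrates that arithmetic under stated assumptions: the function and constant names are hypothetical, times are kept in whole milliseconds to avoid floating-point rounding, and the six-heartbeat limit is the one that the regroup section of this article describes. It is not a description of how Clusnet.sys is implemented.

```python
# Illustrative sketch only; names are hypothetical and not part of any cluster API.
HEARTBEAT_INTERVAL_MS = 1200   # default interval stated in this article (1.2 seconds)
MISSED_FOR_EVENT_1123 = 2      # two missed intervals (2.4 seconds)
MISSED_FOR_REGROUP = 6         # six missed heartbeats on every enabled interface


def missed_heartbeats(silence_ms: int) -> int:
    """Return how many whole heartbeat intervals have elapsed without a heartbeat."""
    return silence_ms // HEARTBEAT_INTERVAL_MS


def assess(silence_by_interface: dict[str, int]) -> dict:
    """Assess the silence from one peer node, measured per network interface."""
    missed = {name: missed_heartbeats(ms) for name, ms in silence_by_interface.items()}
    return {
        # Interfaces on which at least two heartbeat intervals were missed.
        "event_1123": sorted(n for n, m in missed.items() if m >= MISSED_FOR_EVENT_1123),
        # A regroup is only a candidate when every interface is silent long enough.
        "regroup_candidate": bool(missed)
        and all(m >= MISSED_FOR_REGROUP for m in missed.values()),
    }
```

For example, assess({"private": 2400, "public": 0}) flags only the private interface for event ID 1123. A regroup becomes a candidate only when every interface has been silent for 7.2 seconds (6 x 1.2 seconds).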
The regroup process
Assume that node A does not receive an update from node B after six consecutive heartbeats over all network interfaces that are enabled for internal cluster communications. In this case, node B is assumed to be inactive. The cluster may perform a "regroup" process. During a regroup process, the cluster network driver on node A notifies the Membership Manager and the Node Manager that a failure has occurred. The Membership Manager and the Node Manager initiate a regroup operation that takes node B offline and removes it from active membership in the cluster. When this regroup process occurs, event ID 1126 is logged in the System log. Event ID 1135 may be subsequently logged in the System log. Event ID 1135 indicates that a node has been removed from active cluster membership. Messages that are similar to the following are logged:
Troubleshooting event IDs 1123 and 1122
When event ID 1123 is followed by event ID 1122, you can generally ignore the events if the following conditions are true:
- There are no coincident failures of cluster IP address resources, and there are no concurrent resource group failovers.
- The nodes that were removed from cluster membership were removed only because of a loss of network communication. For example, a node was removed when the node was shut down or restarted.
Note: You can also generally ignore event IDs 1124, 1126, 1127, and 1130 if they occur during a node restart.
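The conditions above can be expressed as a small check. The following Python sketch is an illustration only: the Observation structure and its field names are hypothetical, and the inputs would have to be gathered by hand from the System log and the cluster log.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Facts gathered around an event ID 1123 / 1122 pair (hypothetical structure)."""
    event_ids: list[int]                          # event IDs in the order they were logged
    ip_resource_failures: int = 0                 # coincident cluster IP address resource failures
    group_failovers: int = 0                      # concurrent resource group failovers
    removals_only_from_network_loss: bool = True  # for example, a shutdown or a restart


def can_generally_ignore(obs: Observation) -> bool:
    """Return True when 1123 is later followed by 1122 and the listed conditions hold."""
    ids = obs.event_ids
    followed = 1123 in ids and 1122 in ids[ids.index(1123) + 1:]
    return (
        followed
        and obs.ip_resource_failures == 0
        and obs.group_failovers == 0
        and obs.removals_only_from_network_loss
    )
```

A result of False does not identify a cause. It only means that the events deserve the troubleshooting that the rest of this article describes.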
This section describes possible reasons why you may receive event ID 1123 followed by event ID 1122. Use this information to evaluate and to troubleshoot these events before you contact Microsoft support.
Network adaptor teaming
Network adaptor teaming can involve a multi-port card or separate single-port PCI network adaptors.
Note: Network adaptor teaming is not supported on the cluster heartbeat network adaptor.
The following articles discuss network adaptor teaming with Windows Clustering:
Network adaptor driver issues
Network adaptor drivers may be outdated or incorrect. Additionally, some drivers may not match the drivers on other nodes in the cluster.
Network device failures
Network devices, such as switch ports or network adaptors, may not be working correctly. However, if all cluster networks log the same error message, a network device is unlikely to be the cause. If only one of the cluster networks logs event IDs 1123 and 1122, you may have one of the following problems:
- Device configuration mismatches
This problem occurs when the settings for the network adaptor and for the switch port that the node is attached to do not match. For example, this problem occurs when a network adaptor is set to Auto Negotiate and the switch port is set to 100 megabits per second (Mbps) full-duplex. Additionally, some network adaptors take over some of the functionality of the TCP/IP stack. For example, some network adaptors perform flow control and hardware checksumming. As part of the troubleshooting process, you may have to configure the network adaptor to return this functionality to the TCP/IP stack.
For more information, click the following article number to view the article in the Microsoft Knowledge Base: 174812 The effects of using Autodetect setting on cluster network interface card
- Switch port issues
This problem is identified by connecting the cluster node to another port. If you connect the node to another port and if event IDs 1123 and 1122 are not repeated, the problem is with the switch port. To identify this problem, you can also plug the cluster nodes into a network hub and then uplink the hub to the switch port. Use this method when the following conditions are true:
- The public network is supported by a switch.
- The private, or heartbeat, network is supported by either a hub or a crossover cable.
- Switch configuration issues
This problem occurs when the spanning tree protocol (STP) has been enabled on the port and when the port is no longer in the forwarding state. Disable this configuration, or enable the rapid spanning tree protocol (RSTP) if the switch supports it. RSTP reduces the time that the switch port must use to transition from a blocking state to a forwarding state.
- Virtual local area network (VLAN) issues
This problem occurs when the cluster nodes are part of a VLAN where the ports reside on different physical switches and when a trunk link configuration is set up between the switches. To resolve this issue, move the node connection to a port that is on the same physical switch.
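The rule of thumb at the start of this section (events on every cluster network point away from a single network device, and events on only one network point toward it) can be sketched as follows. The function name and the returned strings are hypothetical. The handling of a single-network cluster, and of events on some but not all networks, is an assumption that this article does not state.

```python
def likely_fault_scope(all_networks: set[str], networks_with_events: set[str]) -> str:
    """Suggest where to look first, based on which networks logged 1123 and 1122."""
    affected = all_networks & networks_with_events
    if not affected:
        return "no events logged"
    if affected == all_networks and len(all_networks) > 1:
        # Every cluster network logged the events: a single device is an unlikely cause.
        return "network device unlikely; check node-level causes"
    # Only some networks logged the events: inspect the devices on those networks.
    return "check devices on: " + ", ".join(sorted(affected))
```

For example, likely_fault_scope({"public", "private"}, {"private"}) suggests checking the adaptor, cable, and switch port on the private network first.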
Node resource issues
A node resource problem occurs because the Server service cannot keep up with incoming or outgoing network connections. The Server service cannot meet the demand for the network items that are queued by the network layer of the I/O stream. In this case, a Server service event, such as event ID 2022, may be logged in the System log. A message that is similar to the following is logged:
In this situation, deferred procedure call (DPC) requests are queued ahead of the network requests that are registered by the Interrupt Service Routine (ISR) for the network device. To troubleshoot this issue, investigate all components of the I/O path. This includes the network I/O and hard disk I/O. Use System Monitor to collect this data. For more information about how to troubleshoot this issue, click the following article number to view the article in the Microsoft Knowledge Base:
- SCSI host bus adaptor (SCSI/HBA) network adaptor drivers.
- Multi-path software drivers, such as PowerPath or SecurePath.
- Redundant disk array controllers (RDAC).
- Third-party programs, such as backup software or disk quota software.
For more information, click the following article number to view the article in the Microsoft Knowledge Base:
Incorrect software on Windows 2000-based cluster nodes
In a Windows 2000-based cluster, all cluster nodes must be running Windows 2000 Service Pack 3 or a later version. If your Windows 2000-based computer logged event ID 2022, view the following articles in the Microsoft Knowledge Base to resolve this issue:
Incorrect registry settings
Important: This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
To resolve these event messages in Windows 2000-based and Windows Server 2003-based clusters, you may have to make changes to the following registry subkey on each node:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters
Add the following DWORD values to the registry subkey:
Data Type: REG_DWORD
Value data: 512 (decimal)

Value Name: MaxFreeConnections
Data Type: REG_DWORD
Value data: 4096 (decimal)

Value Name: MinFreeConnections
Data Type: REG_DWORD
Value data: 100 (decimal)

Value Name: MaxWorkItems
Data Type: REG_DWORD
Value data: 6000 (decimal)
For example, to add the MaxWorkItems value, follow these steps:
- Click Start, click Run, type regedit, and then click OK.
- Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters
- Right-click parameters, point to New, and then click DWORD Value.
- Type MaxWorkItems, and then press ENTER.
- Right-click MaxWorkItems, click Modify, type 6000, click to select the Decimal option, and then click OK.
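Where the same values must be applied to several nodes, a registration (.reg) file is one possible alternative to repeating the Registry Editor steps by hand. The following Python sketch only generates the text of such a file and does not modify the registry. It assumes that each value name in the list above is paired with the data that follows it, which is consistent with the MaxWorkItems steps that set 6000. The 512 entry is omitted because its value name is not shown in this article. The function name is hypothetical.

```python
# Illustrative sketch only: builds .reg text and writes nothing to the registry.
SUBKEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters"

# Named values and decimal data as listed in this article.
VALUES = {
    "MaxFreeConnections": 4096,
    "MinFreeConnections": 100,
    "MaxWorkItems": 6000,
}


def build_reg_file(subkey: str, values: dict[str, int]) -> str:
    """Render REG_DWORD values as .reg text; dword data is eight hexadecimal digits."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{subkey}]"]
    lines += [f'"{name}"=dword:{data:08x}' for name, data in values.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_reg_file(SUBKEY, VALUES))
```

Review the generated text before you import it on a node, and back up the registry first, as described earlier in this section.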
High kernel-mode CPU usage
To troubleshoot high kernel-mode CPU usage, use System Monitor to identify the problem. High kernel-mode CPU usage may be caused by the following sources:
- Hardware drivers that use DPC and that compete with the DPC routines of the cluster heartbeat process.
- Frequent multiple hardware interrupt requests that occur at the same time.
- Excessive I/O, such as kernel debug sessions over a serial connection.
High CPU usage that is caused by SNMP agents
Third-party Simple Network Management Protocol (SNMP) agents that run in a cluster may periodically contact the NTFS file system on a shared cluster disk resource. The agents use the CreateFile function to contact NTFS. This behavior can cause significant CPU usage when the SNMP agent caches data on a specific volume.
Multicast issues
Multicast issues may occur in a Windows Server 2003 cluster. To troubleshoot multicast issues, disable multicast support in the cluster. For more information about how to disable multicast, click the following article number to view the article in the Microsoft Knowledge Base:
Article ID: 892422 - Last Review: Mar 29, 2017 - Revision: 3