Failover/Failback Policies on Microsoft Cluster Server

This article was previously published under Q197047
This article has been archived. It is offered "as is" and will no longer be updated.
Microsoft Cluster Server (MSCS) lets you define a specific node in the cluster as the preferred node to own a particular group. If this node fails or goes offline, its groups fail over to another node, and you can set the Cluster service to automatically move these groups back to the preferred node once it is available again. This is useful for statically load-balancing the nodes in your cluster. These settings are referred to as Failover and Failback policies.

Setting Failover Policy

To set the Failover policy for a group, use the Failover tab for the group. You can set the failover threshold and the failover period. The failover threshold is the number of times the group can fail over within the number of hours specified by the failover period. For example, if a group failover threshold is set to 5, and its failover period is set to 3, Cluster Server will fail over the group at most five times within a three-hour period. The next time a resource in the group exceeds its failure threshold count, Cluster Server will leave the resource in the offline state instead of failing over the group.
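The threshold and period arithmetic can be sketched in a few lines of Python. This is only an illustrative model, not the Cluster service's implementation: the function name `group_may_fail_over` and its parameters are hypothetical, and it assumes the period is counted as a sliding window ending at the current time, which the article does not state.

```python
from datetime import datetime, timedelta


def group_may_fail_over(past_failovers, now, threshold, period_hours):
    """Return True if the group may still fail over at time `now`.

    past_failovers: datetimes of earlier failovers of this group.
    threshold:      maximum failovers allowed within the period.
    period_hours:   length of the failover period, in hours.
    Assumption: the period is a sliding window that ends at `now`.
    """
    window_start = now - timedelta(hours=period_hours)
    recent = [t for t in past_failovers if t > window_start]
    return len(recent) < threshold


now = datetime(2000, 1, 1, 12, 0)
# Five failovers spread over the last 2.5 hours (30, 60, ... 150 minutes ago).
five_recent = [now - timedelta(minutes=30 * i) for i in range(1, 6)]

# Threshold 5, period 3: a sixth failover is refused.
print(group_may_fail_over(five_recent, now, threshold=5, period_hours=3))
# With only four recent failovers, one more is still allowed.
print(group_may_fail_over(five_recent[:4], now, threshold=5, period_hours=3))
```

With the example values from the text (threshold 5, period 3), the first call prints False and the second prints True.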

NOTE: A group itself does not fail. The group attempts a failover only when both of the following conditions are met for a resource within the group:
  • The number of failures for the resource has exceeded its threshold count within the defined period.
  • The resource is defined to "Affect the group".
Each such failover increments the count for the group failover threshold by one.
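The two conditions in the note can be expressed as a single check. The following Python sketch is a hypothetical model (the function and parameter names are not part of any MSCS interface); it assumes "exceeded" means strictly greater than the resource's threshold.

```python
def resource_triggers_group_failover(failures_in_period,
                                     resource_threshold,
                                     affects_group):
    """Return True if a resource failure should cause a group failover.

    failures_in_period: failures of this resource within its defined period.
    resource_threshold: the resource's failure threshold count.
    affects_group:      True if the resource is set to "Affect the group".
    Assumption: "exceeded" is modeled as strictly greater than the threshold.
    """
    return affects_group and failures_in_period > resource_threshold


# A threshold of 0 means the first failure already exceeds it.
print(resource_triggers_group_failover(1, 0, True))
# Three failures against a threshold of 3 do not yet exceed it.
print(resource_triggers_group_failover(3, 3, True))
# A resource that does not affect the group never triggers a failover.
print(resource_triggers_group_failover(4, 3, False))
```

Under these assumptions the three calls print True, False, and False.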

Setting Failback Policy

By default, groups are set not to failback. Unless you manually configure your group to failback after failover, it continues to run on the alternate node after the failed node comes back online.

When you configure a group to automatically failback to the preferred node, you specify whether you want the group to failback as soon as the preferred node is available or to failback only during specific hours that you define. This option is useful if you want the failback to occur after peak business hours, or if you want to make sure the preferred node is able to support the group when it does come back online.

When you select "Allow failback", you have two options: Immediately and Failback Between. With Immediately, failback occurs as soon as the Cluster service detects that the preferred owner is online. With Failback Between, you specify hours on a 24-hour clock (for example, 1 is 1:00am and 15 is 3:00pm). Setting Failback between 1 and 2 therefore means between 1:00am and 2:00am. This prevents failback from interrupting the availability of the cluster resources in that group during peak hours. Setting the window to between 1 and 1 means the full 24 hours and is the same as Immediately.
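As a rough illustration of how the failback window hours can be read, here is a small Python sketch. The function `failback_allowed` is hypothetical. The end hour is treated as exclusive, which follows from "between 1 and 2" meaning 1:00am to 2:00am. The handling of a window that crosses midnight (for example 22 to 2) is an assumption, since the article does not describe that case.

```python
def failback_allowed(current_hour, window_start, window_end):
    """Return True if failback may occur during `current_hour` (0-23).

    window_start, window_end: hours on a 24-hour clock.
    Equal start and end hours mean the full 24 hours (same as Immediately).
    """
    for hour in (current_hour, window_start, window_end):
        if not 0 <= hour <= 23:
            raise ValueError("hours must be between 0 and 23")
    if window_start == window_end:
        return True
    if window_start < window_end:
        return window_start <= current_hour < window_end
    # Assumption: a window such as 22 to 2 wraps past midnight.
    return current_hour >= window_start or current_hour < window_end


print(failback_allowed(1, 1, 2))    # 1:30am falls inside the 1-to-2 window
print(failback_allowed(15, 1, 2))   # 3:00pm falls outside it
print(failback_allowed(15, 1, 1))   # 1-to-1 covers the whole day
```

Under these assumptions the calls print True, False, and True.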

The group must be configured to have a preferred owner to failback. You can specify a preferred owner on the General tab of the group Properties dialog box.

NOTE: The "Preferred Owner" of a group must be specified for failback to occur. The preferred owner is the node that you configure to host the group under normal operating circumstances. Furthermore, at the resource level, the resource must be configured to have both nodes as "Possible Owners" for the resource to fail over.

Testing Failover Policies

You can test the failover policies you establish for a single group and its resources by manually failing over those elements.

To test the failover policy for a group, type 0 (zero) for the Threshold in the Properties dialog box for a specific resource. Then, right-click on that resource and click Initiate Failure. Cluster Server immediately fails over the group to the alternate node.

In a test environment, you can fail over all groups from one node to another by using Cluster Administrator to stop the Cluster service, pressing the reset button on the computer, or turning off the power to one of the nodes.

NOTE: Removing the shared SCSI/Fibre cable between either node and the shared disk array is not a valid failover test. The Cluster service must have access to the shared array at all times for failover to succeed.

For additional information on resources and groups, see the following article in the Microsoft Knowledge Base:

169017 Information on Groups & Resources Using Cluster Server

Article ID: 197047 - Last Review: 12/05/2015 09:55:39 - Revision: 3.0

Microsoft Windows 2000 Advanced Server, Microsoft Windows NT Server 4.0 Enterprise Edition
