Article ID: 903650
This article describes an update to the maintenance mode feature for cluster physical disk resources in Microsoft Windows Server 2003. This update provides extended functionality when you use maintenance mode to perform administrative and maintenance tasks in a clustered environment. For example, this update lets you perform a Volume Shadow Copy Service (VSS) recovery on a disk resource without taking down all dependent resources and their services.
Overview of Windows Server 2003 cluster management
Server clusters introduce an environment where disks are managed differently than they are in a stand-alone server environment. For example, in server clusters, multiple initiators do not access a single disk. Clustering requires that only one server or only one node access a Logical Unit Number (LUN) at a time. This configuration guarantees that another server does not try to write to the same disk. If more than one server writes to the same disk, data on the disk may become corrupted.
Health monitoring checks on a cluster-managed LUN
A series of health monitoring checks is performed on cluster-managed LUNs to make sure that a LUN is available. If any one of these checks fails, the Cluster service assumes that there is a problem with the LUN and takes recovery action. Recovery actions may include the following:
Maintenance mode in Windows Server 2003 Service Pack 1
Windows Server 2003 Service Pack 1 includes a new feature that is called maintenance mode. This mode lets you perform certain administrative and maintenance tasks in a clustered environment. These tasks must be performed on shared clustered disks. These shared clustered disks may require another application or mechanism to obtain an exclusive lock on a disk. When you perform administrative and maintenance tasks, a disk may appear in a regular online state, but health monitoring may be temporarily suppressed. Also, the disk may not be available to clients.
For example, if you type chkdsk /f on a disk that is managed and monitored by the cluster, the disk is locked for exclusive use to examine the consistency of the disk. This behavior may cause health monitoring checks for that disk to fail if the CHKDSK command takes longer than the time-out period that is set for the disk. For example, the IsAlive or LooksAlive checks may fail. Therefore, the group in which that disk resides may fail over to another node in the cluster. A failover interrupts the CHKDSK command and affects the availability of all other resources during the failover.
Extended maintenance mode
Windows Server 2003 SP1 added another feature to the Cluster.exe command-line tool. The tool now waits for the internal state of a resource to stabilize and to complete the online or offline process. This process can be scripted. For example, after you put a resource in extended maintenance mode, the script calls the /waitmaint parameter. The /waitmaint parameter blocks activity until the resource has gone into a full stable state internally. Use the following command syntax to call the /waitmaint parameter:
cluster resource_name /waitmaint[pending]
This update enables the maintenance mode process to perform the following extended functions:
You can use extended maintenance mode together with a hardware snapshot application to perform any functions that are required to complete a snapshot restore. For example, you can mask off a LUN, swap LUNs, put a LUN in read-only mode, and so on.
Important A backup requestor application must provide the timing coordination during a snapshot restore to make sure that the correct application VSS writers are called. Application VSS writers make sure that all handles are closed and that there is no disk usage. Timing coordination prevents an adverse effect on higher level applications when the disk dismount occurs. After the snapshot restore is completed, the disk must be brought out of extended maintenance mode to make sure that the disk is mounted and ready to be accessed by applications.
When extended maintenance mode hands over complete control of the LUN to another process, such as to the operating system, the disk is dismounted. In this case, the disk is inaccessible to applications. Use extended maintenance mode only together with a backup requestor application that stops all usage of a disk by applications before the backup requestor application invokes extended maintenance mode. If disk usage is not stopped, application failures and potential corruption of data may occur. Extended maintenance mode should be started only by backup requestor applications that are configured by independent software vendors (ISVs). These ISVs must be familiar with application VSS writers and similar mechanisms that help prevent timing problems.
Warning We do not recommend that you use maintenance mode to bypass health monitoring checks for a disk that is experiencing problems. When a disk is in maintenance mode, any failures or time-outs of the disk are ignored by the Cluster service. This behavior may prevent failover and recoverability if a disk is intentionally or accidentally left permanently in maintenance mode.
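A script that uses maintenance mode can reduce the risk of leaving a disk in that state by always pairing the "on" call with an "off" call. The following Python sketch shows one possible way to do this and is not part of the update. It relies only on the Cluster.exe /maint:on and /maint:off switches that this article describes. The maintenance_mode helper, its parameters, and the injectable run callable are hypothetical names that are introduced here for illustration.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def maintenance_mode(resource, cluster=".", run=subprocess.check_call):
    """Put a disk resource in maintenance mode for the duration of a block.

    resource: cluster resource name, for example "Disk H:".
    cluster:  cluster name, or "." for the local cluster.
    run:      callable that executes an argument list (hypothetical hook so
              the helper can be exercised without a real cluster).
    """
    base = ["cluster.exe", cluster, "res", resource]
    run(base + ["/maint:on"])
    try:
        yield
    finally:
        # Runs even if the maintenance task raises, so the disk is not
        # left in maintenance mode by a failed script.
        run(base + ["/maint:off"])


# Example use (commented out because it needs a real cluster node):
#
# with maintenance_mode("Disk H:"):
#     subprocess.check_call(["chkdsk", "H:", "/f"])
```

The try/finally pairing covers only failures inside the script. It does not help if the script host itself stops, so a separate query of the maintenance mode setting is still advisable after an interrupted run.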
How to put a disk in maintenance mode
To start maintenance mode, use one of the following methods:
If there is any change to the state of the disk resource in maintenance mode, the maintenance mode setting is disabled. The maintenance mode setting is disabled when the following conditions are true:
The Cluster.exe command-line tool is used to query, to set, and to clear maintenance mode for a resource.
Note You cannot view, change, or set maintenance mode for a resource from the Cluster Administrator graphical user interface tool.
Maintenance mode command-line syntax
To put a disk resource in maintenance mode, use the following Cluster.exe command syntax:
cluster.exe . res "%disk_name%" /maint:on
Note For more information about Cluster.exe command syntax, type cluster /? at a command prompt.
The following is sample output that shows that a resource has been put in maintenance mode:
G:\>cluster TestCluster res "Disk H:" /maint:on
Setting maintenance mode for resource 'Disk H:'
Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
Disk H:              Exchange             Node1           Online(Maintenance)
Use the following Cluster.exe command syntax to query a disk resource to determine whether the resource is in maintenance mode:
cluster.exe . res "%disk_name%" /maint
The following sample output shows a resource that is being queried for its maintenance mode setting:
G:\>cluster TestCluster res "Disk H:" /maint
Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
Disk H:              Exchange             Node1           Online(Maintenance)
Use the following Cluster.exe command syntax to bring a disk resource out of maintenance mode:
cluster.exe . res "%disk_name%" /maint:off
The following sample output shows a resource that is out of maintenance mode:
G:\>cluster TestCluster res "Disk H:" /maint:off
Clearing maintenance mode for resource 'Disk H:'
Resource             Group                Node            Status
-------------------- -------------------- --------------- ------
Disk H:              Exchange             Node1           Online
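The three commands and their sample output lend themselves to scripting. The following Python sketch builds the argument lists for the /maint:on, /maint, and /maint:off forms and reads the Status column from output that has the layout of the samples in this article. The function names are hypothetical. The parser assumes that the resource row starts with the resource name and that the status value contains no spaces, which holds for the samples in this article but is not confirmed for every resource state.

```python
def maint_command(cluster, resource, action=None):
    """Return the Cluster.exe argument list for a maintenance-mode call.

    action is "on", "off", or None. None produces the query form (/maint).
    """
    if action not in (None, "on", "off"):
        raise ValueError("action must be 'on', 'off', or None")
    switch = "/maint" if action is None else "/maint:" + action
    return ["cluster.exe", cluster, "res", resource, switch]


def parse_status(output, resource):
    """Return the Status column for the resource row, or None if not found.

    Assumes the row begins with the resource name and that the status is
    the last whitespace-separated token on that row.
    """
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(resource):
            return line.split()[-1]
    return None


def in_maintenance(status):
    """True if a status string such as 'Online(Maintenance)' marks maintenance mode."""
    return status is not None and status.endswith("(Maintenance)")
```

For example, a script could pass the result of maint_command(".", "Disk H:") to a process launcher, feed the captured text to parse_status, and then confirm with in_maintenance that the disk is no longer in maintenance mode before it reports success.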
How to put a disk into extended maintenance mode
How to bring a disk out of extended maintenance mode
Extended maintenance mode limitations
A backup and restore application must be able to "hot swap" a cluster disk without taking the disk offline. Hot swapping occurs when you remove a device from a computer that is running and then replace that device with an identical device in the same slot. When a cluster disk is online, the Clusdisk.sys driver is attached to the disk and helps protect the disk. When a disk is removed from the system, all state that the Clusdisk.sys driver and the disk resource maintain for an online disk is invalidated. The hot swap disk has to be brought online by the Clusdisk.sys driver and by disk resource operations before the hot swap disk can be used by the applications. You must perform all the operations that are described in this article without taking the corresponding cluster resource for the disk offline.
To support hot swapping, cluster disk resource operations must be able to bring the disk offline and online internally. For example, the LUN must be dismounted, and all monitoring must stop. However, the corresponding cluster resource must remain online. When hot swapping is completed, the LUN is remounted, and monitoring resumes.
When you put a disk in extended maintenance mode, the state transitions are symmetric. When a disk is online, you can switch to maintenance mode. From maintenance mode, you can switch to extended maintenance mode. However, when a disk is online, you cannot switch directly to extended maintenance mode.
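The transition rules can be modeled as a three-step ladder. The following Python sketch is an illustration only: the DiskState names and both helper functions are hypothetical, and the reverse path (extended maintenance mode to maintenance mode to online) is an assumption that is based on the statement that the transitions are symmetric.

```python
from enum import Enum


class DiskState(Enum):
    ONLINE = "Online"
    MAINTENANCE = "Maintenance"
    EXTENDED = "Extended maintenance"


# Ordered ladder: a disk may move only between neighboring states.
ORDER = [DiskState.ONLINE, DiskState.MAINTENANCE, DiskState.EXTENDED]


def can_transition(current, target):
    """True if target is exactly one step away from current on the ladder."""
    return abs(ORDER.index(current) - ORDER.index(target)) == 1


def transition_path(current, target):
    """Return every state to pass through, including both endpoints."""
    i, j = ORDER.index(current), ORDER.index(target)
    step = 1 if j >= i else -1
    return [ORDER[k] for k in range(i, j + step, step)]
```

A script that needs extended maintenance mode for an online disk would therefore step through maintenance mode first, for example by walking the list that transition_path(DiskState.ONLINE, DiskState.EXTENDED) returns.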
A regression has been found in this update that prevents you from creating a quorum by using Majority Node Set (MNS). If you convert from a shared quorum resource to MNS, error 1 (invalid function) occurs. If the cluster is already using MNS for a quorum and you apply this update, the MNS resource cannot come online. Additionally, Cluster Administrator displays the following error message:
An error has occurred attempting to make <MNS_Resource> the quorum resource. Incorrect function
Error ID: 1 (00000001).
Snippet from the cluster log:
Majority Node Set <MNS>: Expanded path '\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$'
[FM] FmSetQuorumResource: Entry, pszClusFileRootPath=\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$\MSCS
000000ac.00001038::2006/10/01-03:38:13.370 ERR [FM] FmSetQuorumResource: Unable to get maintenance mode info for resource 'MNS', status 1
[FM] FmSetQuorumResource: Exit, status=1
[FM] FmSetQuorumResource: Entry, pszClusFileRootPath=\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$\MSCS
000000ac.00001758::2006/10/01-03:38:59.730 ERR [FM] FmSetQuorumResource: Unable to get maintenance mode info for resource 'MNS', status 1
[FM] FmSetQuorumResource: Exit, status=1
To resolve the issue with MNS, download the following update:
921181 (http://support.microsoft.com/kb/921181/) An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server clusters
Service pack information
To resolve this problem, obtain the latest service pack for Windows Server 2003. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
889100 (http://support.microsoft.com/kb/889100/) How to obtain the latest service pack for Windows Server 2003
Update information
A supported hotfix is available from Microsoft. However, this hotfix is intended to correct only the problem that is described in this article. Apply this hotfix only to systems that are experiencing this specific problem. This hotfix might receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next software update that contains this hotfix.
If the hotfix is available for download, there is a "Hotfix download available" section at the top of this Knowledge Base article. If this section does not appear, contact Microsoft Customer Service and Support to obtain the hotfix.
Note If additional issues occur or if any troubleshooting is required, you might have to create a separate service request. The usual support costs will apply to additional support questions and issues that do not qualify for this specific hotfix. For a complete list of Microsoft Customer Service and Support telephone numbers or to create a separate service request, visit the following Microsoft Web site:
http://support.microsoft.com/contactus/?ws=support
Note The "Hotfix download available" form displays the languages for which the hotfix is available. If you do not see your language, it is because a hotfix is not available for that language.
Restart requirement
You must restart the computer after you apply this update.
Update replacement information
This update does not replace any other updates.
File information
The English version of this update has the file attributes (or later file attributes) that are listed in the following table. The dates and times for these files are listed in Coordinated Universal Time (UTC). When you view the file information, it is converted to local time. To find the difference between UTC and local time, use the Time Zone tab in the Date and Time item in Control Panel.
Windows Server 2003, x86-based versions
Date         Time   Version        Size     File name
--------------------------------------------------------------
18-Aug-2005  04:54  5.2.3790.2511  476,672  Clusres.dll
18-Aug-2005  02:39  5.2.3790.2511  838,144  Clussvc.exe
18-Aug-2005  02:40  5.2.3790.2511  181,248  Cluster.exe
18-Aug-2005  02:40  5.2.3790.2511  68,096   Resrcmon.exe
18-Aug-2005  02:33  5.2.3790.2511  7,168    W03a2409.dll
18-Aug-2005  02:48  5.2.3790.2511  32,256   Arpidfix.exe
Windows Server 2003, x64-based versions
Date         Time   Version        Size       File name      Platform
--------------------------------------------------------------------
18-Aug-2005  14:19  5.2.3790.2511  651,264    Clusres.dll    x64
18-Aug-2005  14:19  5.2.3790.2511  1,231,360  Clussvc.exe    x64
18-Aug-2005  14:19  5.2.3790.2511  338,432    Cluster.exe    x64
18-Aug-2005  14:19  5.2.3790.2511  97,280     Resrcmon.exe   x64
18-Aug-2005  14:19  5.2.3790.2511  7,680      W03a2409.dll   x64
18-Aug-2005  14:19  5.2.3790.2511  181,248    Wcluster.exe   x86
18-Aug-2005  14:19  5.2.3790.2511  68,096     Wresrcmon.exe  x86
18-Aug-2005  14:19  5.2.3790.2511  7,168      Ww03a2409.dll  x86
18-Aug-2005  14:19  5.2.3790.2511  43,008     Arpidfix.exe   x64
Windows Server 2003, Itanium-based versions
Date         Time   Version        Size       File name      Platform
--------------------------------------------------------------------
18-Aug-2005  14:19  5.2.3790.2511  1,162,240  Clusres.dll    IA-64
18-Aug-2005  14:19  5.2.3790.2511  2,068,992  Clussvc.exe    IA-64
18-Aug-2005  14:19  5.2.3790.2511  543,744    Cluster.exe    IA-64
18-Aug-2005  14:19  5.2.3790.2511  184,320    Resrcmon.exe   IA-64
18-Aug-2005  14:19  5.2.3790.2511  6,144      W03a2409.dll   IA-64
18-Aug-2005  14:19  5.2.3790.2511  181,248    Wcluster.exe   x86
18-Aug-2005  14:19  5.2.3790.2511  68,096     Wresrcmon.exe  x86
18-Aug-2005  14:19  5.2.3790.2511  7,168      Ww03a2409.dll  x86
18-Aug-2005  14:19  5.2.3790.2511  74,752     Arpidfix.exe   IA-64
For more information, click the following article number to view the article in the Microsoft Knowledge Base:
824684 (http://support.microsoft.com/kb/824684/) Description of the standard terminology that is used to describe Microsoft software updates
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section. This problem was first corrected in Windows Server 2003 Service Pack 2.
Article ID: 903650 - Last Review: July 14, 2009 - Revision: 5.0