On a Windows Server 2003 failover cluster, you may
experience one or more of the following symptoms.
Note The "Workaround" section also applies to non-cluster scenarios
where you cannot access a file share.
Symptom 1
You cannot access the existing File Share resources even if these
resources are online according to the Cluster Administrator snap-in.
Additionally, the following error may be logged in the event log:
Date: date
Time: time
Source: ClusSvc
Type: Error
Event ID: 1055
User: N/A
Computer: ServerName
Description: Cluster File Share resource 'ShareName' has failed a status check. The error code is 64.
Symptom 2
One of the following error messages is logged in the Cluster.log
file.
Error message 1
00000930.00000af8::{2003/01/01 23:00:00.001} File Share ShareName: Share has gone offline, Error=64!
00000930.00000934::{2003/01/01 23:00:00.001} File Share ShareName: Share has gone offline, error=64!
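To look up error code 64, click Start, click Run, type cmd.exe, and then click OK.
At the command prompt, type Net HelpMsg 64, and then press ENTER. The command
returns the following text: "The specified network name is no longer available."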
Error message 2
Description: Cluster File Share resource
'ShareName' has failed a status check. The error
code is 32.
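To look up error code 32, click Start, click Run, type cmd.exe, and then click OK.
At the command prompt, type Net HelpMsg 32, and then press ENTER. The command
returns the following text: "The process cannot access the file because it is
being used by another process."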
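When many entries have to be reviewed, the error codes can be pulled out of the log text with a short script. The following Python sketch is only an illustration: it assumes that entries look like the samples shown above, and the names extract_error_codes and describe are hypothetical helpers, not part of any Microsoft tool. The message table contains only the two codes that this article cites.

```python
import re

# Messages for the two codes cited in this article, as returned by Net HelpMsg.
KNOWN_ERRORS = {
    64: "The specified network name is no longer available.",
    32: "The process cannot access the file because it is being used by another process.",
}

# Matches both "Error=64!" (Cluster.log) and "The error code is 64." (event text).
ERROR_PATTERN = re.compile(r"error(?:\s+code\s+is\s+|\s*=\s*)(\d+)", re.IGNORECASE)


def extract_error_codes(log_text):
    """Return every error code found in the text, in order of appearance."""
    return [int(match.group(1)) for match in ERROR_PATTERN.finditer(log_text)]


def describe(code):
    """Return the known message for a code, or a hint to use Net HelpMsg."""
    return KNOWN_ERRORS.get(
        code, f'No message recorded here; run "Net HelpMsg {code}" to look it up.'
    )


if __name__ == "__main__":
    sample = "File Share ShareName: Share has gone offline, Error=64!"
    for code in extract_error_codes(sample):
        print(code, describe(code))
```

For any code that is not in the table, the Net HelpMsg command remains the authoritative source for the message text.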
When the Cluster service (ClusSvc.exe) cannot perform an IsAlive test against the File Share resource, event ID 1055 is logged.
Usually, the IsAlive test fails because the Cluster service cannot connect to the
Server service (Srv.sys). This failure typically occurs when the Server service
cannot keep up with the demand for the network work items that the network layer of
the I/O stream queues.
For more information about the IsAlive test, see the following article
in the Microsoft Knowledge Base:
Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003
You may experience these symptoms for one or more of
the following reasons:
A slow disk I/O stream causes disk response time to
increase. (This is the most common cause of the symptoms.)
The server is under high load.
An application or a service causes a deadlock in the Server
service.
Device drivers on the server are outdated. For example, the
network adapter driver is outdated.
Applications that use file system filter drivers are
running on the server. Typically, the following applications use file system
filter drivers:
Antivirus software
Backup applications
Quota management applications
A tape device on a server is attached to the same storage
area network (SAN) that the cluster is using. The tape device driver decreases
the disk queue length on the SAN. This action affects each server that is
attached to the SAN. You can increase the disk queue length to avoid this
situation.
Important This section, method, or task contains steps that tell you how to
modify the registry. However, serious problems might occur if you modify the
registry incorrectly. Therefore, make sure that you follow these steps
carefully. For added protection, back up the registry before you modify it.
Then, you can restore the registry if a problem occurs. For more information
about how to back up and restore the registry, see the following article
in the Microsoft Knowledge Base:
How to back up and restore the registry in Windows
To work around this issue,
follow these steps.
Step 1: Tune the Server service to increase capacity
Note Follow this step if the server has more than 1.5 gigabytes (GB)
of system memory. Otherwise, go to the "Step 2: Tune the Workstation service to
increase capacity" section.
To follow this step, use one of the
following methods.
Method 1
Click Start, click Run,
type regedit.exe, and then click
OK.
In Registry Editor, locate and then right-click the
following registry subkey:
Disk fragmentation causes disks to perform additional data-seeking
operations and slows down the rate of data transfer. To improve disk I/O
performance, follow these steps:
Perform a defragmentation analysis on high-load
volumes.
Disk fragmentation can cause slow disk I/O. If the disk
cannot sustain the network load, the response time of the disk increases. You
can perform a defragmentation analysis on high-load volumes to determine
whether the volumes should be defragmented.
To perform a
defragmentation analysis, follow these steps.
Important Do not perform a defragmentation analysis on a volume that has
active user operations.
Click Start, click
Run, type Dfrg.msc, and then click
OK.
In the Disk Defragmenter snap-in, click the volume that
you want to analyze, and then click Analyze.
If you receive the following message after the analysis is
complete, you should schedule a defragmentation of the volume or rebuild the
volume.
Analysis is complete for:
(VolumeNumber:) You should defragment this
volume.
If you have 200 GB or more of data on the volume, it is faster
to rebuild the volume than to defragment the volume. To rebuild the volume,
follow these steps:
Create a new logical unit number (LUN).
Copy the data on the volume to the LUN.
Copy the data back to the volume from the
LUN.
After you rebuild the volume, you should run a third-party
disk defragmenting utility to make sure that the volume is not highly
fragmented.
Run the Chkdsk.exe utility against the volumes.
Note Follow this step only if the defragmentation analysis finishes
quickly and without any error.
You cannot schedule a disk check task
on shared disks when the system starts. Therefore, interactively running the
Chkdsk.exe utility is the only method that is supported on a Windows Server
2003 failover cluster.
Be aware that if severe corruption exists on
the volume, running the Chkdsk.exe utility can take the disk offline and
disconnect all users. When you interactively run the Chkdsk.exe utility, you
can stop the utility if the Chkdsk.exe utility produces severe errors. If the
Chkdsk.exe utility exits, you must rebuild the volume. Disk corruption can
cause an I/O bottleneck when read or write operations occur on the volume.
Remember that if you run the Chkdsk.exe utility against a large volume, the
utility may take a very long time to finish. The actual time can range from
one day to two or three weeks. Therefore, before you run the Chkdsk.exe
utility, consider scheduling downtime for the volume.
To run the Chkdsk.exe utility against an online volume, type
the following command at a command prompt, and then press ENTER:
Chkdsk.exe X:
Note The placeholder X represents the drive
letter. If you have multiple volumes to check, you must run a separate command
for each volume.
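Because each volume needs its own command, a short script can generate the command lines for a list of drive letters. The following Python sketch only builds the command text in the form shown above. It does not run Chkdsk.exe. The function name chkdsk_commands and the drive letters in the example are hypothetical.

```python
import string


def chkdsk_commands(drive_letters):
    """Build one 'Chkdsk.exe X:' command line per drive letter."""
    commands = []
    for raw in drive_letters:
        letter = raw.strip().rstrip(":").upper()
        # Accept exactly one ASCII letter, with or without a trailing colon.
        if len(letter) != 1 or letter not in string.ascii_uppercase:
            raise ValueError(f"not a drive letter: {raw!r}")
        commands.append(f"Chkdsk.exe {letter}:")
    return commands


if __name__ == "__main__":
    # Hypothetical shared volumes; replace with the volumes that you want to check.
    for command in chkdsk_commands(["S", "T:"]):
        print(command)
```

Type each generated line at a command prompt, one volume at a time, so that you can watch the output and stop the utility if it reports severe errors.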
Use the "Srv.sys kbqfe" string together with the "Windows
Server 2003" string to search for articles about the Server service driver. For
example, Microsoft Knowledge Base article 950298 appears in the search
results.
Use the "NTOSKrnl.exe kbqfe" string together with the
"Windows Server 2003" string to search for articles about the Windows kernel
driver. For example, Microsoft Knowledge Base article 942835 appears in the
search results.
Use the "MrxSmb.sys kbqfe" string together with the
"Windows Server 2003" string to search for articles about the SMB
mini-redirector driver. For example, Microsoft Knowledge Base article 925903
appears in the search results.
Use the "TCPIP.sys kbqfe" string together with the "Windows
Server 2003" string to search for articles about the TCP/IP driver. For
example, Microsoft Knowledge Base article 950224 appears in the search
results.
Step 6: Obtain the latest device drivers
Outdated device drivers can reduce server performance. Therefore,
we recommend that you follow these steps:
Obtain the latest device driver updates. For example,
obtain the driver update for the HP Solid State Drive (SSD). You can use the
IBM UpdateXpress tool to update device firmware. Frequently, manufacturers
release updated device drivers to resolve bottleneck issues. Contact hardware
vendors to obtain the latest driver updates and to verify that you are using
the current version of the device firmware. It is especially important to do
this for disk subsystem hardware.
If you use SANs, follow these steps:
Check the driver configuration. For example, check the
driver and firmware for SANs and for host bus adapters (HBAs).
Verify that you have the most recent version of the
Storport storage driver (Storport.sys).
Check the following items:
HBA queue depth setting
Multipathing software
Fibre Channel connection
Step 7: Disable file system filter drivers if it is possible
Usually, quota management applications, open file agents, and file
replication applications use file system filter drivers. Therefore, if you
disable these applications, you disable the file system filter drivers. After
you take this action, you can determine whether the problem is
resolved.
You can also configure the antivirus software on the server
to disable real-time scanning of all files. If you cannot disable real-time
scanning of all files, we recommend that you configure the antivirus software
so that it scans only incoming files and does not scan any of the following
files:
Page files
.vhd files
.tmp files
.shd files
.spl files
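The exclusion list above can be expressed as a simple rule that decides whether a file should be skipped by real-time scanning. The following Python sketch is illustrative only. It is not the configuration format of any antivirus product, the function name should_skip_scan is hypothetical, and it assumes that "page files" refers to the default paging file name pagefile.sys.

```python
from pathlib import PureWindowsPath

# File name extensions from the exclusion list in this step.
EXCLUDED_EXTENSIONS = {".vhd", ".tmp", ".shd", ".spl"}

# Assumption: "page files" means the default Windows paging file name.
PAGE_FILE_NAMES = {"pagefile.sys"}


def should_skip_scan(path):
    """Return True if the file matches the exclusion list in this step."""
    windows_path = PureWindowsPath(path)
    if windows_path.name.lower() in PAGE_FILE_NAMES:
        return True
    return windows_path.suffix.lower() in EXCLUDED_EXTENSIONS


if __name__ == "__main__":
    # Hypothetical paths used only to show the rule.
    for candidate in (r"C:\pagefile.sys", r"S:\Spool\00002.SPL", r"S:\Shares\report.doc"):
        print(candidate, should_skip_scan(candidate))
```

In practice, enter the same extensions and file names in the exclusion settings of the antivirus software that is installed on the server.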
Step 8: Understand the load on the server, and collect system information in case this problem occurs again
After you perform the previous seven steps, the problem should be
resolved. However, you should make sure that you have an idea of the load on
the server after the server is back in production. Additionally, you should
make sure that you collect system information in case this problem occurs
again.
You can collect general performance data from the server to
obtain an idea of the load on the server.
Note We recommend that you collect the data for at least three hours
per day.
To collect general performance data from the server, follow
these steps.
Obtain an output from the Net Files command and from the Net Sessions command. To do this, one at a time, type the following commands
at a command prompt, and then press ENTER.
Note In the following commands, the placeholder
ServerName represents the name of the server on
which this problem occurs. The placeholder MMDDYEAR
represents the date in the MMDDYear format. For example, you type
01012008 for MMDDYEAR.
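As an illustration of how the two placeholders combine, the following Python sketch pairs each Net command with an output file name that contains the server name and the date. The file-name pattern is an assumption made for this example and is not taken from this article. The function name capture_plan is also hypothetical.

```python
from datetime import date


def capture_plan(server_name, on_date):
    """Return (command, output file name) pairs for the two Net commands."""
    stamp = on_date.strftime("%m%d%Y")  # MMDDYear, for example 01012008
    return [
        # Assumed naming pattern: ServerName_<report>_MMDDYEAR.txt
        ("net files", f"{server_name}_NetFiles_{stamp}.txt"),
        ("net sessions", f"{server_name}_NetSessions_{stamp}.txt"),
    ]


if __name__ == "__main__":
    # Hypothetical server name and date.
    for command, output_file in capture_plan("FS01", date(2008, 1, 1)):
        print(f"{command} > {output_file}")
```

Each printed line redirects the output of one command to a text file, so that the open files and the sessions at the time of the problem are preserved for later analysis.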
From a remote computer, run the Performance Monitor
Wizard utility (PerfWiz.exe), and then click Next.
Next to Monitoring Computer, type the
name of the local computer, and then click Next.
Click Create New Log, and then click
Next.
Click Standard Perfmon, and then click
Next.
Type the name of the server that encounters the
problem, and then click Next.
Next to Log Name, type the name that
you want to use for the log, and then click Next. You can use
the default size of 200 MB that appears next to Log file size
for the log size. If you specify a different size, make sure that you do not
specify a size that is larger than 250 MB. If the log size is larger than 250
MB, it is almost impossible for the system to read the log because of the delay
that occurs when the system is loading the performance counters.
Take one of the following actions:
Under Average Time to issue, type
6 hours.
Under Sample Interval, type
300 seconds.
Click Next.
Click Start, and then click
Next.
Click Finish.
Run the PerfWiz.exe utility to create another
Performance Monitor log. To do this, use the standard log profile, and use the
following settings:
Set Log file size to 150
MB.
Set Average Time to issue to 1
hour.
Windows Server 2003, x64-based and Itanium-based versions
From a remote computer, click Start,
click Run, type PerfMon, and then click
OK.
In the Performance snap-in, expand Performance
Logs and Alerts.
Right-click Counter Logs, and then
click New Log Settings.
In the New Log Settings dialog box,
type a name for the new log, and then click OK.
In the
LogName dialog box, click
Add Counters.
Note The placeholder LogName represents the
name of the log that you created in the previous step.
In the Add Counters dialog box, click
Select counters from computer, and then specify the name of
the server that encounters the problem. Specify the server name in the
following form:
\\ServerName
Under Performance object, select
Process.
Click All counters, click All
instances, and then click Add.
Repeat the previous two steps to add the following
objects:
Cache
Memory
Objects
Paging File
LogicalDisk
NBT Connection
Network Interface
PhysicalDisk
Processor
Redirector
Server
Server Work Queues
System
Thread
Note For some objects, the All instances option does
not apply.
In the Add Counters dialog box, click
Close.
In the
LogName dialog box, type
600 next to Interval, and then select
seconds next to Units.
Click the Log Files tab, select
Binary Circular File under Log file type, and
then click Configure.
In the Configure Log Files dialog box,
type 250 under Limit of, and then click
OK.
Note If you set the size limit to 250 MB, you can see enough history
to recognize a trend. Notice that the log can become very large. However, the
log will be automatically compressed to about 20 percent of its original size.
Therefore, if the log reaches the maximum size, the log will be automatically
compressed to around 50 MB.
In the
LogName dialog box, click
OK.
Repeat steps c through o of this subsection to create
another log. However, type 5 next to
Interval in step k.
Obtain the description of the files that are generally
opened on the server and the description of the roles of the server. For
example, a server may act as a file server or as a terminal server.
You can use the Network Monitor utility to capture network
traffic in case this problem occurs again.
Note Capturing network traffic is also known as capturing the network
sniffer trace.
You can download the Network Monitor utility from the Microsoft Download Web
site.
To use the Network Monitor utility to capture
network traffic, follow these steps.
Note The following steps are based on Network Monitor 3.2.1303.0.
Set up the Network Monitor utility. To do this, follow
these steps:
Start the Network Monitor utility.
Under Select Networks, click to select
the Local Area Connection check box.
Click New Capture.
On the Tools menu, click
Options.
In the Options dialog box, click the
Capture tab.
Under Temporary capture file, change
the Size setting from 20 to
30.
Click OK.
When you are ready to reproduce the problem, ping the
client from the server. To do this, type the following command at a command
prompt, and then press ENTER:
ping <ClientName>
In the Network Monitor utility, click
Start to start the capture.
Note In the HH:MM:SS format, note the time when you start the
trace.
Start the applications that are typically running on the
server.
When this problem occurs, note the time in the HH:MM:SS
format.
Stop the applications.
In the Network Monitor utility, click
Stop.
Save the trace. To do this, follow these steps:
On the File menu, click Save
As.
In the File name box, specify a name
in the following format:
Note The placeholder ProblemDescription
represents a brief description of the problem. The placeholder
ServerName represents the name of the server that
encounters the problem. The placeholder UserName
represents the account that you use to log on to the server. The placeholder
HHMM represents the time when you stop the trace.
The placeholder MMDDYear represents the date when
you stop the trace.
Click Save.
Important You should also collect the IP address of the server and the IP
address of the client.
How to optimize the network load that the clients create
After you analyze both the kind of load on the server and the network trace
that you capture, you can optimize the network load that the clients create.
To do this, fine-tune the registry on the client computers to reduce the
network connections to the File Share resources on the server. Make this
change on client computers that run any of the following:
Windows XP
Windows Vista
Windows 7
Windows Server 2003 Terminal Server
Windows Server 2008 Terminal Server
To fine-tune the registry, use one of the following methods.
Important This section, method, or task contains steps that tell you how to
modify the registry. However, serious problems might occur if you modify the
registry incorrectly. Therefore, make sure that you follow these steps
carefully. For added protection, back up the registry before you modify it.
Then, you can restore the registry if a problem occurs. For more information
about how to back up and restore the registry, see the following article
in the Microsoft Knowledge Base:
How to back up and restore the registry in Windows
Method 1
Start Registry Editor.
In Windows Vista, in Windows 7, and in Windows Server
2008, click Start, type
regedit in the Start Search box, and
then press ENTER.
If you are prompted for an administrator password or for
confirmation, type the password or provide confirmation.
In Windows XP and in Windows Server 2003, click
Start, click Run, type
regedit in the Open box, and then click
OK.
In Registry Editor, locate and then click the following
registry subkey:
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\policies\explorer
Create a NoRemoteRecursiveEvents registry entry, and then
set the value to 1 (hexadecimal). To do this, follow these steps:
On the Edit menu, point to
New, and then click DWORD Value.
Type NoRemoteRecursiveEvents,
and then press ENTER.
On the Edit menu, click
Modify.
In the Edit DWORD Value dialog box,
click Hexadecimal under Base, type
1 in the Value data box, and then click
OK.
Create a NoRemoteChangeNotify registry entry, and then set
the value to 1 (hexadecimal). To do this, follow these steps:
On the Edit menu, point to
New, and then click DWORD Value.
Type NoRemoteChangeNotify, and
then press ENTER.
On the Edit menu, click
Modify.
In the Edit DWORD Value dialog box,
click Hexadecimal under Base, type
1 in the Value data box, and then click
OK.
Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MRxSmb\Parameters
Create an InfoCacheLevel registry entry, and then set the value to 10
(hexadecimal). To do this, follow these steps:
On the Edit menu, point to
New, and then click DWORD Value.
Type InfoCacheLevel, and then
press ENTER.
On the Edit menu, click
Modify.
In the Edit DWORD Value dialog box,
click Hexadecimal under Base, type
10 in the Value data box, and then
click OK.
Exit Registry Editor.
Method 2
Start Notepad.
In Windows Vista, in Windows 7, and in Windows Server
2008, click Start, type
notepad.exe in the Start Search box,
and then press ENTER.
In Windows XP and in Windows Server 2003, click
Start, click Run, type
notepad.exe in the Open box, and then
click OK.
Copy the following text, and then paste it into Notepad:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\policies\explorer]
"NoRemoteRecursiveEvents"=dword:00000001
"NoRemoteChangeNotify"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MRxSmb\Parameters]
"InfoCacheLevel"=dword:00000010
Save the file with a .reg file name extension, and then exit
Notepad.
In Windows Explorer, double-click the file to import these
registry settings.
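A DWORD value in a .reg file is written as the prefix dword: followed by exactly eight hexadecimal digits, so a value with more or fewer digits is not a valid entry. The following Python sketch shows one way to generate the file text with the digit count enforced. It is an illustration only. The helper names format_dword and build_reg_file are hypothetical, and the InfoCacheLevel value of hexadecimal 10 is taken from the value that is typed in the Edit DWORD Value dialog box in Method 1.

```python
# Registry keys and entries from Method 1 and Method 2 of this section.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\policies\explorer": {
        "NoRemoteRecursiveEvents": 0x1,
        "NoRemoteChangeNotify": 0x1,
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MRxSmb\Parameters": {
        "InfoCacheLevel": 0x10,
    },
}


def format_dword(value):
    """Format an integer as a .reg DWORD: 'dword:' plus exactly 8 hex digits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD value must fit in 32 bits")
    return f"dword:{value:08x}"


def build_reg_file(sections):
    """Build .reg text from a mapping of key path to {entry name: value}."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, entries in sections.items():
        lines.append(f"[{key}]")
        for name, value in entries.items():
            lines.append(f'"{name}"={format_dword(value)}')
        lines.append("")  # blank line between sections
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(SETTINGS))
```

The printed text can be pasted into Notepad and saved as described in the steps above. As with any registry change, back up the registry before you import the file.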
For more information about how to fine-tune the
Registry, click the following article numbers to view the articles in the
Microsoft Knowledge Base: