In Windows NT version 4.0 Service Pack 4 (SP4) and Windows 2000, two new switches have been added to Chkdsk.exe. These switches enable users to better manage downtime incurred by running CHKDSK or AUTOCHK.
The switches that are added in Windows NT 4.0 SP4 and Windows 2000 are /C and /I, and are only valid when the target drive has the NTFS format. Each switch directs the CHKDSK routine to bypass certain actions it would otherwise take to validate the integrity of NTFS data structures.
Warning: Microsoft does not recommend interrupting the CHKDSK process when it is used with the /F switch, and Microsoft does not guarantee the integrity of the disk if the CHKDSK program is interrupted.
Chkdsk.exe is the command-line interface for a program that verifies the logical integrity of a file system on Windows. When CHKDSK encounters logical inconsistencies it takes actions to repair file system data, provided it is not in read-only mode.
The code that actually performs the verification when CHKDSK is run online resides in utility DLLs such as Untfs.dll and Ufat.dll. The verification routines invoked by Chkdsk.exe are the same ones invoked when a volume is verified through the graphical user interface provided by Windows Explorer or Disk Administrator. When CHKDSK is scheduled to run at reboot, on the other hand, the binary module that contains the verification code is Autochk.exe. Autochk.exe is a native Windows application that runs early enough in the system boot sequence that it does not have the benefit of virtual memory or other Win32 services. Autochk.exe generates the same kind of textual output that the utility DLLs invoked by Chkdsk.exe do. In addition to displaying this output on the screen during the boot process, Autochk.exe also logs an event to the Application Event Log for the system containing as much of the textual output as can fit into the event log's data buffer.
Because Autochk.exe and the verification code in the utility DLLs used by Chkdsk.exe are based on the same source code, both will be referred to generically as "CHKDSK" throughout the remainder of this article. Likewise, as this article is concerned only with changes in CHKDSK behavior with respect to NTFS volumes, it should be understood that, by saying, "CHKDSK does such-and-such," the following is meant: "CHKDSK does such-and-such when run on an NTFS volume".
Because the use of the /C and /I switches can result in a volume remaining corrupted even after CHKDSK completes, the use of these switches is not recommended except in situations where system downtime must be kept to a minimum. These switches are intended for users with exceptionally large volumes who require flexibility in managing the downtime that is incurred when CHKDSK must be run on such volumes.
To understand when it might be appropriate to use these switches, it is important to have a basic understanding of some of the internal NTFS data structures, the kinds of corruption that can take place, what actions CHKDSK takes when it verifies a volume, and what the potential consequences are in circumventing CHKDSK's usual verification steps.
CHKDSK's activity is split into three major "stages," during which it examines all the "metadata" on the volume, and an optional fourth stage. Metadata is "data about data." It is the file system overhead, so to speak, that is used to keep track of everything about all of the files on the volume. Metadata tells what allocation units make up the data for a given file, what allocation units are free, what allocation units contain bad sectors, and so on. The "contents" of a file, on the other hand, is termed "user data." NTFS protects its metadata through the use of a transaction log. User data is not so protected.
During its first stage, CHKDSK displays a message on the screen saying that it is verifying files and counts from 0 to 100 percent complete. During this phase, CHKDSK examines each file record segment (FRS) in the volume's master file table (MFT). Every file and directory on an NTFS volume is uniquely identified by a specific FRS in the MFT, and the percent complete that CHKDSK displays during this phase is the percent of the MFT that has been verified. During this stage, CHKDSK examines each FRS for internal consistency and builds two bitmaps, one representing what FRSs are in use, and the other representing what clusters on the volume are in use. At the end of this phase, CHKDSK knows what space is in use and what space is available both within the MFT and on the volume as a whole. NTFS keeps track of this information in bitmaps of its own that are stored on the disk, allowing CHKDSK to compare its results with NTFS's stored bitmaps. If there are discrepancies, they are noted in CHKDSK's output. For example, if an FRS that had been in use is found to be corrupted, the disk clusters formerly associated with that FRS will end up being marked as available in CHKDSK's bitmap, but will be marked as being "in use" according to NTFS's bitmap.
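The bitmap comparison described above can be illustrated with a small sketch. This is not CHKDSK's actual code: the FileRecordSegment class, its fields, and both function names are hypothetical simplifications, and a real FRS carries far more state than is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecordSegment:
    """Hypothetical, highly simplified stand-in for an MFT file record segment."""
    number: int
    in_use: bool
    consistent: bool = True  # outcome of the internal-consistency check
    clusters: list = field(default_factory=list)


def build_bitmaps(mft, total_clusters):
    """Walk every FRS and rebuild the FRS-in-use and cluster-in-use bitmaps."""
    frs_bitmap = [False] * len(mft)
    cluster_bitmap = [False] * total_clusters
    for frs in mft:
        # A corrupted FRS is treated as not in use, so its clusters stay free.
        if frs.in_use and frs.consistent:
            frs_bitmap[frs.number] = True
            for cluster in frs.clusters:
                cluster_bitmap[cluster] = True
    return frs_bitmap, cluster_bitmap


def bitmap_discrepancies(computed, stored):
    """Return the positions where the rebuilt bitmap disagrees with the stored one."""
    return [i for i, (c, s) in enumerate(zip(computed, stored)) if c != s]


mft = [
    FileRecordSegment(0, in_use=True, clusters=[0, 1]),
    FileRecordSegment(1, in_use=True, consistent=False, clusters=[2, 3]),  # corrupted
    FileRecordSegment(2, in_use=False),
]
# The bitmap stored on disk still claims clusters 2 and 3 for the corrupted FRS.
stored_cluster_bitmap = [True, True, True, True, False, False]

frs_bitmap, cluster_bitmap = build_bitmaps(mft, total_clusters=6)
print(bitmap_discrepancies(cluster_bitmap, stored_cluster_bitmap))  # [2, 3]
```

In this toy volume the clusters of the corrupted FRS appear as available in the rebuilt bitmap but "in use" in the stored one, which is the kind of discrepancy the paragraph above describes.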
During its second stage, CHKDSK displays a message on the screen saying that it is verifying indexes and counts from 0 to 100 percent complete a second time. During this phase, CHKDSK examines each of the indexes on the volume. Indexes are essentially NTFS directories, and the percent complete that CHKDSK displays during this phase is the percent of the total number of directories on the volume that have been checked. During this stage, CHKDSK examines each directory on the volume for internal consistency and also verifies that every file and directory represented by an FRS in the MFT is referenced by at least one directory. It also confirms that every file or subdirectory referenced in each directory actually exists as a valid FRS in the MFT and checks for circular directory references. Finally, it confirms that the various time stamps and file size information associated with files are all up-to-date in the directory listings for those files. At the end of this phase, CHKDSK has ensured that there are no "orphaned" files and that all the directory listings are for legitimate files. An orphaned file is one for which a legitimate FRS exists, but which is not listed in any directory. When an orphaned file is found, it can often be restored to its rightful directory, provided that directory is still around. If the directory that should hold the file no longer exists, CHKDSK will create a directory in the root directory and place the file there. If directory listings are found that reference FRSs that are no longer in use or that are in use but do not correspond to the file listed in the directory, the directory entry is simply removed.
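The two cross-checks in this stage (every in-use FRS is listed somewhere, and every listing points at a valid FRS) can be sketched as follows. The data model is an assumption made for illustration: directories are a plain dictionary of FRS numbers, ROOT is a hypothetical FRS number for the root directory, and the check that an entry matches the specific file it names is omitted.

```python
ROOT = 0  # hypothetical FRS number for the root directory in this toy model


def check_indexes(frs_in_use, directories):
    """
    frs_in_use:  set of FRS numbers that stage 1 found valid and in use.
    directories: dict mapping a directory's FRS number to the FRS numbers it lists.
    Returns (orphans, dangling), where dangling holds (directory, entry) pairs.
    """
    referenced = set()
    dangling = []
    for directory, entries in directories.items():
        for entry in entries:
            if entry in frs_in_use:
                referenced.add(entry)
            else:
                # The listing points at an FRS that is not in use: remove it.
                dangling.append((directory, entry))
    # The root has no parent directory to list it, so it is never an orphan here.
    orphans = sorted(frs_in_use - referenced - {ROOT})
    return orphans, dangling


frs_in_use = {0, 1, 2, 3}
directories = {0: [1, 2], 1: [7]}  # FRS 3 is listed nowhere; FRS 7 is not in use
orphans, dangling = check_indexes(frs_in_use, directories)
print(orphans)   # [3]
print(dangling)  # [(1, 7)]
```

Here FRS 3 is an orphan that would be reattached to a directory, and the entry for FRS 7 is a listing that would simply be removed.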
During its third stage, CHKDSK displays a message on the screen saying that it is verifying security descriptors and counts from 0 to 100 percent complete a third time. During this phase, CHKDSK examines each of the security descriptors associated with each of the files and directories on the volume. Security descriptors contain information regarding the owner of the file or directory, NTFS permissions for the file or directory, and auditing information for the file or directory. The percent complete in this case is the percent of the number of files and directories on the volume. CHKDSK verifies that each security descriptor structure is well formed and internally consistent. It does not verify that the listed users or groups actually exist or that the permissions granted are in any way appropriate.
The fourth stage of CHKDSK is only invoked if the /R switch is used. /R is used to locate bad sectors in the volume's free space. When /R is used, CHKDSK attempts to read every sector on the volume to confirm that the sector is usable. Sectors associated with metadata are read during the natural course of running CHKDSK even when /R is not used. Sectors associated with user data are read during earlier phases of CHKDSK provided /R is specified. When an unreadable sector is located, NTFS will add the cluster containing that sector to its list of bad clusters and, if the cluster was in use, allocate a new cluster to do the job of the old. If a fault tolerant disk driver is being used, data is recovered and written to the newly allocated cluster. Otherwise, the new cluster is filled with a pattern of 0xFF bytes. When NTFS encounters unreadable sectors during the course of normal operation, it will also remap them in the same way. Thus, the /R switch is usually not essential, but it can be used as a convenient mechanism for scanning the entire volume if a disk is suspected of having bad sectors.
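The remapping rule for an unreadable sector can be summarized in a short sketch. Everything here is illustrative rather than NTFS's real interface: the function name, the 4096-byte CLUSTER_SIZE, the free-cluster list, and the optional recover callback (standing in for a fault tolerant disk driver) are all assumptions.

```python
CLUSTER_SIZE = 4096  # assumed cluster size in bytes, for illustration only


def remap_bad_cluster(cluster, in_use, bad_clusters, free_clusters, recover=None):
    """
    Record `cluster` as bad. If it was in use, allocate a replacement and return
    (replacement_cluster, data_to_write); otherwise return None.
    `recover` is a hypothetical callback standing in for a fault tolerant driver.
    """
    bad_clusters.add(cluster)
    if not in_use:
        return None
    replacement = free_clusters.pop()
    data = recover(cluster) if recover is not None else None
    if data is None:
        # No redundancy to recover from: fill the new cluster with 0xFF bytes.
        data = b"\xff" * CLUSTER_SIZE
    return replacement, data


bad = set()
free = [10, 11]
print(remap_bad_cluster(3, True, bad, free)[0])  # 11
print(sorted(bad))                               # [3]
```

Without a recover callback, the replacement cluster holds only the 0xFF fill pattern, so the user data that lived in the bad cluster is lost even though the file remains structurally valid.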
The preceding paragraphs give only the broadest outline of what CHKDSK is actually doing to verify the integrity of an NTFS volume. There are many specific checks made during each stage and several quick checks between stages that have not been mentioned. Instead, this is simply an outline of the more important facets of CHKDSK activity as a basis for the following discussion regarding the time required to run CHKDSK and the impact of the new switches provided in SP4.
During the first and third phases of CHKDSK, the percent complete indicator advances relatively smoothly. There can be some unevenness in the rate at which these phases progress. FRSs that are not in use require less time to process than do those that are in use. Larger security descriptors take more time to process than do smaller ones, and so on. But, overall, the percent complete displayed is a fairly accurate representation of the actual time required for that phase.
The same is not necessarily true for the second phase of CHKDSK. The amount of time required to process a directory is closely tied to the number of files or subdirectories listed in that directory. But the percent complete listed during this phase is the percent of the number of directories to be examined without regard for the fact that some directories might take much longer than others to process. For example, on a volume with many small directories and one very large one, the percent complete might progress rapidly from 0 to 10 percent complete and then appear to get stuck for a long period of time before rapidly progressing from 10 to 100 percent complete. Therefore, unless you know for certain that the directories on a volume are highly uniform with respect to the number of files they contain, the displayed "percent complete" during this phase cannot be considered a reliable representation of the actual time remaining for this phase.
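A small worked example makes the mismatch concrete. It assumes, as a simplification, that the time to process a directory is exactly proportional to its number of entries; the text above says only that the two are closely tied. The function name and the sample volume are invented for illustration.

```python
def progress_timeline(entries_per_directory):
    """
    For each directory processed, return (displayed_percent, actual_percent):
    displayed counts directories finished, actual counts entries finished.
    """
    total_dirs = len(entries_per_directory)
    total_entries = sum(entries_per_directory)
    done_entries = 0
    timeline = []
    for index, entries in enumerate(entries_per_directory, start=1):
        done_entries += entries
        displayed = 100 * index // total_dirs
        actual = 100 * done_entries // total_entries
        timeline.append((displayed, actual))
    return timeline


# Ten directories: nine hold 10 entries each; the second one holds 9,910.
volume = [10, 9910] + [10] * 8
timeline = progress_timeline(volume)
print(timeline[0])   # (10, 0)
print(timeline[1])   # (20, 99)
print(timeline[-1])  # (100, 100)
```

The display reaches 10 percent almost immediately, then sits there while the one large directory, which accounts for about 99 percent of the work, is processed.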
To make matters worse for anyone caught in the middle of an unexpected CHKDSK, the second phase of CHKDSK is the one that typically takes the longest to run.
By now, it should be clear that many factors having to do with the state of a volume play a role in how long CHKDSK will take to run. A formula to predict the time required to run CHKDSK on a given volume would have to take into account such factors as the number of files and directories, the degree of fragmentation of the volume in general as well as of the master file table in particular, whether files have both long names and 8.3 formatted names, and how much corruption actually needs to be fixed. And that is to say nothing of hardware issues such as amount of system memory, the speed of the CPU, the speed of the disk or disks, and so on.
Rather than attempt to predict how long CHKDSK will take to run for a given volume on a given hardware platform, suffice it to say that it can take anywhere from a few seconds to several days -- depending on your specific situation. Unless /R is used, for a given hardware platform the biggest concern is the number of files and directories rather than the absolute size of the volume. That is, a 50 GB volume with one or two large database files will only take seconds for CHKDSK to run provided /R is not specified. If /R is specified, CHKDSK will have to read-verify every sector on the volume, and that clearly adds significantly to the time for large volumes. On the other hand, even a relatively small volume might take hours to run CHKDSK if it has hundreds of thousands or millions of small files -- whether or not /R is specified.
The best way to predict how long CHKDSK will take to run on a given volumeis to actually do a trial run in read-only mode during a period of lowsystem usage. Care must be taken using this technique, however, for threereasons:
- Read-only CHKDSK will abort before it completes all three phases if it encounters errors in earlier phases, and it is prone to falsely reporting errors. That is, CHKDSK may report that a disk is corrupted even when there is no real corruption present. This can happen if NTFS happens to modify areas of the disk on behalf of some program activity that CHKDSK is examining at the same time. To verify a volume correctly, the volume must be in a static state, and the only way to guarantee that state is to lock the volume. CHKDSK only locks the volume when /F or /R (which implies /F) is specified. Thus, you may need to run CHKDSK more than once to get it to complete all stages in read-only mode.
- System load and whether CHKDSK is running online or during the Windows NT boot sequence can impact the time required to run CHKDSK. CHKDSK is both CPU and disk intensive. Which factor becomes the bottleneck will depend on the specific hardware scenario, but, if heavy disk I/O or high CPU usage is going on concurrently with a read-only CHKDSK, inflated times will result. Also, Autochk.exe runs in a different environment than Chkdsk.exe. While running CHKDSK through Autochk.exe affords exclusive use of CPU and I/O resources to CHKDSK, it also deprives CHKDSK of the benefit of virtual memory. Thus, while Autochk.exe would usually be expected to run faster than Chkdsk.exe, systems with relatively low amounts of RAM may see longer times for Autochk.exe than for Chkdsk.exe.
- Fixing corruption adds to the time required. A read-only CHKDSK can complete only if no significant corruption is found. If a disk suffers only minor corruption, the time to fix the problems will be only slightly longer than that required for read-only CHKDSK. But if there is major damage, as might result from a serious head-crash or other major hardware failure, the time required to run CHKDSK can increase in proportion to the number of files damaged. In extreme cases, this could more than double the time required for CHKDSK.
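One way to time such a trial run is with a small wrapper like the one below. This is a sketch only: timed_run is a hypothetical helper, D: is a placeholder volume, and the chkdsk invocation is left in a comment because it applies only on Windows. A run that aborts early because of real or falsely reported errors will produce a misleadingly short time, so the captured output should be checked before the figure is trusted.

```python
import subprocess
import time


def timed_run(command):
    """Run a command and return (exit_code, elapsed_seconds, captured_stdout)."""
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed, completed.stdout


# Example (Windows only). No /F or /R is passed, so the run is read-only:
#     code, seconds, output = timed_run(["chkdsk", "D:"])
#     print(f"read-only CHKDSK took {seconds:.0f} s (exit code {code})")
#     print(output)  # confirm that all three stages actually completed
```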
Introducing the /C and /I Switches
The /C switch directs CHKDSK to skip the checks that detect cycles in the directory structure. Cycles are a very rare form of corruption in which a subdirectory has itself for an ancestor. Using the /C switch can speed CHKDSK by about 1 to 2 percent. Using /C can also leave directory "loops" on an NTFS volume. Such loops may be inaccessible from the rest of the directory tree and could result in some number of files being orphaned in the sense that they cannot be seen by any Win32 applications -- including backup applications.
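To show what kind of check /C skips, here is a minimal sketch of cycle detection over parent links. It is not CHKDSK's algorithm; the parent_of dictionary, the directory names, and the function name are assumptions chosen to keep the example small.

```python
def find_cycles(parent_of, root):
    """Return the set of directories that have themselves as an ancestor."""
    in_cycle = set()
    for start in parent_of:
        seen = set()
        node = start
        # Walk upward until the root, a missing parent, or a repeated directory.
        while node != root and node in parent_of and node not in seen:
            seen.add(node)
            node = parent_of[node]
        # Stopping back on the starting directory means it is its own ancestor.
        if node == start and start in seen:
            in_cycle.add(start)
    return in_cycle


parent_of = {
    "docs": "root",  # healthy: reaches the root
    "a": "b",        # a and b list each other as parent, forming a loop
    "b": "a",
    "c": "a",        # hangs off the loop, so it is cut off from the root too
}
print(sorted(find_cycles(parent_of, "root")))  # ['a', 'b']
```

Directory "c" is not part of the loop itself, but because its only path upward leads into the loop, its contents are just as unreachable from the root as those of "a" and "b".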
The /I switch directs CHKDSK to skip checks that compare directory entries to the FRSs that correspond to those entries. Thus, while the directory entries are still checked to be sure that they are self-consistent, they are not necessarily consistent with the data stored in their corresponding FRSs even after CHKDSK has run with this switch in effect. Using the /I switch typically results in CHKDSK times being reduced by 50 to 70 percent. Exactly how much faster CHKDSK is with this switch will depend on factors such as the ratio of files to directories, as well as on the relative speed of disk I/O versus CPU speed, and is, therefore, difficult to predict in advance. The use of the /I switch can result in directory entries remaining that refer to incorrect FRSs or in FRSs remaining that are not referenced by any directory entry. The latter case is another form of orphaning. The file represented by the FRS may be intact in all ways except for the fact that it is invisible to all Win32 applications -- including backup applications. In the former case, files may appear to exist, yet applications encounter errors when attempting to access them.
When disk corruption is detected on a volume, you have three basic choices:
- Do nothing. For a mission critical server that is expected to be online 24 hours a day, this is often the choice of necessity. The drawback to this option is that relatively minor corruption can "snowball" into major corruption if it is not repaired as soon as possible after it is detected. Therefore, this option should only be considered when keeping a system up is more important than the integrity of the data stored on the corrupted volume because all data on the corrupted volume should be considered "at risk" until CHKDSK is run.
- Run a full CHKDSK. This option repairs all file system data, restoring all user data that can be recovered by means of an automated process. The drawback to this option is that a full CHKDSK can require several hours of downtime for a mission critical server at an inopportune time.
- Run an abbreviated CHKDSK using some combination of the /C and /I switches. This option repairs the kinds of corruption that can "snowball" into bigger problems in much less time than a full CHKDSK would require, but does not repair all the corruption that might exist. A full CHKDSK will still be required at some future time to guarantee that all the data that can be recovered has been recovered.
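The difference between the second and third choices comes down to which switches are passed. The helper below is a hypothetical convenience that only assembles an argument list from the switches discussed in this article (/F, /R, /C, and /I); it does not run anything, and D: is a placeholder volume.

```python
def chkdsk_command(volume, fix=False, scan_sectors=False,
                   skip_cycles=False, skip_index_checks=False):
    """Build a chkdsk argument list; /C and /I apply to NTFS volumes only."""
    args = ["chkdsk", volume]
    if fix:
        args.append("/F")  # lock the volume and repair errors
    if scan_sectors:
        args.append("/R")  # read-verify every sector (implies /F)
    if skip_cycles:
        args.append("/C")  # skip the directory cycle checks
    if skip_index_checks:
        args.append("/I")  # skip comparing directory entries with their FRSs
    return args


full = chkdsk_command("D:", fix=True)
abbreviated = chkdsk_command("D:", fix=True, skip_cycles=True, skip_index_checks=True)
print(" ".join(full))         # chkdsk D: /F
print(" ".join(abbreviated))  # chkdsk D: /F /C /I
```

An abbreviated run of this kind trades completeness for speed, so a full run should still be scheduled for a later maintenance window.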
It should be pointed out that NTFS does not guarantee the integrity of user data following an instance of disk corruption -- even when a full CHKDSK is run immediately after corruption has been detected. Thus, there may be files that CHKDSK cannot recover. Also, files that are recovered may be internally corrupted even after CHKDSK has been run. It, therefore, remains vitally important that mission critical data be protected by means of a regimen of periodic backups or other robust disaster recovery methodology.