When examining the SQL Server errorlog, you may periodically see thefollowing types of messages:
bufwait: timeout, BUF_IO, bp 0xd37ab0, pg 0x5e74e, stat 0x1008/0x400003, obj 0x6d4f8c1b, bpss 0x2069e1f0
writelog: timeout, dbid 2, dbstat2 0x22(0x22), q1/q2 0/1, owner=3, waittype=0x81
This article explains the meaning of bufwait and writelog timeout messages.
SQL Server uses the asynchronous I/O calls ReadFile(), ReadFileEx(),WriteFile(), and WriteFileEx() when performing reads and writes. TheseWin32 API calls submit an I/O request to Windows NT Server, and then allowthe application to continue with other operations while waiting for theoperation to complete. At completion, it signals the application that thewrite has completed by means of a synchronization event or completionroutine.
Both the bufwait and logwrite messages are actually warnings, and may notnecessarily be a sign of any problem with the computer or SQL Server. IfSQL Server has waited longer than the number of seconds configured in the'resource timeout' setting (which by default is 10 seconds), it prints anerror message indicating the timeout. However, it must continue waiting onthe I/O request to complete before it can continue. The bufwait messagecorresponds to a delay in either reading a page from disk to put in cache,or flushing a page in cache out to disk. The logwrite message is specificto reading or writing log records.
SQL Server uses normal thread synchronization methods to guarantee orderedaccess to the transaction log. In a writelog timeout message, the databaseID is supplied in the dbid field, as well as the process ID (owner) of theother connection on which it is waiting. In the sample message above,notice that the current process was using tempdb (dbid 2), and that it iswaiting on the checkpoint process (owner=3). When investigating the causeof writelog timeouts, it may be beneficial to periodically querysysprocesses to gather information about what types of processing are beingperformed by the "owner" spid listed in the writelog message.
Heavy rollback activity adds additional log and disk I/O contention, andsuch activity has been observed as contributing to writelog timeouts. Forbest performance, attempt to minimize the number of transactions that mustbe rolled back; as activity on the server increases, using a query timeoutthat is too short may further exacerbate the problem by causing unnecessaryrollbacks. Beginning with SQL Server 6.5 Service Pack 2, a newconfiguration option, LogLRU buffers, was added that may improveperformance in environments requiring heavy log reads, such as would berequired in processing a rollback or trigger. Note that using this optionreserves space to hold log pages in cache, reducing the amount of datacache available. See the Readme.txt file included in the service pack formore information on determining an optimum value for this parameter.
If these messages appear only intermittently, they may not be indicative ofany problem. However, if they appear on a fairly regular basis, it may be asign that the disk subsystem is reaching capacity, and that you may benefitfrom spreading the I/O load over a larger number of controllers or disks,to reduce the length of time you have to wait for the operation tocomplete.
To determine what load the disk subsystem is handling, you can usePerformance Monitor to watch the disk counters appropriate to your system.Monitor the %Disk Time counter for LogicalDisk and/or PhysicalDisk thatcorrespond to your log and data devices; if the value is consistently high,consider ways to spread the I/O load, and look for indications that thedisk subsystem is reaching its limits. You can perform additional finetuning by using the %Disk Read Time and %Disk Write Time counters, whichwould allow finer control in spreading the I/O load across the disks andcontrollers. A large Disk Queue Length value indicates a large backlog inI/O requests, and you may want to investigate adding further disks oradditional controllers to spread the I/O load. You can find additionalinformation about the disk counters in the Windows NT Server documentation,the Windows NT Server Resource Kit, and the following article in theMicrosoft Knowledge Base:
: How to Monitor Disk Performance with Performance Monitor
While these messages result from delay in performing I/O, several SQLServer configuration values have a direct impact on the overall I/Othroughput of the system: max async I/O; max lazywrite I/O; logwrite sleep;and resource timeout.
The 'max async I/O' configuration value determines the maximum number ofoutstanding asynchronous I/O requests that SQL Server will allow,throttling the number of pending requests. When an asynchronous I/O requestis made, Windows NT Server must queue the request, which has an associatedoverhead. While asynchronous I/O requests can increase performance whenused appropriately, a limit can be reached at which point further use mayactually begin to degrade system performance. This limit largely depends onthe controller, driver, and associated disk subsystem. If you persistentlyreceive any of the above errors and have changed this setting from itsdefault value, consider setting it back to its default value.
The 'max lazywrite I/O' parameter is closely related to the 'max async I/O'setting, but specifically throttles the number of outstanding requests madeby the Lazywriter process. The Lazywriter process attempts to flush changedpages from cache to disk so that there is always some free space in datacache. The amount of space that the process attempts to keep free iscontrolled by the 'free buffers' configuration option.
The 'logwrite sleep' configuration setting is an advanced configurationoption that forces an intentional delay before writing log pages to disk.Because the log pages must be physically committed to disk before datamodifications are made, you generally want these writes to be performed asquickly as possible. In certain circumstances, you can enhance performanceby intentionally forcing a delay so that multiple processes can "pack" logrecords on a single page and thus reduce the number of write operations.The amount of time used in this intentional delay is counted as part of thetimeout period for the overall write operation, so a writelog timeout maybe artificially induced by increasing the 'logwrite sleep' configurationsetting. Changes to this configuration value should only be made in acontrolled fashion, to ensure that performance is not hindered.
The 'resource timeout' configuration setting controls the overall timeouton the asynchronous I/O operations. By default, the setting is set to 10(seconds), so any operation that takes longer than 10 seconds to completeresults in a timeout warning. Setting this value too low may cause timeoutsin an unrealistically short time. If you are frequently seeing bufwait orwritelog timeouts, you can increase this setting to reduce the frequencywith which these errors are generated.
Because SQL Server stresses the I/O subsystem so extensively, it is alsoimportant to verify that you are running with the latest versions of thedrive and controller firmware, as well as the latest driver. Also consultyour hardware manufacturer for any diagnostics they may have for stressingthe disk subsystem.
If the above information does not help resolve the issue, you may want toconsider stressing the system in a manner similar to that done by SQLServer. To assist you in doing this, a utility called SQLHDTST has beenwritten. For more information about the utility and locations from which itcan be downloaded, see the following article in the Microsoft KnowledgeBase:
SQL Server Utility Files Available
SQL Server requires systems to support ‘ guaranteed delivery to stable media ’ as outlined under the Microsoft SQL Server Always-On Storage Solution Review program. FoFor more information about the input and output requirements for the SQL Server database engine, click the following article number to view the article in the Microsoft Knowledge Base:
Microsoft SQL Server Database Engine Input/Output Requirements