INF: Disaster Recovery Planning for SQL Server

This article was previously published under Q169039
This article has been archived. It is offered "as is" and will no longer be updated.
SUMMARY
This article provides two examples of simple disaster recovery plans that asite may consider while proactively planning for data recovery from acatastrophic disaster. The first example is targeted for sites havingavailable system maintenance windows; the second example is designed forsites operating on a 24-hour basis.

The intention of this article is to provide a starting point for disasterrecovery planning efforts. This article is not your Disaster Recovery Plan.It is for you to consider in light of your own environment, modifyaccordingly, specify, and verify.
MORE INFORMATION
Suppose a fire occurs and wipes out your 24-hour data center. Are youcertain you can recover? How long will it take you to recover and have yoursystem available? How much data loss can your users tolerate? These shouldbe some of the key concerns of every system administrator (SA) and databaseadministrator (DBA) charged with maintaining invaluable system data.Disaster recovery is the process by which information systems are recoveredin the event of a catastrophe: a natural or manmade disaster such as afire, or technical disaster such as a two-disk failure in a RAID-5 array.Disaster Recovery Planning is the work devoted to preparing all the actionsthat will occur in response to a catastrophic event. Disaster RecoveryAssessment is the simulation of a catastrophic event and/or the evaluationof the Disaster Recovery Plan's capability to deliver the specifiedrecovery needs.

Ideally, the Disaster Recovery Plan should state how long recovery shouldtake, and the final database state the users can expect. For example,"After the acquisition of specified hardware, recovery should be completedin 48 hours and data will be guaranteed only up to the end of the previousweek." It is typically important that management be kept clearly informedof these specifications. Disaster Recovery Assessment should be able tosubstantiate the specification.

A Disaster Recovery Plan can be structured many different ways, and it cancontain many types of information (how to get hardware, who is tocommunicate what, who are the people to be contacted in the event of adisaster, how are they to be contacted, who owns the administration of theplan, and so on). This article is devoted only to proposing some initialavenues for the technical recovery of SQL Server.

The following is an example for sites that do not operate on a 24-hourbasis (that is, sites that have available maintenance windows):

To prepare for disaster, do the following every day (or whenever themaintenance window is):
  1. Shut down SQL Server.
  2. Copy all database device files, preferably to another computer in another building (but beware of network load), and also to a tape device (with the server down, the device files can be copied just like any other files).
  3. Maintain system logs in a secure fashion. Record the directory where all SQL Server files are located, especially the Master.dat file. Keep records of all service packs installed for both Windows NT Server and SQL Server. Keep records of Net-libraries used, the security mode, and the SA password.
  4. Maintain a base functionality script for quickly assessing minimal capability (see the note at the end of this article).
  5. To minimize the amount of data lost during the day, perform database and transaction log dumps while the system is live. See the SQL Server Books Online for more information on dump, load, and recovery procedures.
  6. Assess the following disaster recovery steps ahead of time on another server, and amend the steps as necessary.
To recover after a disaster has occurred, do the following after acquiringsuitable replacement hardware:
  1. Install Windows NT Server and load the appropriate service pack. Verify that appropriate domain functionality exists. For example, verify that file sharing works properly.
  2. Install SQL Server and load the appropriate service pack. Place the master database device in same directory as it was initially installed. Also select the same Net-library, security mode, and SA password as before.
  3. Confirm that SQL Server is running properly. If the Windows NT Server name was changed, use sp_dropserver and sp_addserver to match the Windows NT Server name.
  4. Stop SQL Server.
  5. Move all of the database device files back to their original locations, including the Master.dat file.
  6. Restart the SQL Server.
  7. If any database or transaction logs are available after this time, load them.
  8. Verify the availability of the system. Run a functionality script to ensure adequate operation. Ideally, before the users are released onto the system, time should be provided to run DBCC CHECKDB and NEWALLOC on each database, and DBCC TEXTALL and TEXTALLOC on those databases and tables containing TEXT fields. This is to ensure that the migration process did not alter the files in an undesirable fashion.
  9. After running the DBCC statements shows the database to be consistent and the functionality test script succeeds, allow users to resume.
The following is an example for sites that have no online maintenancewindows and that run seven days a week, 24 hours a day:

To prepare for a disaster, do the following:
  1. Periodically dump all databases, preferably to a disk on another computer in another building (but beware of network load), and also to a tape device. Transaction logs can be handled similarly.
  2. Maintain system logs in a secure fashion. Record the directory where all SQL Server files are located, especially the Master.dat file. Keep records of all service packs installed for both Windows NT Server and SQL Server. Keep records of Net-libraries used, the security mode, and the SA password. Keep records of the specified database options.
  3. Record in scripts ALL size changes for ALL devices and databases. This is crucial to simplify recovery in this situation!
  4. Maintain a base functionality script for quickly assessing minimal capability (see the note at the bottom of this article).
  5. Assess the following disaster recovery steps ahead of time on another server, and amend the steps as necessary.
To recover after a disaster has occurred after acquiring suitable hardware:
  1. Install Windows NT Server and load the appropriate service pack. Verify that appropriate domain functionality exists. For example, verify that file sharing works properly.
  2. Install SQL Server and load the appropriate service pack. Make sure to place the master database device into same directory as before. Also select the same Net-library, security mode, and SA password as before.
  3. Confirm that SQL Server is running properly. If the Windows NT Server name was changed, run sp_dropserver and sp_addserver to match the Windows NT Server name.
  4. Create or alter all devices and databases from the scripts made in step 3 of the previous section above. Databases can be created for LOAD.
  5. After all device files and databases are sized as they were at the time of the last dump, if either the user logon information or the remote server logon information is significant from the dumped master database, proceed with step 5a. Otherwise, if they are not crucial, proceed with step 6.

    1. Stop the SQL Server.
    2. Start the SQL Server in single user mode from the command line "SQLSERVR -c -m".
    3. Load the master database from the last dump of it before the catastrophe occurred.
    4. After success, stop and restart SQL Server normally. Continue with step 6.
  6. Load each of the user databases from the dumped files (and the transaction log dumps too, if appropriate).
  7. Stop and restart the SQL Server.
  8. Verify the availability of the system. If the master database was not reloaded in step 5c, set the database options for each database. Run a functionality script to ensure the adequate operation of SQL Server. Ideally, before the users are released onto the system, time should be provided to run DBCC CHECKDB and NEWALLOC on each database, and DBCC TEXTALL and TEXTALLOC on those databases and tables containing TEXT fields. This is to ensure that the migration process did not alter the files in an undesirable fashion.
  9. After running the DBCC statements shows the database to be consistent and the functionality test script succeeds, allow users to resume.
Disaster Recovery Assessment provides the verification of the plan, and isachieved by getting sufficient hardware, providing the documented disasterrecovery guidelines, and having a backup SA or DBA (someone who is notinvolved with plan development) recover the system on this computer.Perform periodic Disaster Recovery Assessments to verify the vitality ofthe current Disaster Recovery Plan.

If your data is valuable, the importance of Disaster Recovery Assessmentcannot be overstated. What is the business risk if you cannot get your databack? What is the cost for each hour's delay in getting your system back upand running? This is not a situation to assume that your data is quicklyrecoverable; verify it! Understand the steps very thoroughly ahead of time,and you will minimize the stress and uncertainty imposed by thecircumstances of some future catastrophe.

This article was written as an expansion to the Database Recovery sectionon page 48 of the Microsoft SQL Server 6.5 Deployment Guide (found on theWorld Wide Web at http://www.microsoft.com/sql/deploy.htm). Additionalinformation about DUMP LOAD SQLSERVR master database can be found in theSQL Server Books Online and in the Microsoft Knowledge Base.

NOTE: A "base functionality script" is a batch of code that can be used toquickly demonstrate the successful functioning of the database from aspecific application's perspective. Most commonly this is a .sql file withbatched SQL commands run into the server from ISQL. For other applications,a .bat file is more appropriate because it can contain BCP and ISQLcommands. This base functionality script is very application-specific, andcan take many different forms. For example, on a Decision Support/Reportingsystem, the script may merely be a copy of a couple of your key reportingqueries; for an online transaction processing (OLTP) application it may bethe execution of a batch of stored procedures to execute INSERT, UPDATE,and DELETE statements. The goal is to confirm, from a gross perspective,that everything is working as intended. The base functionality scriptprovides a nice tool for the SA or DBA to be able to see that the databaseis back in a viable state, without depending on the end users forverification.
Properties

Article ID: 169039 - Last Review: 12/04/2015 17:05:20 - Revision: 3.0

Microsoft SQL Server 4.21a Standard Edition, Microsoft SQL Server 6.0 Standard Edition, Microsoft SQL Server 6.5 Standard Edition

  • kbnosurvey kbarchive kbenv kbhowto kbusage KB169039
Feedback