Microsoft Support WebCast
Microsoft Systems Management Server 2003: Backup and Recovery
October 15, 2002
Note This document is based on the original spoken WebCast transcript. It has been edited for clarity.
Wally Mead: Over the past year we've discussed a lot of features that Microsoft® SMS 2003 provides to your environment. We've also discussed deployment and upgrade information. This month we're going to concentrate on how you can use SMS 2003 features to ensure a good, quality backup is available in the event a recovery is required. Then we'll talk about the improvements that SMS 2003 makes in the area of recovery.
So our topic today is going to be the SMS 2003 product, specifically the areas of backup and recovery. How do you ensure that your site is backed up properly, and what tools and methods are available through SMS 2003 for you to perform an efficient and proper site recovery?
Here's what we'll cover in today's session (slide 2). The first thing we're going to look at is an overview of the backup and recovery process in SMS 2003. So basically, what's involved in the backup and recovery process? Then we'll talk about SMS site recovery rules. What is site recovery, and why is it important? We'll look at understanding SMS site recovery. Why is it not as simple to recover an SMS site as some people think? We'll look at the supported recovery scenarios and supported recovery configurations. Can you recover everything in regard to SMS? Are all scenarios and configurations supported, or are there things that can't be handled with the available tools that we'll talk about here?
Then we'll spend a little bit of time talking about SMS serial numbers. This is really what makes recovery essential to prevent data corruption. If you don't resynchronize your serial numbers in a site hierarchy during your site recovery process, you can end up corrupting data down at your child sites, and you certainly don't want that to happen. So we'll spend a few minutes talking about serial numbers.
Then we'll go through the backup and recovery plan. We'll just talk about a few different phases in the backup and recovery plan, what you want to do to ensure that you're properly backing up your site, and that you know how to perform a site recovery in the event that you need to do one. Last, in the second half of the presentation, we'll talk about the site recovery tools. What tools does SMS 2003 provide to your environment to allow you to efficiently back up your site, as well as recover, if necessary?
So just what is SMS 2003 backup and recovery? Backup and recovery (slide 3) is the process of ensuring that your SMS data is available in the event you need to perform a recovery. The recovery process is the process of restoring the data that you've backed up and ensuring that your site operates as closely as possible to how it operated before the failure.
Essentially, the easiest way to perform a backup of SMS 2003 is to use the Automated Backup Task. Within your SMS 2.0 admin console, there is a Database Maintenance Task, and in SMS 2003 it's called Site Maintenance Task; either way, there is a task that will back up your SMS site.
What it backs up is the SMS site database stored in SQL Server™. Now for SMS 2.0, that can also back up your software metering database, if you're using software metering. It backs up the site server directories, so that all the data SMS stores in the file structure on the site server is included in the backup. It backs up registry configuration information as it relates to SMS. It doesn't back up the entire registry, but the registry keys that are appropriate for SMS and the site server. And it also runs some utilities that will generate operating system configuration information.
This is important information if you need to rebuild the site server computer. So you will need to know how you have the drive separated, what drive is SMS installed on, or where is SQL installed. We reference all that information for you, and the configuration information is generated by some of these tasks.
The automated backup task is included in both SMS 2.0 and SMS 2003. Again, in SMS 2.0, select Site Settings, then select Database Maintenance, and then select Task. For SMS 2003, under Site Settings you select Site Maintenance and then select Task. Then you have the task called Backup SMS Site Server. We'll look at that more a little bit later.
Recovery is a lot easier in SMS 2003 than it was in SMS 2.0. In SMS 2.0, we didn't really provide any tools within the product to perform any kind of a site recovery. So in essence, if you wanted to recover, you were flying by the seat of your pants, which is not necessarily a good scenario in a lot of cases, because it is a very complex process. Or you had to go to Microsoft Product Support Services and get help from them to perform a proper site recovery. They had a document that they would sometimes send to customers to help them step through the recovery process, and they could help you out with that site recovery process. Again, the biggest thing was synchronizing or repairing your site. So we'll look at synchronization of the serial numbers.
Then later in the product cycle we created a new Web site called the Maintenance and Recovery Web site. The Maintenance and Recovery Web site was very cool. It had a lot of good information, but it only gave you a list of tasks that you had to perform on your own. It wouldn't automate any of the recovery processes. So it was good for finding out what needed to be done and how you had to go about doing so, but it did not provide you any automation on those processes. So that's just a quick overview of what the backup and recovery process was.
Now let's talk about site recovery. What does it mean? A site recovery (slide 4) begins when an SMS site code or site server name is reused in the hierarchy. In other words, you've done a reinstallation of an SMS site as a parent site or as a child site. So any time you have to reinstall your SMS site and replace any data from your backup process, that would constitute a site recovery.
The recovery is specific to that hierarchy. In other words, you don't need to worry about site recovery if you have a standalone site and a separate SMS site hierarchy, because that site code or site server name won't be reused in the hierarchy. So it's specific to the hierarchy that you're looking at.
So it is extremely important. If you don't perform your site recovery and perform a proper repair and synchronization of your data, then what is very likely to happen is you will cause data corruption or cause loss of operations in your SMS environment. SMS data, which we'll talk about in the next couple of slides, is stored in numerous, different locations. This is not just stored in the SQL Server database.
You need to go through more steps than just restoring a SQL Server database, because data is synchronized, or its serial numbers are stored in the registry, as well as in the file structure. If you want to prevent data corruption or loss of data at your child sites or at the site you're recovering, then you need to make sure to go through a proper site recovery and resynchronize that data. We'll look at what that means as we go through the presentation for you.
So what is it about SMS that makes site recovery difficult? SMS has a lot of different aspects to it that affect recovery (slide 5). The biggest scenario, and the biggest reason why recovery is more difficult in SMS than in some different environments, is that SMS uses distributed data and tasks. The SMS data is not simply stored in the SQL Server database. It's also stored in the registry and in the file structure of your site server. So there is data located in different places in the SMS environment that is stored to handle the information generated throughout your environment.
This gives you a couple of advantages. One of the advantages is you're distributing your tasks so that you have process isolation. So if there's a problem in one task, it isn't necessarily going to affect another task, because you're distributing your processes. Also it allows for better scalability, because it allows you to separate your tasks on different physical computers. Those different physical devices can be used to segment your operations, giving you multiple servers, which can give you better scalability. So your SMS sites and hierarchies can be larger because you are distributing these processes.
SMS also uses multiple user accounts. The reason we do this, as we've talked about before in other WebCasts, is for security. We isolate each process using different user accounts so that you have more security in your environment. For example, the SMS Service Account can be used for multiple, different tasks. It can be used for running the site server services. It can be used for pushing out the SMS client software to your client computers. It can be used for communicating with, configuring, and updating your site systems, like logon points, client access points, distribution points, and so on.
The SMS Service Account is a domain admin account by default. So it's an extremely powerful account. It's generally advantageous to not use that account across the wire whenever possible. So you can create optional accounts, which can improve your security, but this also means more administration for you.
However, the advantage we have with those multiple accounts is that whenever we have a remote computer accessing the network, we use a domain user account to transfer data across the wire. Again, it improves your security; however, it does provide additional accounts that you then may need to manage or maintain in your environment.
In SMS 2.0 hierarchies, secondary sites are not backed up. We don't have a procedure or task for you to go through and automatically back up a secondary site. The Backup SMS Site Server task that we referenced before, and that we'll talk about again in a couple slides, does not work for secondary sites in SMS 2.0. So if you wanted to back up your SMS 2.0 secondary site, you had to go through a manual process.
Now most people honestly didn't back up their secondary sites. Because if you think about what secondary sites are, there is no SQL Server database involved, so it's a flat file structure as far as storing data locally, and then some of the registry. Generally your secondary sites have a very small number of clients, anywhere from a small handful, a half dozen, maybe on up to a thousand. Generally people thought it was easier, because of the fact that there's no SQL database to do your recovery from, to just reinstall their secondary site server if it ever did fail. Now this does change a little bit in SMS 2003, which we'll talk about.
Now that we know why the site recovery is difficult, what are some of the key strategies? First off, a proper site recovery requires a snapshot backup because of the fact that data is stored in multiple, different locations for your SMS site. It's stored in a SQL Server database. It's stored in the site server's registry. It's stored in the site server's directory structure, in the file structure.
You can't just back up one of those, for example the SQL Server database, and expect that you can recover your site just by restoring that one entity. You have to restore all the data at the same time, which requires a backup snapshot. You want to have all three of those data stores backed up at the same time. That way when you perform a recovery, you have consistent data through all of those. So when you perform your recovery, you restore all three of those data sources back to your site, and that will give you the ability to perform a proper recovery.
You can actually perform a recovery without a backup. It is possible to do so. However, you are going to have data loss. Data will be lost if you have a recovery scenario where you do not have a backup or your backup didn't work, or the media that you backed up to was corrupted, or whatever the case is. You can perform a recovery, but it is going to involve data loss, which means it's going to require administrative intervention on your part to re-create that data, so that the clients can pull off the correct advertisements, look at the correct packages, be members of the correct collections, and so on.
As mentioned before, and we'll mention it again numerous times throughout the presentation, serial numbers must be resynchronized in recoveries involving site hierarchies. So any time you're recovering a site in the site hierarchy, you need to make sure that your serial numbers are resynchronized. If you don't, the data will become corrupted, or you will lose data at your child sites.
This is the biggest gotcha that people miss in their recovery scenarios. It is extremely important, and we will spend some time on this throughout the presentation. I'll mention it numerous times. If you don't perform a proper resynchronization, then you will have corrupted data or you will lose data at your child sites.
Are all SMS site scenarios or configurations supported? We're not going to support every single configuration you might ever come up with, but we have a great list to start with. This slide (slide 6) talks about the supported recovery scenarios. We have scenarios listed here, and you can find these up on the Web site that we'll talk about when we get into the tools section of the presentation a little bit later on. First is the physical computer stops responding. In other words, the site server is dead or the SQL Server computer is dead. The physical computer is not functioning any longer and you need to go through a recovery process.
Next, the drive containing the operating system, SQL Server, or SMS stops responding. So the physical computer is still there, it still responds. However, with one of those three things, the OS, SQL Server, or SMS, the drive has had a failure. So you need to go through a drive replacement process.
Next is the operating system just fails to respond properly, so it's become corrupted and needs to be restored. It's an OS failure. The file system becomes corrupted. Let's say your SMS site server's file structure has been corrupted and you can no longer properly function as a site server. You need to perform a drive repair, possibly a reformat, and then restore your information, because you've had drive corruption.
Next is SQL Server stops responding and must be restored. SQL Server, as you know, is extremely important to an SMS environment, and if SQL Server is not functioning properly, then in essence SMS is dead. So you have to go through a recovery process if SQL fails. Or maybe the SQL Server itself is functioning properly, but your SQL Server database that SMS is using, called the SMS Site Database, has become corrupted. You had a SQL database failure. In all these different scenarios, we can go through the process and help you out with your data recovery.
How about different configurations? What different hardware and software installation configurations are supported? Again, we probably don't do everything, but we have a really good story to offer and a lot of good steps here to start with (slide 7). First off, we can support recovery of SQL Server 6.5, SQL Server 7.0, or SQL Server 2000. As you know, SQL Server 6.5 is not supported by SMS 2003, so SQL Server 6.5 would only be a supported recovery scenario or configuration for SMS 2.0.
The SQL Server can be local to the site server or it can be on a separate computer. That would be your choice, where you have that SQL Server computer: local to the site server or on a separate computer. We usually recommend that the SQL Server and the site server be on the same physical computer, because there will be a lot of interaction between the site server and the SQL Server computer, so having them local, on the same box, will give you great performance. However, if your hardware is not adequate enough to efficiently handle SQL and SMS, as well as your operating system — through all the different packages, advertisements, collections, all the different data you need to distribute, all the clients you have, all the inventory collections that you have scheduled, and so on — then you may need to separate those out, which you can do.
The SMS Provider can be either local or remote to the SQL Server computer. Usually we recommend putting the SMS Provider on the SQL Server computer, unless you know that by adding it it's going to affect the SQL Server computer that is used by other applications or other data stores, other than SMS. You can do a recovery with no backup, an old backup, or a current backup. In essence, the more recent your backup is, the less administration you have to go through to recover any data that's going to be lost from an old backup or having no backup at all.
You can recover secondary sites or primary sites, with or without a parent site. So a secondary site can be recovered, or you can recover a primary standalone site, or one that has a parent site. You can recover primary sites that have no child sites; in other words, I'm the lowest you go in the hierarchy. Maybe I'm a central site; maybe I'm a child of a primary, but I have no child sites. Or you can recover with any mix of secondary and/or primary child sites.
Basically, you can recover any site within your hierarchy, provided you fit within one of the scenarios and one of the configurations that we talked about. This covers all the SQL versions in SMS sites in the hierarchy with or without a backup, as far as your supported configuration and scenarios.
Now, we've mentioned SMS serial numbers a couple times in the presentation. How are they used and where are they stored? So what is the big thing about these SMS serial numbers, and what problems do they cause if you don't properly synchronize them?
Serial numbers (slide 8), as a general rule, are used to designate the next object ID when you create a new SMS object. The most common ones you'd be familiar with would be collections, packages, or advertisements. We need to know what the next object ID is for that. How do I create objects without corrupting data by using the same object ID, which is used internally over and over again? That's what the next object ID or the serial numbers are used for. They designate the next value you want to use when you create an object, so that you don't corrupt data.
This is not an object ID that you have to be concerned about when you create something. When you go to the admin console and create a collection, we don't ask you, what is the ID you want to use to create this? We ask you for your name, a query rule, your direct membership rule, and so on, but we don't ask you the serial number you want to use. It's used internally by the system. So whenever you go out to the admin console and you create a collection, we go to SQL Server to find out what the next collection ID is, so that we know internally what ID to assign appropriately, so that we don't cause corruption.
Some of you may have seen these object IDs in log files, in the file structure, or in the admin console. There's a switch you can use to display your object IDs. In fact, if you're familiar with the SMS 2003 admin console, you'll notice (and I've mentioned this in previous WebCasts) that we automatically expose those object IDs to you in the admin console. It will show your site code, and then the last five digits will be unique. That will be the serial number.
These must be synchronized upon a recovery; otherwise, again, data loss or corruption may occur. For example, let's take the case of packages. Let's say your site code is SMS and your package ID is SMS00001, and that was the last package you created, which happened to be your first package. Let's say you do a site recovery, but you don't have a backup. If you don't resynchronize your serial numbers, we're going to start, upon a reinstallation, with the next package ID of 0.
So when you create your first package, it's going to be SMS00001, which would basically wipe out the package that you had earlier, down at either your child sites or down at the clients. They'll see this as a brand new package. Or maybe it will be a different package from what they thought they had. So you can have data corruption, or maybe just total data loss.
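To make that collision concrete, here is a minimal Python sketch. It assumes an object ID is simply the site code followed by a five-digit counter, as described above. The Site class and its method are hypothetical, and the zero-padded decimal formatting is only for illustration; it is not a statement of how SMS builds IDs internally.

```python
class Site:
    """Toy model of a site that hands out package IDs from a counter."""

    def __init__(self, site_code, last_used=0):
        self.site_code = site_code
        self.last_used = last_used  # last serial number handed out

    def create_package(self):
        # The next ID is always one above the last serial number used.
        self.last_used += 1
        return f"{self.site_code}{self.last_used:05d}"


# Before the failure: the first package ever created at the site.
original = Site("SMS")
first_id = original.create_package()

# Reinstalled with no backup and no resynchronization:
# the counter starts over, so the "new" package reuses the old ID.
reinstalled = Site("SMS")
clash_id = reinstalled.create_package()

# Reinstalled with the counter resynchronized past the old value.
resynced = Site("SMS", last_used=1)
safe_id = resynced.create_package()

print(first_id, clash_id, safe_id)
```

The second site produces the same ID as the first, which is exactly the situation that confuses child sites and clients. The third one does not, because its counter was set past the value already in use.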
Serial numbers are stored in different locations, and we'll go through those on the rest of this slide. They're stored in the SMS site database in SQL Server. If you want to see them, you can use SQL Query Analyzer, change to the SMS database, and you can run the query that I have listed in the sub-bullet, Select * from NextIds, and that will go through and show you all the IDs and the ID values that are stored in the SMS site database.
The important ones for you to worry about for resynchronization are going to be your collection IDs, your offer ID (and offer internally is the name for an advertisement, so your advertisement ID, which shows up as next offer ID), your next package ID, and your next SDK delta site control file ID. Site control file is abbreviated to SCF. So those would be the four that are really important for you to ensure that you have synchronized properly in a recovery scenario. You have the collection ID, the offer (which again is the advertisement ID), the package ID, and your next site control file ID.
You can run that query and it will display a list of all those values for you, including others, which you don't need to worry about nearly as much for data synchronization. What is advantageous for you is to run this query and then print out the result list. Take the table this generates for you and print that out so you have a hard copy of it. You may want to do that every single time you do your backup, so you have that data available if you ever do need to restore, so that you have the appropriate values for the appropriate backup snapshot that you've created.
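If you would rather keep a file next to each backup than a printout, something like the following Python sketch could work. It only covers saving the values; how the rows are fetched from SQL Server is left out. The function name, the CSV layout, and the two column names are assumptions made for illustration, and the sample rows are made up.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

# The query from the talk; run it against the SMS site database.
NEXT_IDS_QUERY = "SELECT * FROM NextIds"


def save_nextids_snapshot(rows, out_dir, taken_at):
    """Write (name, value) rows to a timestamped CSV file and return its path."""
    out_path = Path(out_dir) / f"NextIds_{taken_at:%Y%m%d_%H%M%S}.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id_name", "next_value"])  # hypothetical column names
        writer.writerows(rows)
    return out_path


# Made-up rows; real ones would come from running NEXT_IDS_QUERY.
example_rows = [("Collection", 12), ("Offer", 5), ("Package", 7)]
snapshot_dir = tempfile.mkdtemp()
snapshot_path = save_nextids_snapshot(
    example_rows, snapshot_dir, datetime(2002, 10, 15, 9, 30)
)
print(snapshot_path.name)
```

Putting the timestamp in the file name keeps each set of values paired with the backup snapshot it belongs to.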
We also have some serial numbers stored in the SMS registry. The three different values with the three different components I have listed — Replication Manager, Discovery Data Manager, and Site Control Manager — all have registry entries that are used for controlling the next transaction ID or the next serial number. For example, with the Replication Manager, this is the ID or the transaction ID to be assigned to the next set of data that needs to be sent down to a child site or sent up to a parent site.
What you need to do is ensure that the value listed in the registry is higher than the value that any child site is expecting. If the child site is expecting a value of let's say 50, and you send it a value of 30, the child site is going to ignore your data, because it's going to say this is outdated data. I'm expecting a serial number of 50 and you're sending me a serial number of 30. It's going to just ignore the data.
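The check the receiving site makes can be sketched in a few lines of Python. This is only a model of the behavior just described, not SMS code. The function name is hypothetical, and treating a serial number that exactly matches the expected one as acceptable is an assumption of this sketch.

```python
def accepts(expected_serial, incoming_serial):
    """Return True if the receiving site would process the incoming data."""
    # Anything below the expected serial number is treated as outdated.
    # Accepting an exact match is an assumption made for this sketch.
    return incoming_serial >= expected_serial


# The example from the talk: the child expects 50 and is sent 30.
ignored = accepts(expected_serial=50, incoming_serial=30)

# After the registry value has been raised above what the child expects.
processed = accepts(expected_serial=50, incoming_serial=51)

print(ignored, processed)
```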
Discovery Data Manager has a serial number as well. This is the next discovery record to be created and added to the database. You need to synchronize that; otherwise, if a child site or even your local site generates discovery data, and the serial number is too low, it's going to say, "Wait a minute; I've already used this," and it can throw away your data.
Site Control Manager has the site control file ID as well, the last site control file that I used locally. And it needs to be, again, set so that your child sites are expecting a number larger than the last one that was used — at least not smaller, otherwise it will be invalid.
You can find all these in the registry, under HKEY_LOCAL_MACHINE\Software\Microsoft\SMS\Components, and then under the Components key you'll see Replication Manager, Discovery Data Manager, and Site Control Manager. You can find the transaction ID or serial number for each of those listed in the appropriate location.
The last place you need to be concerned about in a site hierarchy environment is your installation directory. We store the Replication Manager history file there. The Replication Manager history file keeps track of serial numbers for transaction ID objects that have been received from child sites or from a parent site — received from another SMS site in the hierarchy.
If you look at that file structure, go to Sms\Inboxes\Replmgr.box\History and you'll find a file that shows the site code of your destination site, or in this case, of the originating site. It will have a .trs extension on it. If you open that file up with Notepad, it will tell you the type of object that it has received from the original site and what the transaction ID was for that object.
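As a rough sketch, a short Python script can list which sites have a history file in that folder. It assumes the part of the file name before .trs is the site code, which is how the file is described above. The function name is hypothetical, and the script does not try to read the file contents, because their layout is not covered here. The demonstration uses a throwaway folder with made-up site codes.

```python
import tempfile
from pathlib import Path


def originating_sites(history_dir):
    """List the name stems of the .trs files found in a History folder."""
    return sorted(path.stem for path in Path(history_dir).glob("*.trs"))


# Demonstration against a throwaway folder with made-up site codes.
demo_dir = Path(tempfile.mkdtemp())
for name in ("S01.trs", "S02.trs", "notes.txt"):
    (demo_dir / name).write_text("")

sites = originating_sites(demo_dir)
print(sites)
```

On a real site server you would point the function at the History folder under Sms\Inboxes\Replmgr.box instead of the temporary folder.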
This is telling me that if I look at my site right in front of me, I have a secondary site as a child, and the site control file that I last received from it was serial number 3. If I then have to recover my child site, I need to make sure that the Replication Manager transaction ID in the child site's registry is set to something above 3, because the parent site is saying, the last site control file I received from the child site was ID 3. If it sends me a 1, 2, or 3 again, I'm going to ignore it, because I've already seen that number. So this is outdated information. I'm expecting something above a 3. So you can set it to any value above 3 that you want.
If you look at our Web site, which we'll talk about later on, we recommend that in some of those cases you add maybe 1,000 to whatever the serial number is. That way you know you're setting it well above any transactions or serial numbers that may have been used since the site failed. We'll look at that a little bit later on.
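The arithmetic is simple, but here it is as a small Python sketch. It takes every value you have on record for a given serial number (from the NextIds printout, the registry, or a history file) and adds the margin to the highest one. The function name is hypothetical, and the default of 1,000 is just the rule of thumb mentioned above.

```python
def resync_value(known_serials, margin=1000):
    """Pick a serial number safely above every value seen so far."""
    return max(known_serials) + margin


# The parent last received serial number 3 from the child site.
from_history_only = resync_value([3])

# Several recorded values for the same serial number; the highest one wins.
from_several_sources = resync_value([3, 17, 42])

print(from_history_only, from_several_sources)
```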
But you need to make sure that you coordinate all these values from these three different sources to make sure that your data does not become corrupted in your environment, or get lost. So it's very important for you to understand these serial numbers, know where you can find the values for them, know where you need to go to set those values, and set those values appropriately.
Now that we've talked about SMS backup and why recovery is important, how do you go about ensuring that your backup plan will work? Let's talk about a backup and recovery plan. What I've listed for you on the slide (slide 9) are three different phases to go through to ensure that you have the ability to perform a good, valid backup, and that you have a recovery plan that can be efficiently implemented when you do need to perform a recovery. You may have other phases that you want to go through in your own environment, and that's fine. These are just some that are commonly referred to and commonly accepted as standard phases.
The first is a planning phase. You want to cover all your potential issues. You want to know what needs to be backed up, and ensure you have that data backed up. Next is the backup phase. You want to verify that your backup plan is working as expected. So are you backing up the data that you expected to back up? Is it in a format that you can use to perform a recovery? Last is the test recovery phase. Verify that you can successfully perform a recovery if it is needed. We'll go through, in the next three slides, each of these different phases in a little more detail for you.
During the planning phase (slide 10) you need to go out and document a lot of different aspects of your SMS environment. First off, you want to document your SMS site hierarchy structure and site codes. So in other words, what was the hierarchy? Which sites were child sites to which parent sites? What does the hierarchy look like? What senders were used? What addresses were used for those senders? And what accounts were used for those addresses so that, again, in the event you need to do a recovery, you can recover as quickly as possible?
You want to document any custom accounts you created. So in SMS 2.0 and in SMS 2003, in a standard security environment, there are a lot of optional accounts you can create and utilize to improve your security or just implement things differently in your environment. You need to make sure that you know what those accounts are, so in the event you need to recover, you can recover those and you won't have a loss of functionality, or at least your functionality will be resumed as quickly as possible.
You want to document your domain structure and trust relationships. What kind of authentication are you using? How are your sites communicating? What accounts have been added to what groups through the trust relationship? What customizations have you made to SMS? Maybe you've modified your logon script, the Smsls.bat file. Maybe you've modified your Smsdef.mof file for hardware inventory collection. Maybe you've modified your backup site control file, and so on. Document and save any files that you modified or customized through the SMS environment.
You need to make sure that you document your serial numbers — so the NextIds table from SQL, your Replication Manager history file, your registry entries that we talked about, those things that we talked about on the two previous slides — you want to make sure that you have those documented so that you can, again, set them correctly, or at least know how you need to set them.
The administrator needs to create an SMS client connection account. There's a default SMS client connection account created. It's with the syntax SMSClient_sitecode. If you have to restore your site, if you have to reinstall your site, that account is going to be re-created with a new password.
Your clients that exist in the environment now are not going to be reinstalled. They're going to use the same account, but with the original password. So you're going to wind up orphaning those clients, because they're going to know SMS client with password 1, but the site was reinstalled, so it's going to have the account SMS client with password 2. And your clients won't be able to talk to the CAPs or distribution points, or whatever's necessary.
What you need to do is make sure that you've created an admin-controlled SMS client connection account. You create it in your site. Then when you need to reinstall your site using the same site code, you just re-create that same account inside your domain, as well as inside your SMS environment, and your clients will then be able to use the account you created to communicate with a CAP. At that point they can pull down the new SMS client connection account and password that is created automatically when you install your SMS environment. Again, if you don't do that, you're going to wind up orphaning your clients.
You can recover from orphaned clients just by connecting to a logon point and running Smsls.bat or running Smsman.exe, but again, it's something you have to do manually, touching each client. If you just create your admin-created SMS client connection account, you'll prevent that orphaning from occurring at all, and it will be an automated process for you.
The next phase is the backup phase (slide 11). This is where you want to use the Backup SMS Site Server task in the site server. In the site server's Site Maintenance settings or Database Maintenance Task, you want to enable the Backup SMS Site Server task. This is an automated procedure to safely back up your site server. It's going to copy your important site server files to an export location, so you can specify where that is, probably the local hard drive, and then you want to move the data that's been backed up to some other offline media for archival.
This process differs a little bit between SMS 2.0 and SMS 2003. In SMS 2.0, the Backup SMS Site Server task would copy the entire site server's directory structure, the SMS tree, including all data. That takes a lot of disk space, because it's backing up all the binaries that are involved with installing a site server. That backup may be a couple hundred megabytes (MB) or more. In training classes, we tell people they need to look to make sure they have around 250 MB of disk space on whatever drive they're exporting to, because the backup task is going to take all the binaries.
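A quick way to check that figure before enabling the task is sketched below in Python. The 250 MB threshold is the classroom rule of thumb quoted above for SMS 2.0, not a documented requirement, and the function name is hypothetical. The path "." stands in for whatever export location you plan to use.

```python
import shutil

REQUIRED_MB = 250  # rule of thumb for SMS 2.0 quoted in the talk


def enough_space(free_bytes, required_mb=REQUIRED_MB):
    """Return True if the free space covers the expected backup size."""
    return free_bytes >= required_mb * 1024 * 1024


# "." is a placeholder; use the drive or folder the task will export to.
free_bytes = shutil.disk_usage(".").free
print(enough_space(free_bytes))
```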
In SMS 2003, we no longer copy the binaries, because you're probably going to reinstall SMS anyway, so you'll get those binaries again. Then we'll just copy the important server data files that are in the structures. We'll copy your log files; we'll copy any data you have in your inboxes, but we don't back up the binaries. So that makes the backup a lot faster, because you're not backing up 180 MB (or whatever it is) of binaries, and it saves space. The time required for backup is going to be reduced, as well as the disk space requirements.
We do back up the SMS-related registry information. We back up the SQL Server database. We perform a backup from SQL Server of the site database. And we create a log of all the server configuration information. We run a couple utilities to document your computer's configuration. In the event you need to recover the server computer or replace it, you know what drive SMS was installed on or what drive SQL Server was installed on, so you can re-create things as closely as possible.
You can also specify optional procedures for the backup task to run after the backup. You can create a batch file called Afterbackup.bat, where you can enter other procedures you want us to handle automatically for you. Maybe it's to kick off NT Backup to move the data we generate, and the tree that you exported to, off to some tape environment. So you can do that.
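As an illustration of the kind of follow-up step a post-backup procedure might run, here is a minimal Python sketch that bundles the backup export folder into a dated zip file in a separate archive location. This is not part of SMS: the function name `archive_backup`, the folder arguments, and the `sms-backup-<date>` file naming are all assumptions made for the example, and a zip copy stands in for the tape job mentioned above.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_backup(export_dir: str, archive_dir: str, stamp: Optional[str] = None) -> Path:
    """Zip the backup export folder into archive_dir and return the archive path.

    export_dir and archive_dir are hypothetical locations chosen by the
    administrator; nothing here is defined by SMS itself.
    """
    source = Path(export_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"backup export folder not found: {source}")

    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Default to today's date so each run produces a distinct archive name.
    stamp = stamp or date.today().isoformat()
    base_name = target_dir / f"sms-backup-{stamp}"

    # make_archive appends ".zip" to base_name and returns the final file name.
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=str(source)))
```

A batch file could call a script like this and then hand the resulting archive to whatever offline media process is in use.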
SMS 2003 supports backup of your secondary sites. So you can back up your secondary sites through the admin console. That was not supported in SMS 2.0, but it is for SMS 2003. Again, after you've done your backup, you want to archive your information off the site server to some other media type, like Zip or Jaz drives, or whatever you need to do, like burn them on CDs, and then store them offsite in the event of a disaster.
Optionally you can back up your distribution points. If you had tape or other media available, you could use that to back up your distribution points. We state that's an optional process, because SMS can redeploy packages automatically inside the admin console. So after you reinstall your site, you can use SMS and tell it to "Add the distribution point to this package," and it will deploy it automatically.
However, that's an administrative process that you'd have to go through, plus the network traffic required to push the packages back out to those resources, back out to distribution points. If you had a tape of the distribution point as a backup, you could just restore the tape and save the network traffic and the admin console activity.
There's no real need to back up any other site systems. So we're backing up the SQL or backing up the SMS site server. All the other site systems, like logon points, CAPs, whatever, we can recover very easily within the admin console. There is not much you need to do there.
You can control a little bit of this backup process by customizing the SMS backup control file. There is a file called Smsbkup.ctl in Site Server\Inboxes that you can use to customize your backup. You can specify a longer sleep cycle. After you tell SMS to stop services, it's going to sleep for 30 seconds before it goes out and starts doing any other work. You can make that interval longer if you need to have other services stopped. You can also add services to be stopped.
You can add files and directories to be backed up. We have a core set of items that we're going to back up. You can specify your own custom locations that you want backed up as well. You can add different registry entries to be backed up. If you have us stop services for you automatically, you might want to specify to have us restart those services at the end of the backup process. You can customize that backup control file to do additional tasks that you deem necessary in your environment.
The last phase is the test recovery phase (slide 12). Here's where you want to verify that your backup plan worked. Did you back up all the data that was necessary to do a site recovery? Was all the data that you expected to be backed up actually backed up properly? What you really need to do to verify this is practice a site recovery. So take the backup that was created from your production environment, take it over to your test lab, and practice the recovery there, so that you're ensuring the data is valid, that you have everything that you needed, and that you have procedures worked out to perform a recovery as quickly as possible. So if the time ever comes where you need to perform a recovery, you can perform it as efficiently as possible and keep downtime to a minimum.
There are generally three steps in a recovery process. First is the rebuild process, where you're rebuilding your computers; you're rebuilding the operating system; you're reinstalling SQL, and you're reinstalling SMS. You're physically rebuilding the box, as well as the software required to support SMS — whether it's the operating system, whether it's SQL, or whether it's SMS itself.
Then you have the restore phase. The restore phase is where you restore the data to the rebuilt server. You're restoring your registry entries. You're restoring the file structure. You're restoring the SQL database. You're restoring the SMS data back to the reinstalled SMS environment, or at least a rebuilt server.
The last phase is the repair phase. The repair phase is where you go through the process of resynchronizing your site back in the hierarchy. Make sure you have all your transaction IDs set properly. Make sure that you've recovered any packages, collections, and advertisements that were lost after your last backup.
Restoring recovery data is the last step in the standalone site scenario. In a standalone site, you don't have a hierarchy to worry about, so you don't need to worry about the repair phase, because you're rebuilding your entire site from scratch, anyway. So you just go through the rebuild phase and the restore phase to get back what data you can. You don't have to worry about resynchronization, because you don't have a hierarchy to work with.
Repair is necessary for site hierarchies, again, to ensure that any new packages, collections, or advertisements that you create, as well as any updates to your site control file, aren't lost and aren't corrupted in your environments.
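The rule described above (standalone sites stop after restore, hierarchy members also need repair) can be written as a tiny Python sketch. The function name `recovery_phases` and the lowercase phase labels are made up for this illustration and do not come from any SMS tool.

```python
def recovery_phases(in_hierarchy: bool) -> list[str]:
    """Return the recovery phases that apply to a failed site.

    Every recovery rebuilds the server and restores the backed-up data.
    Only a site that belongs to a hierarchy also needs the repair phase,
    which resynchronizes it with its parent and child sites.
    """
    phases = ["rebuild", "restore"]
    if in_hierarchy:
        phases.append("repair")
    return phases


print(recovery_phases(in_hierarchy=False))  # standalone site
print(recovery_phases(in_hierarchy=True))   # site in a hierarchy
```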
All this sounds pretty complicated. Aren't there any tools to assist with the backup and recovery process? Well, the backup part is easy. We've talked about that. We have the automated task in the admin console to go through a very quick and efficient backup process. Recovery is the harder process. So here are some of the tools (slide 13) that we have available to help you with the recovery. I have slides on each of these coming up; in fact I have multiple slides on the Site Recovery Expert and Site Repair Wizard.
First off, we have the SMS Maintenance and Recovery Web site. This contains a ton of documentation that's necessary for your planning processes, as well as to help you ensure that you have proper backup procedures in place. It contains documents for things such as swapping your server hardware. If you need to move your SMS site server from one box that's kind of old to a brand new, really ripping box, it has documentation there for how to go about doing that. And it contains the current version of the SMS Site Recovery Expert. This is the public Web site that gives you information that you can use today for doing some backup and recovery.
Now the SMS Site Recovery Expert, which is contained at the Web site I'll show you on the next slide, is an interactive Web-based application. It interactively queries the administrator for some site information. After you've given it your site information, it displays a list of tasks or steps that you need to go through to properly rebuild, restore, and repair your site hierarchy. You tell it about your site information, about your failure, your configuration information, and then it tells you what needs to be done to fix it.
This is included with SMS 2003, and it runs locally. For the current version, you have to go up to our Web site and you run it from the Web. With SMS 2003, the Site Recovery Expert is included. You can install it locally and run it locally without ever having any requirements for Internet connectivity. We'll talk about that a little bit later on.
The last tool is the SMS Site Repair Wizard. This helps with repairing a failed site that is a member of a site hierarchy. So it can actually help you perform the restore, as well as perform the repair, which again is resynchronizing your serial numbers and recovering any orphaned data. This is included, again, with SMS 2003. So with SMS 2003, not only do you have the Site Recovery Expert to see what needs to be done, but you have the Site Repair Wizard that can help automate a lot of the recovery processes for you. And we'll look at that at the end of the presentation.
First off, let's look at the SMS Maintenance and Recovery Web site (slide 14). This is the place to go for any information regarding backup and recovery scenarios for SMS 2.0. Today you can go to the maintenance and recovery Web site, and I have the URL at the bottom of the slide (http://www.microsoft.com/smserver/techinfo/administration/20/recovery/default.asp). You can go there now to look at planning documentation, so what do you need to do to prepare yourself for a recovery. You can look at the supported scenarios and configurations, basically what I covered earlier in a couple of different slides, but we have documentation there for you.
We have a document on swapping server hardware. If you need to move SMS from one server computer to another, how can you do that? We have a fairly recent document there on moving SMS between domains. This is a very common request, now that people are implementing Active Directory® and going through the process of collapsing all the different domains that they were using, the resource domains in the Windows NT® environment, down to their Active Directory environment. So that's available up there. And there's a FAQ, a frequently asked questions document.
This site also contains the SMS Site Recovery Expert. Again, you can tell us about your site interactively. You're not providing any sensitive data at all. You're not giving us a site code. You're not giving us any IP addresses or computer names. All you're saying is "I have an SMS 2.0 site. My primary site failed. My CAP failed. I have a backup. Here's my site hierarchy. I've made changes to it. I was a parent site and I've added some different child sites. Here are the agents I was using in my site." You feed that type of information into the expert, and then it produces a list of tasks that you have to go through to rebuild, restore, and repair your site.
Nothing is sent up to Microsoft that would be sensitive data in regard to your site. It's purely information about the general configuration of your site, so that we know what tasks to tell you to go through. Some people are concerned about the fact that they have to go up to our Web site to run the utility. Again, nothing sensitive is being given to us, so you don't need to worry about that.
That's one of the reasons why we've included this in the SMS 2003 product itself, so you don't have to go to the Web. Or sometimes in your SMS environment you don't have Internet connectivity. You don't have to run this from the site server or anything. You go to any PC that can access our Web site, produce this task list, print it out, and then go back to your site server and perform your recovery.
The information on the maintenance and recovery Web site is only for SMS 2.0. SMS 2003 has different procedures and has the data included with the product. Again, the URL for the maintenance and recovery Web site is listed for you at the bottom of slide 14.
That's for SMS 2.0. Now let's concentrate on SMS 2003 (slide 15). How does SMS 2003 change the Site Recovery Expert? Well it's included with the product. You don't have to connect to the SMS Maintenance and Recovery Web site anymore. You don't have to go to the Internet at all. You can run it directly from your SMS installation. It's included with the product; however, you do have to install it manually. It's not installed automatically. I'll show you, in a couple of slides, the screen shot for the installation. It's a very, very simple installation process.
It includes all the questions, tasks, and procedures for supporting both SMS 2003 as well as SMS 2.0 sites. Because your SMS 2003 site can have SMS 2.0 child sites, we include the procedures for recovering both. Again, you have to install this manually, and I'll show you the screen shot next. It requires Windows® 2000 Service Pack 2 or later, as does everything in SMS 2003, as far as site systems go, and it requires Internet Information Services (IIS) 5.0 or later. This is an Internet-based application just running locally, so it requires IIS.
Again, it runs locally, not from the Internet, which is good news. To run the Recovery Expert, it requires Internet Explorer 5.5 or later. So you can run it on a computer remotely from what we call the Recovery Expert or recovery point. It doesn't have to be the site server, but you do need the local intranet to access it. So you do need Internet Explorer. Again, it includes the tasks for hardware swapping directly in the Recovery Expert. One of the questions we ask you is, are you doing this to swap your hardware? So the tasks are there. You don't have to go get the document and download it.
It creates a local Web site for you, for your intranet. The URL is whatever computer you installed this on, so http://server/SMSRecoveryExpert/. You just go there; it launches the expert, and now you can run through and supply your information locally. Then it will produce the task list for you.
There's also the possibility of printing out an entire task list. So instead of just running the expert and seeing what's necessary for your specific configuration, you can print out a list of all recovery tasks so that you have a list of everything that could ever be necessary or required of you to do in a recovery. You could use that for documentation and making sure that you've tested your procedures and that you know how to do all the different things that may be required of you.
Here's the screen shot of the installation (slide 16). This is the standard splash screen that you would see when you install or you put your SMS 2003 beta CD in your CD drive; it pops up an autorun screen or a splash screen. On the lower-left of the screen, just click Recovery Expert. It's an SMS Installer installation. It takes probably less than a minute to install, and then you have your recovery site up and running. It's a very, very simple install process.
The Recovery Expert is going to prompt for a number of different configuration options (slide 17). I'm not going through and listing every single option for you here. I'm just listing some of the highlights or some of the main things that you would be looking up. It prompts you for what version of SMS needs to be recovered, SMS 2003 or SMS 2.0. The steps are different. The other set of questions we'll ask are going to be different, depending upon the version you specify.
We ask you which SMS site system failed and requires a recovery. Is it the site server? Is it a client access point? Is it a management point? Is it a software-metering server, again, depending upon the version of the software you're recovering? Is it a primary or secondary site that's failed? We ask you about your hierarchy information. Are you a parent site or are you just a child site? If you are a parent site, what types of child sites do you have: secondary sites or primary child sites?
We ask you, do you have a current backup, and what changes did you make to your site after the last backup, before the site failed? We ask you what type of changes you might have made. We ask you if you're going to use the Site Repair Wizard. Are you going to use the Site Repair Wizard to automate some of these processes? The reason we ask you that is because when we produce your task list, we'll tell you directly which tasks can be accomplished automatically by the wizard, or which ones you can do manually.
We'll ask you a series of questions. There are four wizard pages that we'll go through and we'll ask you a set of questions that are appropriate for your version. Here's an example of the first page of the Site Recovery Expert (slide 18). You can see this is from SMS 2003. You can see the actual version of the sites being recovered, SMS 2003 or SMS 2.0. You can see I've selected SMS 2003. So if you look at the next section, where it asks what component of your site has failed, you see that Logon Point and Software Metering Server are unavailable, because those don't apply to an SMS 2003 site. If you were to select SMS 2.0, your options for Management Point, Server Locator Point, and Reporting Point would be unavailable, because, in fact, those aren't valid for SMS 2.0 sites.
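To make the version-dependent options concrete, here is a small Python sketch of the rule just described. It covers only the site system roles named in this talk, not the full list on the wizard page, and the names `ROLES_NAMED_IN_TALK`, `UNAVAILABLE`, and `selectable_roles` are invented for the example; it does not reflect how the Recovery Expert is actually implemented.

```python
# Site system roles mentioned in this session (not necessarily the full wizard list).
ROLES_NAMED_IN_TALK = [
    "Site Server",
    "Client Access Point",
    "Logon Point",
    "Software Metering Server",
    "Management Point",
    "Server Locator Point",
    "Reporting Point",
]

# Roles that are unavailable for each version, as described in the talk.
UNAVAILABLE = {
    "SMS 2003": {"Logon Point", "Software Metering Server"},
    "SMS 2.0": {"Management Point", "Server Locator Point", "Reporting Point"},
}


def selectable_roles(version: str) -> list[str]:
    """Return the roles that can be chosen as the failed component for a version."""
    if version not in UNAVAILABLE:
        raise ValueError(f"unknown SMS version: {version!r}")
    blocked = UNAVAILABLE[version]
    return [role for role in ROLES_NAMED_IN_TALK if role not in blocked]


print(selectable_roles("SMS 2003"))
print(selectable_roles("SMS 2.0"))
```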
So we'll ask you a set of questions like this in these different wizard pages. You select the appropriate option; you click the Submit button at the lower left, and it goes on to the next page. We'll go through the four different wizard pages, asking you all your information. When you're all done you select Submit on the last wizard page. And after you've supplied all the data from the four wizard pages, we'll produce the site recovery task list (slide 19). This is a list of tasks that you need to go through to recover your site.
It lists four different steps or categories of steps. You have preparation steps. This is where you gather the software required, like make sure you have your SMS CD, make sure you have your SQL Server CD, make sure you have your Windows 2000 CD. Have all the documentation necessary: so your admin guide, your installation guide for your different applications, and have whatever tools that you might need.
Then it lists the rebuild steps. Here are the steps you have to go through to install your OS, if it's required: install your new SQL Server and install SMS, depending upon what you told it failed. It lists all the rebuild steps. Then it goes to the restore steps. Here are the steps where you go through and restore your SMS directory structure: You restore your SQL Server database, you restore the registry, you reset the permissions on the data that you've restored.
Then it goes through the repair steps. The repair phase is actually the biggest phase in the process. It may have a lot of different steps in there, depending upon what failed. This is where you go through and synchronize your site with the rest of the hierarchy. You increment your serial numbers or your transaction IDs. You install any QFEs or hotfixes that had been installed after the last backup; after you reinstall, those need to be applied again.
You re-create the manual or optional accounts; you go through and re-create any custom data that you've modified earlier that we talked about. So it goes through and tells you here are all the steps you have to go through to get your site back, basically the way it was before it failed. There's an option at the top of the page to print the task list. You want to take this task list and print it out so you have hard copy. Then there's a check box to say that you have completed that task and you know where you are.
This will indicate the steps that you can use for the Site Repair Wizard. If you remember, one of the options on the screen I mentioned was, "are you going to use the Site Repair Wizard?" If so, it will list which tasks can be completed or will be completed by the Site Repair Wizard. Actually, an awful lot of those tasks can be completed automatically by the Site Repair Wizard in SMS 2003.
The next slide (slide 20) you see is a recovery task list. You don't see the entire thing, because it's too long for one page, but I have the top portion of the site recovery task list. At the top it tells you some basic information. It tells you that the first thing you do is print out the information, so click the Print button. Then it shows you some steps for preparation and gathering some information; here is part of your rebuild steps. For example, in the rebuild section, if you had told the expert that you were going to use the Site Repair Wizard, then those two steps, Delete Remote CAPs and Save the Srvacct Folder, would be selected under the Performed by Wizard column, because the wizard can perform those steps for you.
Then it will show you the restore phase, and then the repair phase. Again, the repair phase is generally going to be the biggest phase. It's going to go through and produce this list of tasks that you have to go through for your recovery. Generally these are manual steps. However, with SMS 2003 you can have the SMS environment, in other words the SMS Site Repair Wizard, automate a lot of those manual processes for you. After you get to the restore and repair phases, which are generally considered the hardest of the phases, you can have SMS do the work for you. And that's through the Site Repair Wizard.
The Site Repair Wizard is included with SMS 2003 (slide 21). We haven't yet released a version for SMS 2.0. We've talked about it for a long time, and for various reasons it just hasn't been released yet. We're still looking at a way to release it and make it available to you all, but for SMS 2.0, unfortunately, it's a manual process.
For SMS 2003, however, we have a wizard that we like, and it's working. So it's included with the beta. You can try it out today. It supports repairing SMS 2003 sites as well as restoring and repairing SMS 2.0 child sites. You can use it on both an SMS 2.0 site as well as an SMS 2003 site. It's installed automatically. You don't have to go through any additional installation; unlike the Site Recovery Expert, it isn't an optional component that you install separately. It's available from the admin console as well as in your SMS program group.
You just select the site in the admin console and go to the Action menu; point to All Tasks, and then click Repair, and it will kick off the repair wizard. I'll show you some screen shots of it coming up. Many of the following slides are screen shots of the wizard.
It can automatically restore your SMS backup. The SQL database, the file structure, as well as the registry that you backed up through the automated tasks, it can automatically restore those if you want it to. Or you can manually restore those and just tell the wizard that you did it manually. Then it will kick off with everything after that.
It performs an automatic repair of the site hierarchy. The vast majority of the repair steps you see from the expert can be done by the wizard. It will automatically resynchronize your serial numbers or reset your transaction IDs so you don't have data corruption. It can retrieve object data from child sites. So let's say you backed up a week ago and then today you created a new package that was replicated down to your child site. And then later on today your site fails. Your backup obviously doesn't have that package in it. But the Site Repair Wizard can go down to that child site, find out about that package, and re-create that package for you at the site that you're recovering. It's a very, very cool way of recovering packages, programs, collections, and advertisements.
It can also retrieve site configuration data from the parent site. As a child site, any time you modify your site settings, you initiate a transfer of your site control file up to your parent site. The Site Repair Wizard can go up to your parent site, if you want it to, and pull down the last site control file the parent has for you so it can reset your site configuration from your parent site.
Here's a screen shot of the opening page of the SMS 2003 Site Repair Wizard (slide 22). It basically just welcomes you and it asks for the site server to be repaired. This is important in that you can reference a remote site server. You don't have to run this on the site server you want to repair. So if you're sitting at your central site and you're the SMS administrator for your hierarchy, you can remotely perform the site repair of your sites by using this wizard and just punching in the site server computer name of the child site that you want to work with.
After you've started the wizard, the first thing it does is it prompts you for the server to be repaired (slide 23), which we just saw on that initial page, and then it's going to tell you what rights are required to run the wizard successfully. It's going to tell you that you need to have local admin rights; you have to have modify rights to the site's object; you have to have read rights to the parent site and child sites that you want to use for reference sites, and you need to have change permissions on the SMS shares and file structure. It's going to tell you what mandatory steps you need to have done on your own, which basically is to stop and disable your SMS services. And you could manually restore the site if you want to.
It's going to query and ask or prompt you, do you want to have a manual restoration? In other words, did you manually restore your SMS site, or do you want to have your Repair Wizard automatically restore the site? You can choose either option. It's going to ask you, when was your last backup? What was the date of your last backup? And when did your site fail? That is used for calculating the number of days your site was down, and possibly using that to know what kind of padding values to implement for packages, collections, and advertisements. We'll look at that later on.
It's going to ask you for your site hierarchy information. Are you a parent site? Are you a child site? Are you the central site? What sites do you want to use as reference sites for pulling off any orphaned data? It's going to ask you a lot of different information about your site hierarchy. It's going to check your parent site for connectivity and see if it can pull off the site control file. It's going to check your reference sites for packages, collections, and advertisements that have been orphaned. It's going to allow you to change your padding values for your object IDs.
By default it's going to show you the number of objects that it found at your child reference sites and put those values in. But if you want, you can increment those, saying, "Well, I might have had some other objects. I'm not positive. Let's bump it up by a few more values to make sure that we're clean," and you can specify what values you want to use on that.
Here's the screen shot of a couple things I just referenced (slide 24). The left shot of the wizard is Site Restore Steps. It shows you again what you have to do manually. The first thing it says is Disable SMS Services, and it will do that for you automatically. The last one is, "Have you done your restoration?" Again, it can do that automatically for you. In fact, if you look at the bottom of that same page, it says The site has been manually restored, or has no backup, or you can supply it a path to go find your SMS backup and it can do the restore from that.
Then the next page that you see (on the right) is the Site Backup and Failure Dates. It's asking you what is the date of the last backup that you had, and what is the date of your site failure? Again, it uses that for object-padding values. If it can't recover any data and doesn't know how much you might have created, it will use some values there. It will take the number of days your site was down and multiply that by either 5 or 10, depending on whether the objects are collections, packages, or advertisements, to calculate a padding value for you.
The next thing that would happen is it would ask you about your site hierarchy (slide 25). It will show you your current site hierarchy, according to what it has in the database. If it doesn't have anything, you can go in there and add child sites or remove child sites that have been removed since your last backup, as well as addresses for those sites. You can specify what your site hierarchy looks like.
Then you specify which sites you want to use as reference sites, and this is for primary sites. You can specify what primary sites you want to use as reference sites. Reference sites are used by the wizard to find any packages, collections, and advertisements that were created by the site that you recovered and that are replicated down to your child site (which happens automatically) and that aren't in the database. In other words, what packages, collections, and advertisements can the wizard automatically recover for you from your child sites? You list any child primary sites you have that you want to use as reference sites, and it will go out there and access those reference sites.
After you specify that information, then you can go to the repair phase (slide 26). The repair phase is where we go to remove data from CAPs. We're going to remove all of your inbox data on your CAPs just to make sure that those CAPs don't have obsolete data. We'll push the new data back out. It can retrieve your site control file from your parent so that you configure your site the same way that you had last reported up to your parent site. It rebuilds your child site addresses from the site control file. It can delete orphaned parent site data, so any data at the parent site that is no longer valid at the local site that's been orphaned.
It can update your transaction IDs and serial numbers. This is a key portion of the wizard and a biggie as far as things that it can do for you. It can automatically set those values so that you don't have corruption or loss of data in your hierarchy. It can regenerate lost software object definitions, meaning it can re-create packages, collections, and advertisements from your child sites that were not contained in the last backup that you created, before the site failed. It's a very, very good way of getting data back without having to re-create it as an admin.
This helps prevent orphaned data at your child sites. Otherwise the child would have data that's there — packages, collections, and advertisements — that's locked and that they couldn't manage at all. There are procedures for clearing that lock for orphaned data and getting rid of it, but the wizard will help recover it if it's necessary and if it can be done.
Here's another set of screen shots (slide 27). The left screen shot says, "I went out there to that second child site that you were using as a reference site — in this case TP2. I went out to it and I successfully recovered some data from one reference site, because that's all I have listed." On the next wizard page you see it says Objects Created After Backup. It went out to my child site. It found that I had created one collection, one package, and one advertisement that it recovered from my child site. It's assuming that it needed to increment each of these values by a value of 1, because that's all it found at the reference sites, that I had created one of these. So if I increment the current values in the NextIds table in SQL Server by a value of 1, everybody will be happy.
If you're not sure about that, you can click the Reset button and enter whatever values you want. Then you can say I want to add 10 to each of those, or whatever the case may be. You can place those values in there. You might not have any reference sites available, or all your child sites might be secondary sites, which can't be used as reference sites because there's no SQL database there. In that case, for collections, packages, and advertisements, it would take the number of days your site was down (remember that earlier wizard page with the last backup date and the date of site failure), multiply that number by either 5 or 10, and put the resulting value in for collections, packages, and advertisements.
For collections it multiplies the number of days you were down by 5, and packages and advertisements are multiplied by 10. Then it would enter those values for you, or you could enter whatever values you want, that you deem necessary.
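The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the talk, not the wizard's actual code: the function name, the dictionary keys, and the choice to count "days down" as the difference between the last backup date and the date of site failure are all assumptions made for the example.

```python
from datetime import date

# Per-day multipliers as described in the talk:
# collections x5, packages and advertisements x10.
PER_DAY_MULTIPLIER = {"collections": 5, "packages": 10, "advertisements": 10}


def suggested_increments(last_backup, site_failure, found_at_reference_sites=None):
    """Suggest how much to add to each next-ID value (hypothetical helper).

    If counts recovered from reference sites are supplied, use those counts.
    Otherwise fall back to days-down multiplied by the per-type factor.
    """
    if found_at_reference_sites is not None:
        # Reference sites were available: increment by what was found there.
        return {kind: found_at_reference_sites.get(kind, 0)
                for kind in PER_DAY_MULTIPLIER}

    # Assumption: "days down" is the gap between last backup and site failure.
    days_down = (site_failure - last_backup).days
    return {kind: days_down * factor
            for kind, factor in PER_DAY_MULTIPLIER.items()}


# No reference sites, three days between backup and failure.
print(suggested_increments(date(2002, 10, 1), date(2002, 10, 4)))
```

With a three-day gap, the sketch suggests 15 for collections and 30 each for packages and advertisements. As in the wizard, these are only starting values; you can still enter whatever values you deem necessary.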
After you allowed it to do the recovery, then you're at the final steps of the process, and in your completion section (slide 28) it will verify your package source as an option. You can have it go out there and verify your package source files, as well as force a refresh of your packages out to your distribution points. So you can do that.
Then it will automatically restart your SMS services. It stopped a bunch of services for you, and you can have it restart your services. Then it will list tasks that you need to go through and manually verify. You can manually verify that all your child sites are reporting properly. We don't force child sites to send up their current site control files. That will happen over the next 24 hours, because every 24 hours child sites send site control files up to the parent. You look for any missing data, orphaned data at your child sites, or you may say, "I thought I created this package and it's not there anymore," and you need to re-create it.
You can verify your package deployments. You're looking at the packages on your distribution points and verifying they are set correctly. Configure any site settings that were missing from your site control file from the parent site that you had changed, but that the site control file hadn't replicated up to your parent before the site failed. Or anything where you just decided, "Okay, because we have to reinstall anyway, let's change our inventory to weekly instead of daily." So is there anything you want to change on your own?
The last wizard page we show is the wizard completion page (slide 29). Here's where we see what the Site Repair Wizard did for you. You see at the top it tells you Repair progress. In this case, it's all done. I didn't scroll through, but if I were to scroll through, what you would see is that it's deleting outdated files from my CAP. So it shows you that. If I had scrolled down, it would show me that it recovered one package, and however many programs from the package it recovered. It would show me how many advertisements it recovered, how many collections it recovered. It would show me that directly.
Then at the bottom, under my final instructions, it shows me things I need to do on my own. Those, again, are things that we just mentioned: verify your sites are communicating properly, verify your addresses, and configure your site settings. If you're not sure, you always have the Show Steps option. If you click Show Steps, it will bring up a list of instructions for you on what you need to do to accomplish these tasks.
That's our presentation. Let's do a quick summary and then we'll let Otto kick off our Q&A section. As a summary, backup and recovery is easier in SMS 2003 than it was in SMS 2.0. In SMS 2.0 we didn't really provide a lot of tools for you to do a recovery. We had the Site Recovery Expert that you'd go up and visit on the Internet, and that helped out a lot. But there wasn't any automation of recovery processes.
We also didn't support backup and recovery of secondary sites in SMS 2.0. All those things now are supported in SMS 2003.
Recovery is required if you reuse a site code or a site server in a hierarchy. The big thing about the recovery, again, is making sure that your serial numbers, transaction IDs, and so on are resynchronized during the recovery. Otherwise you may experience data loss or data corruption, and that's not what you want. So you want to make sure you resynchronize your values.
You want to make sure you test your backup and recovery plan. It's essential that you do that. You want to make sure that you do know how to perform a recovery, in the event that it is required, and that you can efficiently do it so your site is down as little as possible. The SMS Site Recovery Expert provides details and procedures for site recovery. You give it the information about your environment. We then tell you what you need to do to properly rebuild, restore, and repair your computer and your site.
You can use the Site Repair Wizard to help automate the repair and restore processes. So it's a great tool to help take care of a lot of the steps that the Site Recovery Expert is going to tell you need to be done. In fact, again, if you tell it that you are going to use the Site Repair Wizard, it will tell you what steps can be performed by the Site Repair Wizard, and you'll see that there are a lot of them. It's a great, great utility.
These are available in SMS 2003 today. If you're in the beta program, you may very well want to start testing the Site Recovery Expert and testing the Site Repair Wizard and see how they function for you. Obviously we're in beta phase, so there are going to be issues with those. If you find those issues, feel free to send those in so we can fix those for you. At that, I'll kick it over to Otto and we'll have Q&A shortly.
Otto Cate: Excellent. Thank you very much for the presentation. Before we jump into the Q&A today, I just have a couple of program notes I'd like to share with our audience.
The Q&A portion of the support WebCast is intended to encourage further discussion of the topic at hand. One-on-one product support issues that require some lengthy technical assistance are outside the scope of what we're able to address during the WebCast. So if you need some more complex technical assistance, please submit an incident on the Web, or contact Product Support Services directly and speak to a support professional on the phone.
Moving on to the first question: With SMS 1.2, there was a problem when a customer ghosted machines. We had multiple GUIDs and had to basically clean them up when we migrated over to SMS 2.0. Is that a risk when moving over to SMS 2003?
Wally: Absolutely. Any time you ghost computers or do any kind of imaging and you image the SMS client installation, there's a little bit in the registry, but the biggest key is the Smscfg.ini file. Any time you have that in your ghost image, or whatever utility you use to do your imaging, you are going to generate duplicate GUIDs, and you are going to cause problems, not just in SMS 1.2, but also in SMS 2.0. So if you are still ghosting your SMS 2.0 clients, you will have the same problem. SMS 2.0 does not prevent that at all.
If you have duplicate issues right now in SMS 2.0, then when you migrate to SMS 2003, you're going to have those same issues, because migration from SMS 2.0 to SMS 2003 does not clean up any duplicate GUID issues. That's something you have to take care of before you migrate to SMS 2003, otherwise you will have them there as well.
Now with that said, SMS 2003 clients are able to recover from a duplicate GUID scenario, where SMS 2.0 or SMS 1.2 clients could not. What we do now, and I mentioned this in a couple other WebCasts, is the client keeps track of three different types of data on the client computer itself: hardware ID, SMBIOS serial number, and Windows activation code. If it sees that any of those values have changed, then it assumes that the GUID has come from a different computer, because one of those values changed on me: my SMBIOS serial number, my Windows activation code, or my computer SID. Whatever it is, it has changed. That means I received this GUID from somebody else.
So the client will automatically create a new GUID locally. And then when it reports its next discovery data, it reports, "Hey, here's my old GUID; here's my new GUID; here's when the date and time changed," so that you can track that stuff in the database. The clients in SMS 2003 are more intelligent and they will handle that. However, if you have the problem today in SMS 2.0 and you don't clean it up, you are going to have that same issue in SMS 2003, and the client won't know it has a duplicate GUID, because you are not imaging on top of SMS 2003; it's pre-SMS 2003.
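The client-side check described above can be sketched roughly as follows. This is a simplified illustration in Python, not the SMS 2003 client's implementation: the class, the function, the field names in the fingerprint dictionary, and the shape of the report are all hypothetical, chosen only to mirror the behavior described (compare the tracked values, create a new GUID if any changed, then report old GUID, new GUID, and the time of the change).

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ClientIdentity:
    guid: str
    fingerprint: dict  # tracked values recorded when the GUID was created


def check_guid(identity, current_fingerprint, now=None):
    """Return (identity, report); report is None when nothing changed."""
    if current_fingerprint == identity.fingerprint:
        # Same computer as before: keep the existing GUID.
        return identity, None

    # At least one tracked value differs, so assume the GUID arrived
    # from another computer (for example, through an image).
    now = now or datetime.now(timezone.utc)
    new_identity = ClientIdentity(guid=str(uuid.uuid4()),
                                  fingerprint=dict(current_fingerprint))
    report = {
        "old_guid": identity.guid,
        "new_guid": new_identity.guid,
        "changed_at": now.isoformat(),
    }
    return new_identity, report


# Hypothetical tracked values; the key names are assumptions.
imaged = ClientIdentity("OLD-GUID", {"hardware_id": "HW-1",
                                     "smbios_serial": "SN-1",
                                     "activation_code": "ACT-1"})
this_machine = {"hardware_id": "HW-2",
                "smbios_serial": "SN-2",
                "activation_code": "ACT-2"}
identity, report = check_guid(imaged, this_machine)
print(report)
```

The sketch also shows why a pre-SMS 2003 image is not caught: a client that never recorded the tracked values has nothing to compare against.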
So the long answer is, yes, you're going to have a problem. So clean up your problem today in SMS 2.0. Make sure you don't have a problem with duplicate GUIDs when you migrate to SMS 2003.
Otto: Okay. We have a follow-up to that: Is there some kind of a script available to easily correct the issue?
Wally: There was a WebCast that we did. Actually it was one of the guys from Product Support Services (PSS), and you can find it by searching the archive. We did a WebCast on managing duplicate GUIDs. I'm pretty sure that's what the title was. So if you go to http://support.microsoft.com/webcasts/, then select Past Support Webcasts, go down to Systems Management and search there, you'll find a WebCast on managing duplicate GUIDs. It gives you queries and utilities that you can run to analyze and find your duplicate GUIDs, and then clean those up locally by sending a utility out to the clients and forcing them to create a new GUID. There is a WebCast on that, so I would look at that.
There's also, I believe, a white paper up on the SMS Web site, http://microsoft.com/smserver/. If you go up there, under Technical Resources and Administration, there is a white paper, "Managing Duplicate SMS Unique Identifiers." So look at those two resources and that will help you out.
Otto: It appears that the WebCast that you're referencing is called "Handling Duplicate Systems in SMS 2.0." Is that correct?
Wally: That sounds like it would be the one.
Otto: That was Thursday, August 17, 2000. That's on the Past Support WebCasts page, underneath the Systems Management section, like Wally mentioned. That should give you some good details.
Wally: It was done by product support, and the guy who did that is very, very knowledgeable in that subject. Definitely, I would go with his recommendations in there.
Otto: Excellent. Moving on to the next question here: Does the backup for SMS 2003 create the temp file during the backup, as in SMS 2.0, or has this been changed so it's basically like a straight dump?
Wally: Good question. I can't give you a definite yes or no. My guess is it's doing the same thing it did in SMS 2.0, as far as that maintenance task. I'm sure the maintenance task is pretty much the same as it was before, other than the fact that again we're not backing up all the binaries. So our backup is quicker and more efficient.
My guess is it's still doing the same thing it did in SMS 2.0. However, we can mark it as a follow-up and I will verify for you whether it's a pure, straight dump or whether it still uses the temp. My guess is it's still using the temp file, but I've been wrong before.
{Follow-up answer: Yes, we do create a temp file during backup in both SMS 2.0 and SMS 2003. A request has been made to remove the temp file creation process, but timing will dictate whether or not we will implement the change.}
Otto: Next question: Does the Site Recovery Expert support moving an SMS site to a server running a newer version of the operating system? For instance, a current SMS server running on Windows NT 4.0 needs to be moved to new hardware that's running Windows 2000.
Wally: The version on the SMS Maintenance and Recovery Web site doesn't, because it hasn't been updated, but the SMS 2003 version does. However, the documentation up there, like swapping server hardware, would cover that scenario. So if you use the SMS 2.0 documentation on the SMS Maintenance and Recovery Web site, that would cover swapping server hardware, including OS upgrades. The SMS 2003 Site Recovery Expert does have that all built into it. So it does cover it, yes.
Otto: Excellent. Next question here: Under the repair steps, back in slide 19, the install QFE section, step 4, wouldn't this always need to be performed, because binaries are not being backed up in SMS 2003?
Wally: Yes. Yes, you do have to do that. So that's why we have it as a repair step, so you do need to go through the process and install your QFEs. So any hotfixes that you may have applied to your site after the backup need to be reapplied. Now again, for SMS 2003, there aren't any QFEs, so it's a moot point. But for SMS 2.0, there are QFEs, there are service packs. So after you've reinstalled your site, you have your SMS site back up and running, you have to reinstall any QFEs.
Now again, with SMS 2.0, we did back up the binaries. It's only with the SMS 2003 version that we're not backing up binaries. They would be there, but you would want to verify that the QFEs have been reapplied appropriately.
Otto: Okay. During the presentation you mentioned the name and location of a file that modifies the SMS backup. Could you repeat that?
Wally: Sure. The name of the file is Smsbkup.ctl, so SMS backup control is what it stands for, and its location is on your site server in SMS\Inboxes\Smsbkup.box. It's an ASCII text file, so just open it up with Notepad and read the syntax there. There are sections where we say, "do not modify" and there are sections where we say, "editing allowed." We give you the syntax to go in there and add services that you want us to stop, add files that you want us to back up, add other commands, or add services to restart.
That is also where you would place the file called Afterbackup.bat. If you wanted to create a file called Afterbackup.bat, I believe you'd put it in the same location. That would be the batch file that our automated task will kick off after it's done its backup. So if you had other things that you wanted us to back up, and again my example was automatically kicking off an NT Backup to take this exported directory of all the SMS backup data and dump it out to a tape, you could have a batch file to kick that off. You would put that in the same location.
Otto: Okay. Does the package source version for parent site packages get recovered and/or verified by the Site Repair Wizard?
Wally: The Site Repair Wizard does verify the package source, yes. Site Repair Wizard should go out there and verify the source version, so that you have the source version set correctly. Then it has the option of allowing you to refresh your distribution points. So yes, that does perform a package verification.
Otto: During the presentation you mentioned performing snapshot backups, covering SQL, site server registry, and file structure. If I perform a full backup of servers each night, should I need to worry about doing snapshots?
Wally: What happens if you do a full backup of your servers is you're going to get all the same data, because you're getting the registry. If your SQL is on top of the site server you're getting the SQL, and you're getting the SMS stuff. However, it's a lot more data that's being backed up, and it's in your big backup structure, so you have a lot more data there than you really need to back up, as far as SMS recovery is concerned.
Now I understand you may want it all backed up for other purposes, and that's fine, but as far as SMS recovery, you're going to have a lot more data there that you have to sift through to find what you need.
Our backup task does generate the snapshot and it backs up just those things that are relevant to the SMS environment and places it in one location, so it will be quicker to find the data when you do the restore. However, if you're doing your full backup of each server, that should give you all that data that we talked about with the snapshot, because you'll get your hive information, which is the registry, and you'll get your file structure, and you'll get the SQL data backed up as well, provided that you've done a SQL backup, so that you have an appropriate data backup. So that should get everything there; it's just more data.
The other thing that our task does for you that you wouldn't be getting with the automatic backup is that it also runs some utilities that would then dump out your computer configuration information: computer name, domain name, what drives you have, and where things are installed on those drives, so that if you have to redo your server, then you can place the appropriate software on these same drives, because SMS doesn't let you move from drive D to drive E in a recovery scenario. You have to install on the same drive. So it would dump out all that information for you as well. You may have all that information backed up, so it may not be important to you. That's just another thing to look at.
But, yes, if you're performing a full backup, then you have everything. The big thing is to make sure you do have everything there, because too many people think, "SMS uses SQL; I can just do a SQL backup and then I'm fine," and that's not the case. If you're performing a full backup of your entire server every night, then you should be good to go.
Otto: If the Site Repair Wizard updates transaction numbers and serial numbers, do I still need to document my numbers manually? I guess they are trying to figure out how often they need to document their serial numbers.
Wally: The Site Repair Wizard does allow you to change those values. However, the Site Repair Wizard is only available for SMS 2003, right now. After you're in an SMS 2003 environment, then it won't be as important for you to document that information, because of the fact that the wizard can update those values for you. However, that's not available yet, except in beta form for SMS 2003, and I believe the beta version has some issues with SMS 2.0 sites right now. So there are things that have to be fixed after beta for supporting the SMS 2.0 sites, because we don't support that right now in the beta process.
However, let's take the scenario where the backup that you've created has been corrupted for whatever reason. So now you can't use that backup. It may be advantageous for you to have already documented those serial numbers so that when you run the wizard, and let's say you can't find any information at your reference sites, you can have some idea of what values you should reset those to.
I would still do it. It's not going to take very long at all. It takes a minute to launch Query Analyzer, change your database to SMS, run the query, and print it out. So it's something that's a very, very quick process.
I would do it, just as safekeeping anyway, in the event that my recovery wizard fails for some reason, or let's say there's a problem with the repair wizard, or it can't pull any data from the reference site, so it doesn't know how much to increment those values by. If you have it documented, again, as a backup of your site recovery wizard process, then you have one more chance of setting them the way they really should be set, instead of just guessing.
I would still recommend doing it. Odds are you won't ever need it, but again, it's one of those things that would be very, very simple to do and would not take a lot of time, and in the remote chance that you need it, it may help you out, and you'll feel better about it.
Otto: Excellent. Thank you. Next question: With SMS 2003, can you back up a server that's not a member of your domain through Web access?
Wally: Not through Web access, because the backup task itself, you enable it for the site server. You would go to whatever the site is and enable that backup task for that site server. So I go to my site server for that site, I enable that task, and it's going to back up that site server itself, and do the same thing for a secondary site.
Again, in SMS 2.0, secondary sites are not supported, but in SMS 2003, secondary sites are supported. So you are configuring the task for the site and it's going to back up that site server itself, as well as the SQL Server. You can't give it a remote server to back up, nor can you go over to the Web to do so.
The Site Repair Wizard that you saw the screen shots of does allow you to specify a remote server to restore and recover, but that's not the backup that you're asking about. You're performing the backup on the site server itself. It's what you're enabling the task for, and that's what it's backing up. If I didn't understand your question, send a follow-up question.
Otto: This one might be a little outside the scope, but I'm going to throw it out just in case, because we addressed some GUID issues early on: Do you know of any possible GUID issues with RIS?
Wally: Any deployment technology that's going to deploy images, including RIS, if you have your SMS client installed in that image, and you have the Smscfg.ini file in there, it has the SMS GUID for that client. RIS has the capability to deploy not just the fresh operating system, but the operating system with applications installed, and so on. So if you had that GUID there in your RIS image, and you deployed it with RIS, then yes, you'd have the same issue.
It doesn't matter what the deployment technology is — any of those are going to have issues if the SMS 2.0 client is installed at the time that you do the image. You're going to have the Smscfg.ini file there, and it's going to cause problems.
Now if you look at the Microsoft Systems Management Server 2.0 Resource Guide, I believe on pages 324 and 325 we list some steps and procedures you can go through to install your SMS client, but then prevent GUID duplication by removing the Smscfg.ini file, as well as the location in the registry where the SMS unique ID is stored, or the GUID for that client. If you follow those procedures, then you should be able to prevent any duplicate GUIDs, including whatever your ghosting or deployment technology is.
Now, with that said, with any technology where you're ghosting or imaging an SMS client installation, unless you go through those procedures to remove the GUID, you are going to have problems, whether it's RIS, whether it's ghost, whether it's PowerQuest, or whatever your application is. So the best thing is to do your imaging before you install the SMS client, and then install the client after. That way you won't have any duplicate GUID issues. Or go through the SMS Resource Guide, I think to pages 324 and 325 (you can look it up), and that will tell you how you can clean off the GUIDs so that you don't have problems with duplicates.
Otto: Okay. This one is pretty much a general SMS 2003 question here. The user is wondering if they can do software monitoring, like Web site access per client with SMS 2003, or if it's just a straightforward management.
Wally: Our software monitoring, in both SMS 2.0 and SMS 2003, will allow you to monitor applications that are launched, but not specific Web sites or what a user would do specifically with an application they launched. So after they launched Internet Explorer, all we do is track if the process started and when the process ends.
We wouldn't track what they're doing inside their Word doc, what Web site they visited, or what values they enter in any Excel spreadsheet or anything like that. So we don't track the data that's used. With our software monitoring process we track the application start, application stop, and that's all, in either of the SMS versions.
Otto: The final question that we currently have in the queue is the ever-popular: Where can I obtain the beta, or what can I do to get into that program?
Wally: The beta was announced and released on September 30, 2002. So it's been out for a few weeks now. If you're not an existing beta customer, in other words, if you haven't been nominated and accepted as a beta customer, you can nominate yourself if you want to. The easy way is just go up to the SMS public Web site. So go to http://www.microsoft.com/smserver/ and you'll see information about the SMS 2003 Beta. You can go there, and that site has a direct link to the BetaPlace Web site where you can request to be a beta tester.
Basically you can go to our Web site and you can follow the links to it, or you can go to http://www.betaplace.com/, and then you can nominate yourself as a beta tester. We give the account name and password that you want to use to nominate. The account name is SMSInvite, and the password is SMS2003, and SMS is in initial caps.
If you go to the Web site, then use that account name and password, you can request a copy of the software. We'll send you out a beta copy of the software, and then you can do your beta testing. So that's available to anybody, right now. I don't know what kind of restrictions there are, as to if there are any kind of restrictions on who can be on the beta and who can't, but you can go up there and nominate yourself anyway. Or you can have a Microsoft person nominate you. But you can nominate yourself, if you wish. Then if you're accepted, you will be sent a CD, and you can go from there.
As to it being in MSDN® or TechNet or whatever, my guess is yes, because traditionally we have placed our beta software in TechNet and MSDN subscriptions and so on, but I have not talked to anybody to see if they're doing that for this release as well. Again, we traditionally do, so I would expect they would get there eventually. Again, it was just released on September 30, 2002. So it has only been out for a couple of weeks. So it's not going to be everywhere yet.
Otto: We do have another question I want to throw out here: The advertisement feature of setting up a recurring schedule, say from 6:00 P.M. to 6:00 A.M. daily, is that still in the feature set, or has that been removed?
Wally: Somebody's user group told all the users that this was removed from the product, so I'm getting 100 e-mails saying this has to be added back into the product. I won't mention any names on who's sending e-mail. Last I heard, this has been removed from the product, yes. There has been, like I said, a lot of e-mail requesting that feature be added back in, and those requests will be sent to the appropriate powers that be, stating that this is something that people really, really want.
Just be aware that if we decide to put this in because you guys really, really want it, that may delay the release of the product. So what we're trying to do is not accept additional features that are going to cause any kind of schedule slip. We didn't drop this just because we thought nobody wanted it. We didn't drop it just to be mean. We dropped it because of the fact that this may involve additional coding that we haven't done, and because it was not part of the original feature set for SMS 2003. It's something that we accepted as a design change request or DCR not terribly long ago, but with schedules, we're looking at every single thing possible to see if this is something that is really important enough to possibly cause a schedule slip.
As of today, my understanding is it is out of the feature set. It's not in the product. Again, it was never originally in the product. It was something we added after our initial feature set was released. I will take all the e-mail that I have received and I will forward it to the appropriate powers that be, and see if that changes any minds. But I cannot guarantee anything. I'm not a person who makes any design decisions. I get to decide what WebCast we do on a monthly basis. I can do that for you, but I can't help you out with what's in the product. But I can pass the information along.
Otto: Okay. Excellent. With that, it appears that we've covered all the questions that are within the scope of the backup and recovery, and a couple others as well. So I'm going to wrap up the session today. I want to thank everyone for joining us and hope that the information was useful to you. I want to thank Wally for coming in and giving us a great presentation, as always.
If you happen to have any suggestions for future topics, comments about today's show, or comments about the WebCast program as a whole, {please e-mail them to supweb@microsoft.com}. Our goal is to ensure that we're providing you with the right content in the best way possible. Your feedback, as always, is very important to us.
I hope that everyone has the opportunity to tune in again in the near future. Thanks, everyone. Have a great day.