Microsoft Support WebCast
Systems Management Server 2.0 Backup and Recovery
July 6, 2000
Note: This document is based on the original spoken WebCast transcript. It has been edited for clarity.
Barbara Lamkin: Hello, and welcome to the Microsoft® Support WebCast. We would like to thank all of you for joining us today. Our topic today will be "Systems Management Server 2.0 Backup and Recovery," and our presenter will be Rob Wickham. This is a change from what you see on your slide; Wally Mead was the original presenter, but business needs called him away. I'm Barbara Lamkin, and I'll be your host for today's session. We will start the session with Rob's presentation, and follow that up with a question and answer period when the presentation is finished. We only answer questions submitted for the Support WebCast during the live event. I would now like to take a moment to introduce Rob. After being in the military, Rob Wickham joined Microsoft PSS in 1993, supporting LAN Manager and Windows NT® 3.1. He was the technical lead on Hermes, which was our Systems Management Server version 1.0; and after that he joined the CPR Escalation team to support the SMS 1.0 and 1.1 releases. In January 1995, Rob took a position in the SMS Product group as a Program Manager, and took over the primary role of maintaining PSS relationships and product maintenance, including QFEs and service packs. Rob also administered the beta program for SMS 1.2. Currently, Rob covers several SMS 2.0 feature areas, including the resource kit, upgrade and interoperability with SMS 1.2, site recovery, and aspects of post-deployment use of the SMS 2.0 release. Thank you so much for joining us, Rob. Please begin.
Rob Wickham: Thank you, Barbara. Let's look at the session agenda for today's presentation. We will be talking about backup and recovery as it relates to SMS 2.0, SP1. 
We will also be looking at our new tools that assist in the backup and recovery cycle. We will get into some details of that actual recovery process, and then we're going to provide a demonstration of sorts of our new tools. We've been working on those for some time. I know you've been patiently waiting for them. And we're really nearing the end of that cycle for those tools. Backup was supported in SP1. It was also supported at the original product release. However, in SP1, we opted to enhance the backup task so that it was a more flexible procedure that individual customers could tailor to the specifics of their particular site. At this time, secondary site backups would require a non-integrated type of backup, using a batch file or the NT Backup utility, and so forth. It's worth mentioning that in most cases, secondary sites can be recovered remotely, using the primary site. The primary site, being the primary data store, was a key factor in making sure that that primary site had an integrated backup. The Restore was also supported in SP1. However, at that time, it was a very complex procedure, and we needed PSS's help in cases where you needed to recover a site. Also, at that time, we didn't have publicly documented procedures available, and we've been working on that very diligently. We are also developing Web site content that you will be able to visit periodically and make sure that you have the most current procedures available for recovery. For the SP1 timeframe, we do highly recommend that PSS help with each recovery effort that you undertake. For that, we have an SP1-based backup and recovery document that PSS is able to provide to you. Some of the new tools that we have are enhancements to the existing documentation, and they have been converted into a Web-friendly format that will soon be released to the SMS MGMT Web site—the path is shown on slide 4 — when that becomes live. 
At this point, it is not live, however, so that link will not take you anyplace useful. One of the key things that you will find at that location is a tool we call the Site Recovery Expert. And this Expert is very unique in that you are able to walk through any number of recovery configurations that you may encounter, and the Expert knows the dependencies between a particular configuration and the necessary steps to recover that configuration. That is important because there are more than 100 individual recovery steps that are possible in the worst-case recovery cycle, but in the more common recovery scenarios, you only need a fraction of those individual steps. And the Recovery Expert is able to filter down that list to just the applicable recovery procedures that you would need. In addition to the Recovery Expert, we have a number of tools to assist in automating portions of the recovery cycle. The most interesting of those tools is the Site Recovery Wizard, which functions as a traditional wizard. It asks you some questions, and it will also do some interrogation of systems in your site, and assist you in repairing the site, and the relationships to parent and any children underneath the site. This material ought to be available any time now. SP2 was released, and we have been engaging in some specific testing to make sure that when we publish these recovery procedures and tools, that they are fully compatible with both SP1 as well as SP2 and Windows® 2000. Let's review the supported recovery scenarios, specifically, if your Site Server or your SQL Server hardware fails, or if you have a hard disk fail within Site Server or SQL Server. File system corruption is also a scenario that requires recovery. Other recovery scenarios we support are reinstalling Windows on Site Server or SQL Server, and reinstalling SQL Server, if corruption of the SQL Server database occurs. 
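As a rough illustration of how a tool like the Recovery Expert can narrow a long list of procedures down to the ones a given configuration needs, the following Python sketch filters a small, made-up catalog of steps by configuration tags. The step titles, the tag names, and the `applicable_steps` function are assumptions for illustration only; they are not taken from the actual tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecoveryStep:
    phase: str    # "rebuild", "restore", "repair", or "verify"
    title: str
    # Configuration tags that must all be present for the step to apply.
    applies_when: frozenset = field(default_factory=frozenset)


def applicable_steps(steps, site_config):
    """Return the steps whose required tags are all present in site_config."""
    return [step for step in steps if step.applies_when <= site_config]


# Hypothetical catalog; the real procedure list has more than 100 entries.
CATALOG = [
    RecoveryStep("rebuild", "Reinstall the operating system", frozenset({"os_reinstall"})),
    RecoveryStep("restore", "Restore the site database from backup", frozenset({"sql_failed"})),
    RecoveryStep("repair", "Resynchronize IDs with the parent site", frozenset({"has_parent"})),
    RecoveryStep("verify", "Check site health before reattaching", frozenset()),
]

# Example: the SQL Server failed and the site reports to a parent.
config = frozenset({"sql_failed", "has_parent"})
for step in applicable_steps(CATALOG, config):
    print(f"[{step.phase}] {step.title}")
```

A step with no required tags always applies, which is how the sketch keeps the verification step in every task list.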
It's important to note that the terms backup and restore that we're all familiar with are slightly different with respect to distributed systems like SMS. That's why we refer to this process as backup and recovery, and the recovery takes into account the distributed nature of the system and synchronization issues that you need to observe. Now let's move on to slide 6, and talk about backup and recovery in a little more detail. There are three main phases that we need to observe, the first of which is planning. It's essential that you be prepared for the failure event. Whether or not that failure event will happen in the next five to ten years is hard to say, but at some point or other, you can count on some level of your hardware failing, and you need to walk through a planning cycle. The Recovery Expert is also very valuable for that planning cycle, because you know what your configuration is as you deploy into production, and you will be able to tell the Recovery Expert what that configuration is, and it can then give you back the cookbook of recovery for that configuration. You can then print hard copy of that and lock it away in your safe for that day when you need it. The next step is to make sure that your backup actually includes everything necessary to perform a future recovery cycle. To that end, you need to also test that recovery cycle using the backup procedure that you've put in place for your site. What we have documented rather fully in the SMS 2.0 SP1 backup and recovery document has been pulled forward to incorporate any changes that relate to SP2, and PSS has those updates and is able to give those to you in advance of the release of this material to the external Web site. Now let's dig into the planning phase a little more deeply. There are a number of items that are very important for you to actually write down and put away for safekeeping. You need to document your hierarchical structure and site codes in use within that hierarchy. 
Any accounts and passwords that are used by the site should be documented for future reference. This includes any additional accounts, such as client connection accounts or client installation accounts, and so forth, that you have added to the system. It's also important that you document your domain structure and any trust relationships. Finally, any customizations that you have made to the system should be documented. For example, customizations made to system login scripts should also be documented, so that when the site is restored, those customizations can be put back in place. It is essential to have a user-defined SMS client connection account. If you don't have more than just the default single connection account, it's possible that clients can become orphaned. There are other procedures to go through to recover these orphaned clients, but it is much more of a best practice to simply create one or more additional SMS client connection accounts after site installation. The other thing that's important to document are some of the IDs that the system is using. Specifically, in the site database, there is a NextIDs table. There is also a Replication Manager transaction history file. These two items are key elements of synchronizing one site to its children and its parent sites in the hierarchy, and information about why these IDs are so important will be available in the recovery documentation. Let's look at slide 8, and examine the backup phase now. The site backup task, the integrated version, will back up the Site Server's files from the file system. It will back up the Site Server's registry keys; the SQL Server data, even if the SQL Server is on a remote system; and it will back up certain Windows NT and SQL Server configuration information. It is essential that these aspects of a backup be performed as a snapshot in time of the Site Server. 
If you were to take a backup of the SQL Server one day, and then the next day take a backup of the rest of the site, and later perform a restore operation with those backups taken at different times, it's very likely that certain IDs that the site wants to manipulate would be out of synchronization with the data that the SQL Server has stored for them. After you have a backup, it is a good idea to store that backup in a safe location. You can also manually perform the same backup steps that the automated task is performing using third-party products or the NT Backup utility. It may also be advisable for you to perform backups of your distribution points. Although SMS can redeploy packages, they need to be forced at an individual level, and that will very likely consume network bandwidth while those packages are being repopulated.
Let's look at the Test Recovery phase of the planning cycle, on slide 9. It's essential that you verify that your backup plan actually works. Ensure that the plan can be used to recover the site. Ensure that essential data is backed up. Provide procedures for speedy recovery. There are three basic scenarios: You rebuild either the SQL or the SMS Servers that have died; you restore the data from your backup, or install your new server hardware and restore the data from the backup; and then you perform the repair work. The repair work is the synchronization of these various IDs that are used throughout the system. That involves adjusting transaction IDs and serial numbers in the SQL database, the registry, and certain files on disk.
Let's talk about the importance of recovery. We define an SMS site recovery this way: any time an SMS site is installed using a site code or site server name that was previously used in the site hierarchy, SMS site recovery has begun for that hierarchy. Repairing the data is the core task of a site recovery. 
It's required to prevent interruption of the SMS operations, and corruption of data in the SMS hierarchy. Next, we need to make sure that site recovery can be verified. To do this, we want to check for failed site recoveries prior to connecting to a parent. We don't want to attach a site to a parent if that site is not verified to be running in a healthy state. Because that parent site may be reporting in to other sites higher above it in the hierarchy, the data that may be invalid on the site that you've just performed a recovery on will be echoed all the way up the hierarchy. This verification can also help prevent the corruption of software distribution objects in the new site and at child sites of the new site.
Let's move on to slide 12, and talk a little bit more about the Site Recovery Expert. A key feature is its usability, which comes from its interactive interview about your site and its configuration. That interview uses browser persistence features to store that configuration so that when you return, that configuration will be the default configuration from your last visit. This data is stored on your local machine. This makes it very fast for you to come back and quickly get the most recent set of procedures for the most recent configuration that you've described. It will then generate a list of tasks required for the various phases of the recovery cycle: the rebuild, the restore, and the repair, as well as the verification steps. These tasks can be printed. They contain links to individual procedures, and you can use the browser's features for printing linked pages to get one large document that contains all of the high-level steps as well as the individual procedures for each step. When this has been released, it will be accessible from the Web link that you see on this slide: http://www.microsoft.com/smsmgmt/techdetails/recovery/.
Let's look at slide 13, and take a little tour of the upcoming Maintenance and Recovery home page on microsoft.com. 
What we plan on providing for you is a list of the releases and the date that those releases took place. This should be an easy reference point for you to know when new content is available or existing content has been modified. We'll also be presenting a list of all the supported scenarios that we have for performing recovery cycles, and as we conduct further testing and have more customer feedback about scenarios, we'll be able to enhance the support and we will update that in this document. Other documentation we have is an executive level "Backup and Recovery Overview" that talks about the concepts, principles, and strategies for recovering an SMS site. We also have a "Backup and Recovery FAQ" document. And we have a more detailed "Recovery Planning" document that includes the high-level views of the recovery overview, as well as some details about the specific items that you should take into account when you are putting together your backup plan. Some of those details include a detailed list of the specific items that you should make hard-copy reference of and store in a safe place, for example. We also have a document that describes how you can swap out old hardware with new hardware under an SMS Site Server. We also have a very informative article that talks about SMS security as it relates to Windows NT, the accounts that SMS uses to perform its management tasks within your enterprise. I highly recommend that you download "SMS Security Essentials." I believe that that has been included on the SP2 CD and should also be available on the SMS MGMT Web site, through the Tech Details link. Although the Maintenance and Recovery page has not been released yet, you can still locate the "SMS Security Essentials" document on the external Web. 
And we also will present some tools here on this slide: the Recovery Expert, which I've already described; and some specific recovery tools that execute on your local machine, for the purpose of resetting access control lists, as well as causing certain items to start a synchronization pass, and so forth.
When walking through the Recovery Expert, there is some required information that you will need to know about: information about the site; if it's a primary site or a secondary site; where the site database is located; if you have site systems within the site. These are the individual pages that you will walk through. There is another page on which it will ask you for more site information. It will ask you what kind of changes you've made since the backup, and it will also ask you about any interoperability issues that may exist.
Let's take a look at some Recovery Expert screen shots, on slide 15. This is the first page of the Site Recovery Expert. It is asking you for some key information about the operating system version that was running on the site when it failed; the type of site system you need to recover, because it's possible that just your database server crashed; what SMS version was running; whether it was a primary or secondary site; as well as whether there is need to reinstall the operating system. Subsequent pages of the Recovery Expert will ask you for more detailed information about the particular components that were installed, client installation methods that may have been turned on, discovery methods and client agents which may have been turned on, which data summarizers you had been using, and if software metering was in use. After you answer these questions, the Recovery Expert will give you back a detailed list of the manual recovery steps. This will include preparation steps, and steps to rebuild, restore, and repair the site.
Let's take a look on slide 18, at the Site Recovery Task List that the Expert gives you back. 
In the left-hand column of the task list, notice there is a little check box that you can use to indicate if you've completed that particular procedure or not. This is a persistent column that is stored through your local browser, using the Persistence features. You can also print a hard copy of this using the icon that's shown on this slide. Let's move on to the next slide, and talk a little bit about the Site Recovery Wizard. The Site Recovery Wizard will be available shortly after the existing Web content that I've just described to you is made available. Because recovery is a very complex procedure that touches many aspects of the product, it's very difficult for us to produce a completely packaged recovery solution in a single release. We started off publishing a Manual Procedures document. Our next step will be the Web content that makes the list of procedures a more manageable thing to produce. And then we will be doing follow-up releases with additional tools, and adding features to existing tools to automate more and more of the recovery process. The Site Recovery Wizard is essentially the embodiment of recovery automation, and as time goes by, we will be enhancing the Recovery Wizard so that eventually, it will also be able to help you automate the restore process, where you can tell it where your backups are, and it should be able to perform nearly all of the recovery procedures for you, with the exception of the verification. The Site Recovery Wizard is intended to be run after you go through the Recovery Expert and you determine the complete list of recovery steps that you need to perform. The Recovery Wizard will then be able to perform a subset of those procedures for you to speed the process. The Recovery Wizard can also be restarted. You can run it as many times as necessary, either from the very beginning or from the most recently successful operation that the Recovery Wizard performed for you. 
It will support SP1 or SP2, Windows NT 4.0, any service pack, as well as Windows 2000. The Site Recovery Wizard does, however, require an updated SMS site provider that is based on a QFE fix of the SP1 release. That provider is an integrated part of the SP2 release, however. Let's now look at some screen shots for the Recovery Wizard, on slide 20. The Welcome page of the Recovery Wizard will be able to determine if the Recovery Wizard has been run previously on the site and if the recovery cycle that was previously attempted had completed or not. This particular screen shot indicates that the Recovery Wizard has not been run before on this site. The Recovery Wizard essentially has two phases. It has the Wizard phase, which is an interactive question and answer period, and then it goes into a repair phase, in which you are able to monitor all the transactions that it is performing as it repairs your site. Your Recovery Wizard will also verify that the user who is running the Wizard has sufficient security rights to perform a site recovery. It will also tell you who the current user logged on to the system is. These user rights need to access any parent site above the recovering site and any child sites beneath the recovery site. It also needs to have sufficient access to all the backup files, the registry, files in the file system, and the SQL database. This individual should essentially be running with privileges similar to the site service account. The next page of the Recovery Wizard, on slide 22, describes these requirements of security in more detail. It's important to note that in this version of the Recovery Wizard, any security access that the Wizard will require when communicating with another computer on the network, such as a parent site or a child site, will be performed in the security context of the logged-on user. 
If any dial-up links are necessary to talk to a remote site, it may be necessary for you to bring those dial-up links online for the Recovery Wizard to communicate with those sites. Let's look a little bit deeper into the Wizard phase of the Recovery Wizard. First of all, it's going to inform the user of the mandatory steps that he needs to perform before the Wizard can proceed. The SMS site services need to be stopped and set to a disabled state for the site. Our most recent version of the Recovery Wizard will be able to stop and disable these services for you, to save time. It will query the user to determine the last date and time of backup, as well as the date and time of the failure. The reason for that is that it's possible that the system has touched certain data items after backup and before failure, and the Recovery Wizard will attempt to repair these items based on the amount of time that has elapsed between backup and failure. There is also some opportunity for you to tell the Recovery Wizard about your site hierarchy. This is important because it's possible that you have attached your site or other children have attached to the recovering site after the backup was taken. This means that after the backup has been restored, the information about the parent site and any child sites would be lost. By entering this information into the Recovery Wizard, it enables the Recovery Wizard to essentially rebuild that same information as it is repairing the site. The Wizard phase will perform connectivity checks to the parent site and any children that you define, for gathering orphan data. One of the key features of the Recovery Wizard is the ability to use a child primary site and rebuild packages, collections, and advertisements that were created after the backup was taken on the recovering site. These objects are then reconstructed from a child reference site and restored onto the recovering site. 
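One way to picture the reconstruction from a reference site is as a set difference: anything the child reference site knows about that the restored site does not was presumably created after the backup was taken. The following Python sketch shows only that idea. The function name and the object IDs are made up for illustration, and this is not how the Recovery Wizard is actually implemented.

```python
def objects_missing_from_restore(restored_ids, reference_ids):
    """Return IDs known to the reference child site but absent after the restore."""
    return sorted(set(reference_ids) - set(restored_ids))


# Made-up object IDs for illustration only.
restored_site = ["ABC00001", "ABC00002"]                # present in the restored backup
reference_site = ["ABC00001", "ABC00002", "ABC00003"]   # replicated to the child before the failure

for object_id in objects_missing_from_restore(restored_site, reference_site):
    print(f"Recreate {object_id} from the reference site")
```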
The Recovery Wizard is also able to repair the synchronization and serial numbers used within SMS objects, such as Packages, and if it is unable to determine a reference site, it will attempt to compute padding values based on the timeframe and the number of objects that you've created. This is why it's important that, periodically, you make note of the IDs in the NextIDs table of the SQL Server. This will help you know roughly how many objects have been created since the backup was taken. This next screen shot of the Recovery Wizard, on slide 24, describes the preparation steps that the Wizard will require you to perform before it can continue. This involves disabling the services, restoring from backup, restoring directories and files, and registry settings. If you have completed those steps, the Wizard will be able to continue. This next screen shot of the Recovery Wizard talks about the object padding values. These are very important values to be derived from a reference site if you have a primary child site beneath the recovering site. These values can be determined by looking at this reference site, which eliminates the guesswork of knowing how many objects have been created since backup. For that reason, we recommend that you maintain a primary child site, if possible, so that you will have a valid reference site to recover this data from during a recovery cycle. This site does not necessarily need to have any kind of substantial hardware. It wouldn't have any clients reporting to it. It would essentially be a slave site, just for some specific recovery configuration data. Let's move on to slide 26, and look at the repair phase that the wizard will enter after you have answered all of the questions that it asks. It's important that the client access point (CAP) and logon point data files be consistent with the restored site and the synchronized site. 
The Recovery Wizard is able to go to each of the client access points and logon points within the site and handle that task for you. If you have dozens or even hundreds of these types of site systems, the Recovery Wizard is a very important automation tool. In the event that the site control settings for the recovering site are out of date, the Recovery Wizard can obtain that data from the parent site, pull it down, and restore it into the recovering site. It's very likely that any time you make a change to the site control settings on the recovering site, that those settings will have been safely sent to the parent prior to the site failing. This means that the parent site will nearly always have the most current information about the configuration of the local site. Another thing that the Recovery Wizard will handle during the repair phase is changing the software metering settings to not enforce license limits. This is important to prevent denial of service to the clients, because after a site recovery, there is a grace period where the licenses need to be rebalanced. While those licenses are being rebalanced, you don't want to have the enforcement set on. Another thing that it will do is it will set the status summary data to be deleted. This is important because after you've restored from a backup, you may have status messages and summary data inside that backup that is now quite out of date, and you don't want that summary data to be reported up the hierarchy to cause status thresholds for out-of-date data, such as heartbeats, and so forth, to set off alerts. This next screen shot, on slide 27, is the last screen shot of the Wizard phase of the Site Recovery Wizard. After you click Finish, it will enter the Repair phase and present another informative dialog. Let's talk a little bit about the Repair phase within site hierarchies. The Wizard will rebuild child addresses as necessary, based on any changes that you have made during the Wizard phase. 
You were given the opportunity to essentially declare any child sites or addresses to those sites that were not a part of the backup. You don't need to detach and reattach the recovering site. The Recovery Wizard handles that for you by synchronizing the local site with the parent site. The Recovery Wizard will also take care of the transaction history files on the parent, so that the parent will be able to send any related data down to the child site that it needs. It will also go through and ensure that the data between the parent site and the recovering site is consistent. Transaction IDs and serial numbers will also be updated so that there is synchronization between the parent, the repaired site, and any child sites. As I mentioned earlier, the Repair phase will also be able to regenerate any lost packages, collections, and advertisements that were created on the recovering site and replicated to child sites, but are not reflected in the backup for the recovering site. Also, during the Repair phase, you are given the option to force a package refresh of a distribution point, if there is a distribution point declared for the local machine that the site is running on. The Wizard will check each package source to make sure that it is non-empty. That's important because it's possible that a particular package source on the local machine was not included in the backup itself. If that package is set to be refreshed on a recurring schedule, then after you restore the backup, you could end up refreshing from an empty package source, which would remove the files for that package on every distribution point that has been defined for it. Next, the Wizard will restart all the site services, and it will list the verification tasks that you need to perform. The Recovery Expert provides a more detailed description of those verification tasks in a cookbook format. 
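The package source check described above can be approximated by hand before any refresh is forced. The following Python sketch flags package sources that are missing or contain no files. The function name, the package IDs, and the directory layout are assumptions for illustration; the Wizard's own check is not documented here.

```python
from pathlib import Path
import tempfile


def empty_package_sources(package_sources):
    """Return the package IDs whose source directory is missing or holds no files."""
    empty = []
    for package_id, source in package_sources.items():
        root = Path(source)
        # A source counts as populated only if at least one regular file exists under it.
        has_file = root.is_dir() and any(p.is_file() for p in root.rglob("*"))
        if not has_file:
            empty.append(package_id)
    return empty


if __name__ == "__main__":
    # Build throwaway directories so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        populated = Path(tmp, "pkg_populated")
        populated.mkdir()
        (populated / "setup.exe").write_bytes(b"placeholder")
        hollow = Path(tmp, "pkg_empty")
        hollow.mkdir()
        sources = {
            "ABC00001": populated,             # has a file
            "ABC00002": hollow,                # exists but is empty
            "ABC00003": Path(tmp, "missing"),  # was never restored
        }
        print("Do not refresh yet:", empty_package_sources(sources))
```

Treating a missing directory the same as an empty one is a deliberate choice in this sketch, because either condition would push an empty package out to the distribution points.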
This next screen shot, on slide 30, describes the conclusion of the Repair phase of the Recovery Wizard. All of the detailed steps of repair that the Wizard performs are logged to a log file for you to refer to later, and there is a button that will pull up a Help file that describes the verification steps in more detail. The documentation links that will be used by the Recovery Wizard will be references into the Recovery content that will be Web-based, along with the Recovery Expert, which I described previously.
Now, we've been through a lot of material. Let's review it briefly. In general, SMS does a good job of backup. In SP1, it has a very flexible backup task that can be customized to accommodate almost any configuration. Unfortunately, recovery is a complex procedure that, in some cases, even with all of the tools and documentation that we provide, may also require PSS assistance. One of the key elements of enabling you to perform a recovery without PSS assistance is the Recovery Expert. That content needs to be very accurate, which is one of the reasons why its release has been delayed recently. We are in the final stretch, so to speak, but we need that information to be absolutely accurate. You can also contact PSS and your account team to get the Recovery Expert and the other Web-based content that we are working on. And finally, the Site Recovery Wizard provides an automated reinsertion of a site into a hierarchy, but requires some manual recovery procedures to be performed as well. We have just a couple of other recovery tools that will be available with the Recovery Expert to help with some automation issues, but the primary recovery automation tool is the Recovery Wizard, which will be available sometime shortly after we release the Recovery home page. That concludes the formal portion of the presentation on backup and recovery. I'd like to turn it back over to Barbara, for Q&A.
Barbara: Thank you so much for that presentation, Rob. 
I do have a couple of quick notes before we move into the Q&A session of the broadcast. If some of the details on the PowerPoint® slides were difficult to view in your browser, or if you would simply like to have a copy of the slides, be sure you download the file from the Web site. Also, if you joined the broadcast late or would like to review the content again, we will have the on-demand streaming media available within about eight hours of the live session. We will also have a full transcript available within three weeks of the live session. All of this content will be available from the Past Support WebCasts page. The Q&A portion of the Support WebCast is intended to encourage further discussion of the Support WebCast topic. One-on-one product support issues are outside the scope of the Support WebCast. If you do need technical assistance, please submit an incident on the Web, or call Microsoft Product Support Services and speak to a Support Professional.
It has been an active session today, and we do have quite a few questions. And the first one is: Does SMS 2.0 SP2 include the RC Security patch? And it goes on to ask: Does it include the remote agent permissions vulnerability patch, MS00-012? There is also reference to a Knowledge Base article. And if it's not included, will we have to apply the patch on SMS Clients even if we have SP2?
Rob: The answer to that question is yes. Those changes are included in Service Pack 2. If you have Service Pack 2, you will not need to manually apply those QFE fixes.
Barbara: Excellent answer. The next question is: You mentioned a link earlier. When will that link be released to the public?
Rob: We will be taking an incremental approach to releasing this content, beginning with the overview and the frequently asked questions document, followed shortly by the planning and procedures document. We will then follow that up with the actual Recovery Expert, and finally the Recovery Wizard. 
We are hoping to release the Recovery Wizard approximately five weeks after the Recovery Expert gets posted. We plan on doing the overview and the frequently asked questions content within the next two weeks, and we then plan to follow up with the planning document and the Recovery Expert within two weeks of that date. Barbara: Excellent. We have a question about the backup and recovery document that you mentioned. The actual question was: Can this be sent to us proactively as a participant in this WebCast? And unfortunately, the answer is no. I don't have that capability from this interface. However, what we do quite often do is either post the document or link to it on the same page that you used to get the WebCast today. We move it from the Future page to the Past WebCast page later today, but it will be available shortly after the session. Do you have any other information to add on that, Rob? Rob: The question relates to the recovery procedures document? Barbara: Correct. Rob: If you contact your account team, Technical Account Manager, and so forth, they have access to the internal staging server that we are using to store all this content while we go through our final edits and technical review. And they are able to send you that content from that staging location. Barbara: Okay. And because this is a public broadcast, not all people will have Technical Account Managers. They may just be logging in to the Support site. Would that still be available to them, or would we have to wait until it was completely edited and posted on the Web sites? Rob: Even customers who do not have a Premier Support Agreement should still have an account team representative in their sales district that they can communicate with and obtain this information through that avenue. Barbara: Good answer. Thank you. Our next question is regarding slide number 7, on best practices. You created additional client connection accounts as a best practice. 
Is there a place to find other best practices, for example, the TechNet site or some other place? Rob: That's a very good question. We actually have a series of information available that has, within the last two weeks, been posted to the SMS MGMT Web site in the form of some online presentations that you can watch, that include documentation and PowerPoint presentations that talk about SP2 best practices, changes to the product that have been introduced in SP2, and there are best practice items described there as well. Barbara: All right. Our next question is also about the backup and recovery documents for SP2. Do you have a possible date that those might be available to the public? Rob: Yes. We've covered that for an earlier question. We're looking at an incremental release of the materials that should start within two weeks, and we should see some incremental updates on roughly two-week intervals after that, starting with the home page, the overview slide, and the frequently asked questions. Barbara: We had one more question about when things will be live, and this one was: When will the Web site be live? Do we have any proposed date on that yet? Rob: This is the third question that essentially has the same answer. We expect to post our first Recovery home page content within two weeks, and we'll be following that release each couple of weeks with additional documentation. Barbara: Let's go to the next question: Why does the Recovery Expert reside on a Microsoft Web site? Rob: Because recovery involves so many different configurations and so many different failure scenarios, we recognized that we would need to engage additional testing, even beyond the first release of the configuration and solutions that we provide. For that reason, we know that we will be doing updates to this documentation relatively often. 
We wanted to make sure that we could do those, and get the updated information available to you within a matter of days, if possible, and hosting the information on the Web site was a key part of that rapid response capability. The Recovery Expert itself is using Active Server pages and XML as a means to produce the task list, and it was a very natural thing for it to be hosted on the Web. We are going to investigate a method to take the recovery content on the Web site and package it up, so that it will be possible for you to take this information and run it on a local system in the event that you or someone that you know does not actually have Internet access. We understand this is important for customers that operate in very secure environments and are not able to provide Internet access to their employees. Barbara: Excellent answer. The next question is: Why do we need to create a child site for recovery purposes? Rob: Okay. I could possibly have been clearer than I was during the presentation of the slides, but in general, there are some specific features that the Recovery Wizard provides in terms of rebuilding lost data objects by using what we call a reference site. In most hierarchies, we do believe that there will be primary sites, which are children of other sites, as well as secondary sites that are children of other sites. And the idea behind maintaining a primary child underneath a given site is that if that given site needs to go through a recovery cycle, we will have a reference site available to rebuild the data. If you have a hierarchy that includes a single primary site and several secondary sites, and that's the extent of the hierarchy, we are only suggesting that a primary site be installed as a child of the existing primary site at a peer level with the other secondary sites, simply for the purpose of enabling lost object regeneration on the recovering site. 
That statement does not require that every level of the hierarchy possess a dedicated reference site. The reference site role can be concurrently filled by any primary site below the site that you are trying to recover. So it is not a general statement that you will have to have extra hardware across the board. It is an isolated case of needing to make sure that the lowest-level site in your hierarchy that is a primary site, that could possibly fail, would have a primary child for recovery purposes. And we are not requiring it; we are only recommending it. Barbara: Okay. That was a very good answer. And we have had quite a few questions submitted in the last few minutes. And our next question is about the Recovery Wizard. Someone says: I got into the WebCast late. Where is the Recovery Wizard located? Is it in SP2? Rob: The Recovery Wizard is one of several recovery tools. It is the recovery tool that will be released roughly five weeks after the Recovery Expert is released to the external Web. At this point, we are working through the release procedures and technical review of the recovery content with the same attitude that we took with Service Pack 2. That attitude is, "Quality is the criteria," and for that reason, we're really not able to communicate specific dates for when we will start publishing these tools and solutions to the external Web. When the quality is right, we'll publish those. Until then, we are working with several customers who are doing evaluations of the Recovery Expert, the Recovery Wizards, the rest of the recovery tools, and the rest of the recovery documentation. And the feedback that we are getting from these customers is that we are very close to being done, and we just need to ask for a bit more patience. As I mentioned earlier, we will start incrementally releasing recovery content to the external Web starting about two weeks from now. Barbara: Okay, excellent answer. Quality is always first and foremost. 
And you actually answered most of the next question, asking any timeframe for the tools to be available. The addition to that is: Will the tools be available to PSS users before the public? Rob: PSS has been partnering with us very closely throughout the whole cycle of developing the recovery materials and tools. PSS does have access to all of this information, and if you find yourself in a situation where you need to recover a site, PSS has the option of using any of these recovery tools to help you in that effort. Barbara: All right. Moving on to the next question, which is on server name changes: Is it possible to update a server and change the name during recovery? Rob: At this point in time, the level of support for this type of thing is the same as it was with the previous versions of the product; specifically, that changing the domain name of the domain that the site is installed in, changing the server name of the machine that the site is installed on, and changing the site code of the site, are not supported. And they are such key elements of the site's configuration that we prefer that customers simply install a new site if they need to change these kinds of configuration items. Barbara: Very good information. And now let's move on to the next question, and it asks: Should we use SQL Server to back up the SMS database, or use the SMS Backup Task? Rob: We prefer that you use the SMS Backup Task, because we have more thoroughly tested that. We have not been in a position to test third-party backup procedures as well as the backup capabilities that are provided by SQL Server natively. Essentially, any type of backup mechanism is considered safe and supportable as long as it's capable of providing a backup that's based on a snapshot in time. 
So if the SQL Server backup task will be able to stop the site services from running and take a backup of the database, as well as all the other items that the SP1-based backup task backs up, then you should be safe to use any other backup mechanism and not be tied to the SMS integrated backup task. Barbara: And again, it looks like you may have already answered the next question, and that was: What backup software is recommended by Microsoft for backing up and restoring Windows NT or SMS? And I know you just said that we don't test third-party products and, therefore, don't make recommendations. Additional information? Rob: Nothing beyond the earlier answer. If we're not covering one of these questions that we think is a duplicate, I'd like to suggest that you submit another question, and we'll make sure that we get it just right. Barbara: Excellent. Moving on to the next question, it's about client connection account: Is it sufficient to simply document the password for this account, or is there more to it than that? Rob: Well, the client connection account is something to which you should add an account above and beyond the accounts that the system creates at install time. And the account name and password are generally all you need to document in reference to the earlier statements of the recovery cycle. Barbara: And we have a question now about the Recovery reference site: You mentioned a child slave site that will be used as a reference site for site recovery. However, during some steps of the repair process, it seems that necessary data is pulled from the parent site. Therefore, if you are repairing a central site, where is the data obtained from? Also, given this dependency, can you clarify whether the reference site should be a top-level parent or a child site anywhere within the hierarchy? Rob: A reference site can be any primary child site that exists in the hierarchy and reports beneath the recovering site. 
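Editor's note: The following sketch was not part of the live broadcast. It illustrates the reference site rule stated above, assuming that "beneath the recovering site" means any descendant in the hierarchy, not only a direct child. The `Site` class, the site codes, and the function names are hypothetical and are not part of SMS or the Recovery Wizard.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one SMS site in a hierarchy."""
    code: str
    kind: str  # "primary" or "secondary"
    children: list = field(default_factory=list)


def descendants(site):
    """Yield every site that reports, directly or indirectly, to `site`."""
    for child in site.children:
        yield child
        yield from descendants(child)


def reference_candidates(recovering):
    """Primary sites beneath the recovering site (assumed eligibility rule)."""
    return [s.code for s in descendants(recovering) if s.kind == "primary"]


def primaries_without_reference(root):
    """Primary sites that have no primary site anywhere beneath them."""
    sites = [root, *descendants(root)]
    return [s.code for s in sites
            if s.kind == "primary" and not reference_candidates(s)]


# Example hierarchy: central site C01 with one primary child and one
# secondary child; the primary child P01 has only a secondary child.
p01 = Site("P01", "primary", [Site("S02", "secondary")])
c01 = Site("C01", "primary", [p01, Site("S01", "secondary")])

candidates = reference_candidates(c01)      # sites that could serve C01
exposed = primaries_without_reference(c01)  # sites that would benefit from a dedicated reference site
```

In this example, C01 can use P01 as a reference site, while P01 itself has only a secondary site beneath it and so has no reference site of its own.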
The reason why that is stated that way is because, when you create a package, for example, on the recovering site, that package definition is automatically replicated to every child site that reports in to that creating site, whether it be a secondary site or a primary site. Currently, the Recovery Wizard uses the Windows Management Instrumentation interface to talk to the SQL database on primary sites. Because secondary sites don't have that data storage, they are not appropriate to serve as reference sites yet. We are considering what it would take to provide reference site capabilities using a secondary site, but that is not going to be a part of the first release of the Recovery Wizard. Now I'll address the part of your question that relates to the parent site. If you have a central site, of course it has no parent site; therefore, the configuration data will be as current as the data that was in your backup that you restored, or as current as the data that the Recovery Wizard allows you to edit. For example, the addresses to child sites and the existence of a particular child site is one of the data items that you can either enter to the wizard directly, or that the wizard can extract from a parent site, if it exists. So in summary, having a parent site and having a reference site let the Recovery Wizard run more efficiently, and let you run a recovery cycle with a backup that is either out of date or without a backup at all. And not having a reference site and not having a parent site for the recovering site really comes down to the need to have a more recent backup; for example, backing up nightly or weekly instead of monthly or quarterly. Barbara: All right. More good information. This one is a follow-on to an earlier question about reference materials that you quoted and contacting an account team rep: Is there a particular reference name or number that they should use to acquire those materials? 
In other words, if they call in, what do they have to ask for to get what they need? Rob: If you request the information having to do with SMS backup and recovery, that is a well enough known and understood term that, as far as I know, any engineer in PSS that is working on the SMS product will know what you need. And those engineers are free to contact me at any time if there is any confusion about what you're asking for, where they can get the most current stuff, and what the current state of the availability of those items is. Barbara: All right. That sounds like an excellent answer there. And our next question is about server upgrade: Are there any significant problems that we should be aware of when upgrading from Windows NT 4.0 to Windows 2000? Rob: There is one item of interest when upgrading from NT 4.0 to Windows 2000. You would first need to install SMS 2.0 SP2 on that system, because SMS 2.0 did not support Windows 2000 until SP2 was released. Next, you upgrade that system, which is running NT 4.0 and SMS 2.0 SP2, to Windows 2000, and there are two particular items to note following that upgrade, and I'll mention those now. The Windows 2000 upgrade procedure tightens up two aspects of security during the upgrade process. First, it takes one particular entry out of the list of registry locations that may be accessed remotely. Second, it removes some access control list entries for a particular Inbox registry key that the site uses. PSS has a wealth of information about this scenario and about editing these two registry keys to restore the proper settings that the Windows 2000 upgrade removed in its attempt to tighten up security.
Editing these two entries back in is procedural and simple, and does not compromise the security of the site in any way, and I am fairly certain that PSS will have a Knowledge Base article published within the next few days that describes the details of this upgrade scenario, editing these registry values. And in addition to that, one of our Recovery tools that will be made available within the next two weeks or so will include the capability of fixing those registry changes for you and eliminating your need to hand-edit the registry. Barbara: Okay, excellent. There is another question about support tools: It was my understanding that there would be a tool coming out that would allow you to make a change to site settings that would then propagate down through the other sites without making the change on the individual secondary site. Is there any timeframe on this, or additional information? Rob: We are working on three distinct sets of tools currently. We are working on Recovery tools, which we've gone through today. We are working on support tools, which help PSS diagnose problems, and the next set is Resource Kit tools. The tool that you're referring to is known as Site-Setting Replication, and currently exists in a prototype format that we have used for a few demonstrations. And it is, at this point, a rather limited implementation that only picks up a small number of the overall site settings that you might want to propagate to the hierarchy. The plan that we have there is to look at this prototype tool and enhance it with a set number of items, based on a priority system of sorts, so that we are able to implement the configuration settings that you need first, because we have over 288 individual configuration settings available for a given site. And of those 288 settings, we need to lock in on the most important ones first. 
It's a bit of a daunting task to go through and individually analyze these items, so it amounts to a lengthy process and a fairly sophisticated tool. We understand the need and the importance of being able to replicate these settings, and we're working on it. Barbara: Good information. And before I go on to the next question, just to let you know, we are interested in your feedback regarding the WebCast program. If you have, for example, suggestions for topics for future WebCasts, or if you have specific information about this WebCast, please use the alias feedback@microsoft.com, and include "Support WebCast" in the subject line. All right, moving on, we have a question about recovery. I think we may have discussed this already: When can we expect the Recovery home page? I recently had to rebuild an SMS 2.0 site from an unknown failure and would like the Recovery Wizard before advising a client to move forward. Rob: Okay. Yes, we did cover this, but I'll cover it again briefly. We plan on starting incremental releases of Recovery content in about two weeks from now, with some overview and planning materials, and moving on from there to releasing additional tools. And that's the plan there. Barbara: All right. Our next question is about the Recovery site: Should the reference primary child site be of the same size, specifically talking about hardware, as the primary site? Does it need SQL Server installed with the same parameters as the primary? Rob: This is a good distinction. I apologize for not mentioning that in my earlier description of this. But essentially, if you take the option of providing a slave reference site, if you will, you do not need any clients reporting to it. Therefore, you do not necessarily need to have a client access point or an additional computer to host any other site system roles, such as a SQL Server or a logon point and so forth. 
This special reference site is intended to recover packages, advertisements, and collections that were created on the site that has failed, and were replicated down the hierarchy to the reference site. So the process of using a computer as a reference site involves reading some data from the SQL database and reading some data from the registry of the reference site. And as such, it can be a very, very small computer in terms of CPU, disk, and memory. Barbara: Good answer. And our next question is about the question and answer session: Will it be available? When and where? And I can take care of that one; the answer is yes. It will be available as on-demand replay in the spoken transcript, and that will be available in just a few hours on the Past WebCasts page. As soon as we move that page from the Current WebCasts and Upcoming WebCasts to the Past WebCasts page, then you'll be able to find it there. If you're looking for the hard copy or the downloadable copy of the transcript, that will be available in about three weeks, again, on that same Web page. You'll be able to download the PowerPoint slides as well as the document. On to another technical question: Would it make sense to have an empty top-level site to make recovery simpler? Rob: An empty top-level site — in other words, you take your current central site and you report it in to another site that is its parent, is technically a central site, but not used as one from the point of view of management. That would help you recover a certain number of properties in the site control data for the site that you're recovering, in the particular failure scenario where your central site that you manage from needs to go through a recovery cycle so that it will have a parent. But that particular scenario is a little troublesome in that if you're going to put a parent site above your central site, it is a much more significant hardware investment. 
It would need to be hardware that is probably in the same class as the hardware that your central site is running on, because of the data objects that are replicated up the hierarchy and would subsequently need to be processed on this dedicated site. The specific items that the Recovery Wizard needs to read from a parent site, if you want to use a parent site during a recovery, really relate to obtaining a site control file that is stored at the parent site, and the representation of the hierarchy that is at and below the recovering site. Most of your site configuration settings should not be changing that frequently. The odds are very good that even with a weekly backup process, your site control file information will be current enough for a recovery cycle, and you can edit the actual hierarchy representation within the Recovery Wizard. So to answer your question, I would shy away from using a dedicated central site above your management central site, just because the efficiencies and the gains don't appear to be that strong. Barbara: Thanks, Rob. We do have just a few more questions in the queue. The next one is about client access account: Does it hurt to use the SMS service account as the second client account? Rob: I don't actually know how to answer this question. Barbara: All right. In that case, if the person who sent that question would like to send another one with a little additional information, what specifically do you need to know? Rob: What I need to do is actually take this question and ask the person who actually owns the Recovery area. He is the guy who has been doing the technical editing and the content contribution for the majority of our Web site that we're working on, and he is very familiar with these scenarios and would be able to answer this question. I simply need to take this question to him, get an answer, and then get it back to you somehow. Barbara: Okay.
The answer to that is, the follow-on to your answer, Rob, is that yes, if the audience member who sent the question logged on using their {Editor's note: This information is a result of our followup research and was not part of the live broadcast. The service account is a domain admin, but the client connect account needs to be a domain user. The service account should not be used as a client connect account.} And the next question is: Will the Backup and Recovery Wizard help me in moving to a new and/or upgraded server? Rob: You won't need to wait for the Recovery Wizard to do this. The reason is that we have a document that PSS has had access to for some time that is titled "Hardware Swapping with NT and SMS." And the short of it is that you can perform an image backup of your site, and then perform a restore of that site onto the new hardware, observing the computer name, domain name, and site code restrictions. And the process of doing that does not require the Recovery Wizard or the Recovery Expert. It should require going through this particular hardware swapping document. That document will be a part of our Web content that we publish. It will not be a part of the first round of content that we publish, however. I'm fairly certain you can contact PSS and ask them for the SMS hardware swapping document, and they would be able to give you a draft version of that. Barbara: The next question is: Do you recommend using a primary parent with no clients as a type of reference machine? All others are primary children. Rob: Yes, this is similar to some of the other questions that we've had on this. And the key here, in having a reference site, is that it will be a child of the site you're trying to recover. And trying to use a parent site for the purposes of a reference site is a very limited-gain proposition. 
So if you look at your hierarchy and you can say that all primary sites that are used for management purposes have at least one primary child beneath them, you're basically safe. Now, at the bottom-level tier, you will likely have a primary site that is used for management purposes that does not have any primary children, and that is the site that benefits most from having this dedicated reference site beneath it. Barbara: All right. And we have another question that's sort of a follow-on to an earlier answer you gave about the SMS service account as a client account. And the question is: Would it hurt anything if they really want to create new ones? That way, they could have multiple accounts outside of existing accounts for clients to connect. Rob: Yes. I believe that Michael is helping me answer an earlier question about the Service account as a client account. And the key here is not so much which account you use, but that you have multiple accounts for these clients to take advantage of. And as I think about this question more, what becomes clearer to me is that you do not necessarily want your clients to use an administrative-level account to communicate back into the site. What you actually want are domain user-type accounts to perform those kinds of tasks. And I'm fairly confident that that answers the previous question that Bart submitted about the client access account and using the site service account as a substitute for, or in addition to the existing client connection accounts. Barbara: All right. Good information again. This one is actually just a would-you-mind kind of question. As you were mentioning the hardware swapping information, this person lost the connection just as the question was being asked. So if you don't mind, would you please repeat that. Rob: Okay. The hardware swapping question was, specifically, do we need to use the Recovery Wizard if what we want to do is install new hardware or upgrade hardware under an existing SMS site? 
We have a document that PSS has in draft form that describes the nuts and bolts of this process. You don't need the Recovery Wizard. There are some recovery procedures that do come into play, but they're not rocket-science procedures, and they are described in the hardware swapping document. Barbara: Great. That's good information again. And the next question is on the topic of a new SMS provider: Could you elaborate a bit on the requirement for a new SMS Provider? Rob: Specifically, the SMS Provider, as implemented in the original release of SMS 2.0 and Service Pack 1, was limited in the way that it exposed the NextIDs and the ability to manipulate the IDs that will be used in creating new objects, such as a package. When you create an object like a package or an advertisement or a collection, each of those items is allocated a unique identifier. The next available identifier is stored in the NextIDs table of your SQL database, and the Recovery Wizard needs to be able to interrogate the IDs that were there in that table as it goes through and looks at reference sites and looks at the objects that have been created down there. It needs to have access to more system internal data than the old version of the SMS Provider was exposing through the Windows Management interface. So the new Provider is simply exposing a bit more data for the Recovery Wizard to have access to, and that's why the Wizard requires a new Provider. That Provider is already a part of SP2, however, so if you're using that version, you have no extra overhead to deal with. Barbara: We do have just a couple more questions, and the next one is: If I have a standalone SMS site, is running the Recovery Wizard required? Rob: In that particular scenario, there is value in using the Recovery Wizard, because that Wizard does more than just synchronize the failed site to any children or its parent. 
The Recovery Wizard also synchronizes that stand-alone site to the logon points and client access points that exist within that particular site. Specifically, the Recovery Wizard will make sure that any of the data files that are used by clients to know which packages need to be installed, and so forth, get regenerated and replicated back out to the client access points and logon points after the recovery cycle has completed. It's very important that, after a recovery cycle has completed, these client access points and logon points do not have leftover files. We refer to this as "dead wood." These are files that can accumulate and that the system may lose track of after a recovery cycle. So there are both inter-site synchronization issues as well as intra-site synchronization issues, and the Wizard does handle both. The recovery procedures that we're documenting through the Recovery Expert also cover these items, and the Recovery Wizard simply automates these procedures for you. Barbara: Great. Another good bit of information for us. And we have another question about SMS Provider: When we say "upgraded Provider with SP2," are we talking about WMI 1.5? This isn't installed by default with SP2. Rob: We are not talking about WMI with the statement of the upgraded Provider. The Provider is a bit of program code that runs underneath WMI (a DLL under WMI as a process, for example). The SMS Provider is the engine that allows the admin console and WMI-enabled applications to access SMS-specific data and configuration settings through the Windows Management interface. So when your SMS 2.0 Administrator Console is running and performing management actions for you against the site, it is talking to WMI, which then talks to the SMS Provider, and the SMS Provider is then talking to SQL Server and abstracting the nuts and bolts of SMS. And so that's why we're talking about a very SMS-specific piece of code, rather than a new version of WMI. Barbara: Thank you, Rob.
And we have another question: What is the difference between hardware swapping and site recovery? Aren't those the same thing? Rob: Well, you can have a recovery cycle become necessary and yet not have a hardware failure, or, specifically, not need to do a hardware swap of the machine that the site was running on. In the PowerPoint slides, we have a definition of what site recovery is, and in effect, hardware swapping is a site recovery of a particular type. So I don't want you to think that they're different things, but hardware swapping falls under the recovery umbrella rather than being equivalent to it. Site recovery is larger and broader than just hardware swapping. Barbara: All right. The next-to-the-last question at this point is: Can you explain again the reason the Object IDs need to be modified after recovery of a parent site? Rob: The Object IDs are the unique representation of a package, an advertisement, and a collection. These types of items are replicated down to all child sites of the parent that created them. What can happen is that you go through a recovery cycle and restore from backup, and some of these packages or collections, and so forth, might not be in the backup. So when you restore the backup, you end up with a site that is missing a package that its child received through replication. And this represents an orphaned object. The owner of that object no longer has any reference to it, so it can no longer be managed. It needs to be deleted or it needs to be rebuilt from the reference site. So the reason we care so much about these Object IDs (and specifically never reusing an existing object ID) is because in the situation that I described, the administrator could, on that recovered site, create a new package.
However, that package could pick up the ID that is used by an existing package that is farther down in the hierarchy, and when that new package gets replicated, it will be replicated using the ID as its unique key. That would cause that package, on all of the child sites, to be replaced with the newly created package, and that could be a very undesirable situation, depending on what settings were used for the original package and what settings were used for the package that was created after the recovery cycle. Barbara: Lots and lots of good information today and lots of great questions. And the final question is: How many of the procedures listed by the Site Recovery Expert will the Site Recovery Wizard perform? Rob: I have not actually walked through the list and identified the specific items that the Recovery Wizard is going to replace or automate. When we do release the Recovery Wizard, it will be released in a way that it is integrated with the other content (the Help files and the repair procedures), so that when you are viewing a particular recovery task list, you will have an indication that there is a particular tool available to you to provide support or automation for that particular recovery task. So in theory, when you look at your recovery task list, we hope to see as many as 25 to 30 percent of those repair tasks with a little icon next to them that says, "Use the Recovery Wizard." And we want to see that list grow over time as we make enhancements to the Recovery Wizard, with the ultimate goal being that the Recovery Expert is optional, and all you need is the Recovery Wizard. And the reason we have to take that longer-term approach is that we have to go through the process of engineering recovery tasks and verifying that those tasks actually work reliably. And we have to do that before we try to write any code that automates that recovery task. 
So essentially, the Recovery Expert becomes an implementation specification for the Recovery Wizard. And in its logical conclusion, you get to take that Recovery Expert and put it in mothballs, and you're left with just the Recovery Wizard doing everything for you. So we have to take an incremental approach to getting to that end result, but your question certainly makes a lot of sense. We ought to be working that way, and we are doing that. Barbara: All right. We have had a lively discussion today, and I appreciate everyone joining in. We have now answered all of the questions that were submitted today, so this will wrap up our session. I want to thank all of you for joining us, and I do hope this information was useful to you. We are very interested in your feedback regarding the WebCast program. You can send us your comments and suggestions using the e-mail alias feedback@microsoft.com, and please be sure to include Support WebCast in the subject line. We hope you will join us again in the near future. Thank you. And good-bye.
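Rob's answer about Object IDs describes a collision: a parent site restored from an older backup can hand out an ID that a child site already holds for a different package, and replication keyed on that ID then replaces the child's copy. The following is a minimal Python sketch of that reasoning, not SMS code. The `Site` class, the site-code-plus-serial ID format, and the `advance_serial` helper are all hypothetical, chosen only to make the sequence of events concrete.

```python
import copy


class Site:
    """Toy model of a site (hypothetical): packages keyed by object ID."""

    def __init__(self, code):
        self.code = code
        self.next_serial = 1   # next ID serial this site will hand out
        self.packages = {}     # object ID -> package name

    def create_package(self, name):
        object_id = f"{self.code}{self.next_serial:05d}"
        self.next_serial += 1
        self.packages[object_id] = name
        return object_id


def replicate(parent, child):
    """Push the parent's packages to the child, using the ID as the unique key."""
    child.packages.update(parent.packages)


def advance_serial(site, known_ids):
    """Move the site's ID serial past every ID of its own seen in the hierarchy."""
    own = [int(i[len(site.code):]) for i in known_ids if i.startswith(site.code)]
    if own:
        site.next_serial = max(site.next_serial, max(own) + 1)


parent, child = Site("P01"), Site("C01")
parent.create_package("Office")
backup = copy.deepcopy(parent)                      # backup is taken here
antivirus_id = parent.create_package("Antivirus")   # created after the backup
replicate(parent, child)

# Recovery: the parent is restored from the older backup.
parent = copy.deepcopy(backup)
orphans = sorted(set(child.packages) - set(parent.packages))

# Case 1: IDs are not adjusted, so the next new package reuses the orphan's ID.
unsafe_parent, unsafe_child = copy.deepcopy(parent), copy.deepcopy(child)
reused_id = unsafe_parent.create_package("Games")
replicate(unsafe_parent, unsafe_child)

# Case 2: the ID serial is advanced first, so the new package gets a fresh ID.
advance_serial(parent, child.packages)
safe_id = parent.create_package("Games")
replicate(parent, child)

print("orphaned on child:", orphans)
print("unadjusted:", reused_id, "->", unsafe_child.packages[antivirus_id])
print("adjusted:  ", safe_id, "; child still has", child.packages[antivirus_id])
```

In the unadjusted case the child's "Antivirus" package is silently replaced by "Games" because both share one ID. In the adjusted case the orphan remains on the child, untouched, until it is deleted or rebuilt as Rob describes.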