This article has been archived. It is offered "as is" and will no longer be updated.
In large Microsoft Systems Management Server (SMS) 2003 hierarchies that have many sites, site-to-site replication slows down.
The volume of files may be larger than you expect in the following folders on a site server:
These files represent site-to-site replication data that has been queued for processing by several components of the SMS_EXECUTIVE service. To determine whether the file counts are larger than expected, you need a baseline for the site. Large queues of replication information are expected occasionally and are typical when specific conditions exist.
Note The baseline is defined here as some historical measure of the volume of files in the Inboxes folder structure.
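One way to keep such a baseline is to record file counts per folder at regular intervals and compare later counts against them. The following Python sketch is illustrative only and is not part of SMS or of this hotfix. The function names, the folder names in the usage note, and the threshold factor of 2.0 are assumptions chosen for the example.

```python
import os
from pathlib import Path


def count_inbox_files(inboxes_root):
    """Return {relative folder path: number of files directly in that folder}."""
    root = Path(inboxes_root)
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # The root folder itself is reported as ".".
        rel = Path(dirpath).relative_to(root).as_posix()
        counts[rel] = len(filenames)
    return counts


def folders_over_baseline(current, baseline, factor=2.0):
    """Return folders whose current count exceeds factor times the baseline.

    `factor` is an arbitrary illustrative threshold, not a documented value.
    Folders that are missing from the baseline are treated as having 0 files.
    """
    flagged = {}
    for folder, count in current.items():
        expected = baseline.get(folder, 0)
        if count > 0 and count > expected * factor:
            flagged[folder] = (expected, count)
    return flagged
```

For example, with a hypothetical baseline of `{"queue_a": 10}` and a current count of `{"queue_a": 25}`, `folders_over_baseline` returns `{"queue_a": (10, 25)}`. Saving the output of `count_inbox_files` on a schedule gives the historical measure that the note describes.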
The following conditions can cause backlog scenarios:
Network or other infrastructure issues prevent the sender component from completing pending replication work.
Poor disk performance or slow I/O occurs because of a contention for disk resources.
SMS bandwidth restrictions limit the throughput of the sender component. As a result, send requests and jobs remain queued for longer periods.
When addresses are unavailable, the SMS Scheduler component cannot schedule send requests by using the sender for the given address. This issue delays the part of the work that is associated with scheduling the send request until the address is available.
Distributing many or large packages in a short time creates a high load on the components that are involved in site-to-site replication.
Overly aggressive schedules exist for operations such as discovery data generation, inventory collection, and collection evaluation.
In a hierarchy that has three or more tiers, middle-tier sites that have many child sites handle larger volumes of jobs and replication objects. This behavior occurs because of site-to-site replication routing. The load of a middle-tier site is increased for each child site that is attached. Therefore, reducing the number of attached sites can, in some cases, reduce this load.
Sites are removed from the hierarchy incorrectly.
In most cases, when the conditions that cause significant replication queuing have been corrected or when these conditions have subsided, the queued replication data is processed and then cleared.
When the SMS Scheduler component is processing large quantities of active jobs and send requests, the throughput of the Scheduler component begins to slow. This behavior occurs because of a corresponding increase in processing overhead for the increased quantities of objects.
In some instances, if a large enough queue of data forms, it can take days or even weeks to be completely processed. The time that is required to process the queued data depends on the many variables that affect replication performance in the hierarchy and in the environment. These variables include disk I/O performance, network speeds, bandwidth restrictions, size of queued data, and object count. After a large queue of backlogged replication data has formed, adding more load increases the time that is required for all data to be processed.
In most cases, the appropriate action for a large backlog of replication data is to first correct any issue that may be preventing processing of replication data. Next, you may have to reduce the quantity of site-to-site replication traffic. Finally, make sure that the SMS_EXECUTIVE service can run uninterrupted to complete processing in a timely manner. Service restarts can add significant overhead. Limiting SMS_EXECUTIVE service restarts is important because the initialization work for the SMS Scheduler component is proportional to the number of jobs, send requests, and routing requests that are currently queued for processing.
Note The SMS_EXECUTIVE service hosts the SMS Replication Manager, SMS Scheduler, and SMS Sender components.
A supported hotfix is now available from Microsoft. However, it is intended to correct only the problem that is described in this article. Apply it only to systems that are experiencing this specific problem. This hotfix may receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next Systems Management Server 2003 service pack that contains this hotfix.
To resolve this problem immediately, contact Microsoft Customer Support Services to obtain the hotfix. For a complete list of Microsoft Customer Support Services telephone numbers and information about support costs, visit the following Microsoft Web site:
Note In special cases, charges that are ordinarily incurred for support calls may be canceled if a Microsoft Support Professional determines that a specific update will resolve your problem. The usual support costs will apply to additional support questions and issues that do not qualify for the specific update in question.
You must have SMS 2003 Service Pack 2 (SP2) or SMS 2003 Service Pack 3 (SP3) installed before you apply this hotfix.
Note This hotfix should be applied on all sites in the hierarchy.
You do not have to restart the computer after you apply this hotfix.
Hotfix replacement information
This hotfix includes several previous hotfixes. These hotfixes include 917435, 927723, and 905751 for SMS 2003 Service Pack 2 (SP2) and 907311 for SMS 2003 Service Pack 3 (SP3).
The English version of this hotfix has the file attributes (or later file attributes) that are listed in the following table. The dates and times for these files are listed in Coordinated Universal Time (UTC). When you view the file information, it is converted to local time. To find the difference between UTC and local time, use the Time Zone tab in the Date and Time item in Control Panel.
SMS 2003 Service Pack 2
SMS 2003 Service Pack 3
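The UTC-to-local conversion that is described for the file attributes can also be done in a short script. The following Python sketch assumes a fixed local offset from UTC and an ISO-style date format. The function name and the sample values are hypothetical and are not taken from the hotfix file tables.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local(utc_date, utc_time, local_offset_hours):
    """Convert a UTC date and time to a fixed-offset local time.

    utc_date uses the form YYYY-MM-DD and utc_time uses HH:MM (assumed
    formats). local_offset_hours is the local offset from UTC, for example
    -8 for UTC-08:00. Daylight saving time is not handled.
    """
    utc_dt = datetime.strptime(f"{utc_date} {utc_time}", "%Y-%m-%d %H:%M")
    utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    local_tz = timezone(timedelta(hours=local_offset_hours))
    return utc_dt.astimezone(local_tz)
```

For example, `utc_to_local("2008-01-15", "02:30", -8)` returns 18:30 on 2008-01-14 at UTC-08:00. A file time can therefore appear to fall on the previous or next calendar day when it is viewed locally.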
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
This hotfix does not directly eliminate or prevent backlogs. Instead, this hotfix improves the performance of the SMS Scheduler component when a large amount of work is outstanding, which helps the component recover more quickly from large backlogs of data. After you install this hotfix, you should expect some reduction in the time that is required to recover from a large backlog of replication data. No noticeable performance or throughput increase is expected for sites that are not experiencing a heavy load in the SMS Scheduler component.
Changes include the following:
More aggressive deletion of completed or failed jobs at startup to reduce overall file counts.
Improved performance for processing a job that has an incorrect send request reference.
Performance improvements that reduce disk I/O operations and improve efficiency of memory manipulation of the job and the send request lists.
For more information, see the following article in the Microsoft Knowledge Base:
824684 Description of the standard terminology that is used to describe Microsoft software updates