Article ID: 195918 - Last Review: October 27, 2006 - Revision: 3.2
XADM: Slow Intersite Directory Replication
This article was previously published under Q195918
SYMPTOMS
An organization's intersite directory replication, which has previously been
running without problems, may start to fall behind the changes made to the
organization's directory.
This can happen when a Directory Replication bridgehead server cannot cope with the sheer volume of replication messages it has to send and receive. The usual reason is that the schedule set on the organization's Directory Replication Connectors is too demanding for the intersite replication topology and for the messaging infrastructure that supports it.

The Intersite Directory Replication Schedule

Each Directory Replication Connector can be scheduled. By default, the connector is activated once every three hours. Administrators can make this schedule more (or less) frequent, up to four times an hour.

The schedule controls when, and how often, the Directory Replication Connector sends update requests to the destination Directory Replication Connector. When deciding how often directory replication should take place, administrators must consider the load this places on the organization's Directory Replication bridgehead servers and on the messaging infrastructure.

NOTE: When setting the activation schedule with the detail view set to '1 Hour', selecting a one-hour time block activates the connector four times in that hour. If you want the connector to be activated only once an hour, you must use the 15-minute detail view and select one 15-minute block per hour.

How Intersite Directory Replication Works

Below is the minimum set of events logged when a Directory Replication Connector is activated (according to its schedule) with the 'Replication' diagnostic logging category (on the Directory Service object) set to Maximum on both adjacent directory bridgehead servers.

For each naming context, the requesting bridgehead server logs the following:
1068 - Ask for updates for naming context (either site or
configuration)
1100 -> Message submitted
1058 - Completed successfully
The remote bridgehead server then logs:
1099 <- Message received from requesting directory
1070 - The context to get, from the starting USN
1071 - The number of objects retrieved, and entries up to the USN
1101 -> Message submitted back to the requesting connector

Finally, the requesting bridgehead server logs:
1099 <- Message received back from the remote Directory

NOTE: Every message sent out by the requesting bridgehead server results in a reply from the remote bridgehead server.

Maximum Objects Sent per Request

A maximum of 512 objects are sent back to a requesting bridgehead server in any one response message. If the remote directory bridgehead server has more than 512 objects to send, it sends an additional message indicating that it has more objects. The requesting bridgehead server then, when ready, issues a request for the next set of objects. This prevents the requesting bridgehead server from becoming overloaded (for example, when doing a 'refresh all items in directory').

The Number of Directory Replication Messages a Day in the Organization

To work out the minimum number of messages that intersite directory replication will generate in your organization on a typical day, you can apply the following simple formula:
N = number of sites participating in intersite directory replication
M = (N - 1) * 2 = number of replication request messages each site sends out per activation (for both the site and configuration naming contexts)
F = number of times each connector is active a day (24 / 3 = 8 by default)
2 = factor because every request must get a reply
Intersite replication messages per day = N * (M * F * 2)

NOTE: Additional replication messages are generated for the 'address book views' naming context, but these are relatively few, and their quantity is not affected by the directory replication schedule.

For example, using the default schedule:
10 sites * (18 * 8 * 2) = 2880 messages a day
20 sites * (38 * 8 * 2) = 12160 messages a day
30 sites * (58 * 8 * 2) = 27840 messages a day
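The arithmetic above can be checked with a short sketch. It encodes the formula as given (N, M, F and the reply factor of 2), plus two assumptions that are not part of the original article: the daily schedule grid is modeled as 96 fifteen-minute blocks, so that selecting a whole hour in the '1 Hour' detail view marks four blocks (as the NOTE on activation schedules describes), and the number of request/reply exchanges for a given number of changed objects is taken to be one per 512 objects, rounded up, with a minimum of one. All function names are hypothetical.

```python
import math

BLOCKS_PER_HOUR = 4          # assumption: schedule grid of 15-minute blocks
MAX_OBJECTS_PER_REPLY = 512  # from the article: at most 512 objects per response


def hour_view_selection(hours):
    """Blocks marked when whole hours are selected in the '1 Hour' detail view."""
    return {h * BLOCKS_PER_HOUR + q for h in hours for q in range(BLOCKS_PER_HOUR)}


def quarter_view_selection(hours):
    """Blocks marked when only one 15-minute block per hour is selected."""
    return {h * BLOCKS_PER_HOUR for h in hours}


def messages_per_day(sites, activations_per_day=8):
    """Minimum intersite replication messages per day: N * (M * F * 2)."""
    m = (sites - 1) * 2  # M: requests per site per activation (site + configuration)
    return sites * (m * activations_per_day * 2)  # final 2: every request gets a reply


def exchanges_for(changed_objects):
    """Assumed request/reply exchanges to ship one naming context's changes."""
    return max(1, math.ceil(changed_objects / MAX_OBJECTS_PER_REPLY))


if __name__ == "__main__":
    every_three_hours = range(0, 24, 3)  # default: once every three hours
    for n in (10, 20, 30):
        print(n, "sites:", messages_per_day(n), "messages a day")
    f_hour_view = len(hour_view_selection(every_three_hours))
    f_quarter_view = len(quarter_view_selection(every_three_hours))
    print("1 Hour view:", f_hour_view, "activations,",
          messages_per_day(10, f_hour_view), "messages a day for 10 sites")
    print("15-minute view:", f_quarter_view, "activations")
    print("exchanges for 1,300 changed objects:", exchanges_for(1300))
```

With the default schedule this reproduces the 2880, 12160 and 27840 figures above. Selecting the same eight hours in the '1 Hour' view instead yields 32 activations a day and, for 10 sites, 11520 messages a day, four times the default load.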
CAUSE
Directory replication slows down significantly if the Directory
Replication Connector becomes active (according to its schedule) before the
replies from the remote bridgehead server to the previous cycle's requests
have been processed.
In this situation, the connector must presume that the replies that have not been processed will not arrive. The connector therefore requests the same updates (plus any others that have occurred since) from the remote bridgehead server again. In a large organization where an aggressive intersite replication schedule has been set, these messages may be held up in the Directory Service mailbox on the directory bridgehead server.

NOTE: This is particularly likely in organizations that have implemented the widely adopted hub-and-spoke directory replication topology, where central hub server(s) are responsible for passing directory updates between the spokes.

The process of taking messages from the Directory Service's inbox and giving them to the Directory Replication Agent is performed on a single thread in the Directory Service (DSAMAIN). The messages in the inbox must also be sorted (TABLE_SORT_ASCEND) on the client submit time (PR_CLIENT_SUBMIT_TIME); this becomes a computationally expensive operation as messages build up.

To see the number of messages waiting to be processed by the Directory Replication Agent, view the "Total no. Items" column for the Directory Service on the mailbox resources page of the Private Information Store object in the Exchange Server Administrator program.

If you believe messages are building up in this mailbox, you might want to observe what the Directory Service is doing. To do this with Performance Monitor, add the following counter:
Object: Thread
Counter: % Processor Time
Instance: All the instances for DSAMAIN (use the SHIFT key to select them)

If the Directory Service is failing to keep up with the demand, you will
notice a single thread consuming the majority of processor time (between 50
percent and 90 percent), while the remaining threads use less than 5
percent. The busy thread is responsible for passing the messages from the
Directory Service inbox to the Directory Replication Agent while keeping
the inbox sorted on client submit time.

WORKAROUND
To work around this problem, perform one of the following:
MORE INFORMATION
This problem is likely to be triggered initially by unusual activity:
either an increase in the number of Directory Replication messages
generated over a period of time (such as when sites are added to the
Exchange Server organization), or a failure in the messaging infrastructure
supporting intersite replication (such as Message Transfer Agent downtime).
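Why a one-off surge of this kind can leave a backlog that never drains can be illustrated with a toy model. The sketch below is hypothetical and is not how DSAMAIN is implemented. It assumes a fixed processing budget per schedule cycle, a steady arrival rate of replies, a single surge (for example, an MTA queue flushing after downtime), and a per-message hand-off cost that grows linearly with the backlog as a stand-in for the cost of keeping the inbox sorted on client submit time. The function name and every parameter value are invented for illustration; they are not measured figures, and the model does not attempt to reproduce any real threshold.

```python
def simulate_backlog(cycles, arrivals_per_cycle, burst_cycle, burst_size,
                     budget=1000.0, base_cost=1.0, per_item_cost=0.05):
    """Toy model: inbox backlog left at the end of each schedule cycle.

    Hypothetical assumptions (not DSAMAIN internals): a fixed processing
    budget per cycle, and a per-message hand-off cost that grows linearly
    with the backlog, standing in for keeping the inbox sorted.
    """
    backlog = 0
    history = []
    for cycle in range(cycles):
        backlog += arrivals_per_cycle  # replies keep arriving every cycle
        if cycle == burst_cycle:
            backlog += burst_size      # one-off surge, e.g. an MTA queue flushing
        remaining = budget
        # Hand messages to the replication agent until this cycle's budget runs out.
        while backlog > 0:
            cost = base_cost + per_item_cost * backlog
            if cost > remaining:
                break
            remaining -= cost
            backlog -= 1
        history.append(backlog)
    return history


if __name__ == "__main__":
    small = simulate_backlog(10, 100, burst_cycle=3, burst_size=100)
    large = simulate_backlog(10, 100, burst_cycle=3, burst_size=1000)
    print("small surge:", small)
    print("large surge:", large)
```

With these invented numbers, a surge of 100 extra messages is cleared by the following cycle, while a surge of 1,000 leaves a backlog that grows every cycle: at that depth each hand-off is so expensive that fewer messages are processed per cycle than arrive.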
After a Directory Service mailbox has a backlog of messages (which it has to keep sorted on client submit time), it becomes increasingly difficult for it to pass messages to the Directory Replication Agent. If none of the above resolutions are adopted quickly, the remote bridgehead server connectors will continue to send their requests into the Directory Service inbox, compounding the problem. Real-world experience has shown that after a Directory Service mailbox grows beyond 1,500 replication messages, it will never be able to recover until at least one of the above resolutions is applied.

For more detailed information about designing intersite replication topologies, see the white paper "Advanced Backbone Design and Optimization", first published on the August 1998 TechNet CD.