This article discusses common link state issues and common routing issues that you may experience in Microsoft Exchange 2000 Server and in Microsoft Exchange Server 2003.
The purpose of a routing group
The routing group is the smallest unit of servers that are likely to always be connected to one another. The routing group can be assumed to be one node on the graph of connector paths, with multiple possible connectors between routing groups.
To configure the way that messages are routed between servers so that point-to-point connections between servers are always made, the servers must be grouped in routing groups, and the Routing Group connectors must be defined.
In a routing group, link state information updates and routing information updates are pushed between master nodes and member nodes through a persistent Transmission Control Protocol (TCP) connection on port 691. Between two routing groups, servers advertise the X-LINK2STATE verb to exchange link state information by comparing the MD5 digest in the Exchange organization information packet of the two routing group bridgeheads. A mismatch triggers an exchange of link state information between the two servers through SMTP port 25.
The role of a routing group master
The routing group master coordinates changes to link states that are learned by servers in its routing group and retrieves updates from the directory service. By having a single server coordinate the changes, you can treat a routing group as a single entity for the purposes of computing a least-cost path between routing groups in an organization.
What occurs when the routing group master stops responding
All servers in the routing group continue to operate on the same information that they had at the time that they lost contact with the master.
When the routing group master comes back up, it examines the status of all other servers, reconstructs the link state information, processes the State Change Queue (SCQ), and then updates members in the routing group.
Common issues
The following sections present several routing issues that you may experience. Additionally, the following sections suggest methods that you can use to troubleshoot the issues.
Routing member node is not connected to master
When you use the WinRoute tool (Winroute.exe) to view Exchange organization routing, you may see the words "connected to master - NO" and a red X next to the organization's name. These words and the red X indicate that the routing member node is not connected to the master.
In a routing group, all nodes, including the master itself, must be connected to the master node on Transmission Control Protocol (TCP) port 691 to propagate routing information and link state information to and from the master node.
Note The Microsoft Exchange Server 2003 WinRoute tool (Winroute.exe), which you can use to troubleshoot routing in an Exchange 2000 and Exchange 2003 mail-handling environment, is available for download from the Microsoft Download Center.
For more information about how to download Microsoft Support files, click the following article number to view the article in the Microsoft Knowledge Base:
How to obtain Microsoft support files from online services
Microsoft scanned this file for viruses. Microsoft used the most current virus-detection software that was available on the date that the file was posted. The file is stored on security-enhanced servers that help prevent any unauthorized changes to the file.
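The requirement that every node can reach the master on TCP port 691 can be checked from a script as well as from an interactive session. The following Python sketch is only an illustration: the function name probe_tcp_port is hypothetical, and the sketch makes no assumption about the text of the banner that the routing engine returns. It reports whether a TCP connection succeeds and returns whatever greeting bytes arrive first.

```python
import socket


def probe_tcp_port(host, port=691, timeout=5.0):
    """Try a TCP connection and return (connected, banner_text).

    banner_text is whatever the remote service sends first, or an empty
    string if nothing arrives before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            try:
                banner = conn.recv(256)
            except socket.timeout:
                banner = b""  # connected, but no greeting arrived in time
            return True, banner.decode("ascii", errors="replace").strip()
    except OSError:
        # Connection refused, filtered by a firewall, unreachable host,
        # or a name lookup failure.
        return False, ""
```

For example, calling probe_tcp_port with the FQDN of the master node from each member node shows which members cannot reach the master. A result of (False, "") from only some servers suggests a firewall or network problem between those servers and the master.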
To resolve this issue, follow these steps:
- Make sure that the Exchange Routing Engine service (RESvc) is started on all affected servers in the routing group and that it remains in a stable state. If the service is in an unstable state, the server may not connect to master nodes. Investigate the root cause of any unstable services before you go to the next step.
- Verify that a firewall does not restrict TCP port 691. To do this, initiate a Telnet session to port 691 on the affected servers and on the master node. A Microsoft Routing Engine banner indicates an active state.
- At the command prompt, run the netstat -a -n command. The output of this command shows all member nodes, and the master itself, connecting to TCP port 691 on the master node.
- In Event Viewer, check the application logs for any events that indicate a failure to authenticate by using the computer account, such as Domain\serverName$. Events such as Transport events 961 and 962 indicate a failure of the RESvc service to connect.
- Verify that the affected servers or the Exchange Domain Servers group that they belong to do not have the SendAs right missing, denied, or denied from a nested membership of another group. To do this, run the Exchange Trace Utility (Regtrace.exe), and then restart the RESvc service. For more information about RegTrace setup on Exchange 2000, click the following article number to view the article in the Microsoft Knowledge Base:
238614 How to set up Regtrace for Exchange 2000
Note For additional information about tools and processes that you can use to troubleshoot and to diagnose transport issues and routing issues in Exchange 2003, download the Exchange Server 2003 Transport and Routing Guide online book.
- Verify that the affected servers can generate a ServicePrincipalName (SPN) for authentication. To verify this, check the network address attribute of the affected servers by using the ADSI Edit tool (ADSIEdit.exe) or by using the Lightweight Directory Access Protocol (LDAP) tool (Ldp.exe).
Nodes in a routing group have to mutually authenticate with the routing group master to be connected. To do this, they use the ncacn_ip_tcp value in the Network address attribute of the Exchange Server computer to generate the SPN for the master node by using Kerberos authentication. Make sure that this value is a Fully Qualified Domain Name (FQDN) instead of a NetBIOS name or an IP address. Then, restart the RESvc service.
- Check the application log and the system log on all the affected servers for any Kerberos authentication errors. Kerberos authentication errors may be caused by an expired domain computer account password. To gain additional information about this issue, run the NLTEST utility with debug flags. For more information about how to run the NLTEST utility with debug flags, click the following article number to view the article in the Microsoft Knowledge Base:
109626 Enabling debug logging for the Net Logon service
Important If the domain computer account password has apparently expired, you must contact Microsoft Product Support Services (PSS) to confirm and to correct the issue.
- Verify that the FQDN of the virtual server matches the FQDN in Domain Name System (DNS).
- If the membership of the routing group spans multiple domains, make sure that DNS is correctly designed and implemented between the domains.
- Look for any third-party applications that use Group Policy objects to restrict permissions or to restrict security settings.
Routing group master wars
In a routing group, the first server installed in the routing group is automatically elected as the master node. As other servers are installed, the administrator has the option to appoint another server as master.
When the new routing group master is elected, only one server should be assigned the master role at a time. This rule is enforced by an algorithm that is based on the formula "(N/2) + 1" (where N denotes the number of servers in the routing group). The algorithm calculates the number of servers in the routing group that must agree and that must acknowledge the master. Therefore, the member nodes send link state ATTACH data (information about the routing group) to the master.
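As a worked illustration of the "(N/2) + 1" formula, the following Python sketch computes how many servers must acknowledge a master. It assumes that the division is integer division, which the formula as quoted does not state, and the function names are hypothetical.

```python
def required_acknowledgements(server_count):
    """Return (N / 2) + 1, using integer division (an assumption)."""
    if server_count < 1:
        raise ValueError("a routing group needs at least one server")
    return server_count // 2 + 1


def has_master_quorum(server_count, acknowledging_servers):
    """True if enough servers acknowledge the same master."""
    return acknowledging_servers >= required_acknowledgements(server_count)
```

Under this assumption, a routing group of five servers needs three acknowledgements. Because two disjoint sets of three servers cannot exist in a group of five, two servers cannot both reach the threshold at the same time, provided that each server acknowledges only one master.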
It is not uncommon for two or more servers to have erroneous information about which server is the current routing group master. For example, if a routing group master was moved or was deleted, and another master node was not chosen, the MsExchRoutingMasterDN attribute may point to a non-existent server.
This issue may also occur when an old master does not detach as master, or when a problematic node keeps sending incorrect link state ATTACH information.
Note In Microsoft Exchange Server 2003, if a routing group points to a deleted object, the master node gives up its role as master and initiates a shutdown.
To resolve this issue, use one of the following methods:
- Check that link state data propagates through TCP port 691. Look for firewall hindrances, such as a firewall that blocks TCP port 691, and for SMTP filters.
- Look for Active Directory replication latencies.
- Look for network problems and latencies.
- Look for deleted routing group masters or servers that no longer exist. If this is the case, a Transport event 958 that references a routing group master distinguished name that no longer exists is logged in the application log. Use the LDAP tool (Ldp.exe) or the ADSI Edit tool (Adsiedit.exe) to verify that this is the case.
Deleted routing groups are followed by [object_not_found_in_DS]
When servers are moved between routing groups, and when the routing groups are subsequently deleted, if you use Winroute.exe you may see the text [object_not_found_in_DS] next to the object name.
This issue may occur if the routing engine service tries to match an object that still exists in the dynamic routing library that the server maintains against objects in Active Directory, where the object no longer exists.
Use the following tips to resolve this issue:
- Restart all servers in the organization at the same time. This action updates routing information. Additionally, this action removes deleted routing groups and deleted connectors.
- Use the Remonitor.exe tool in injection mode.
Note Contact Microsoft Product Support Services for information about the Remonitor.exe tool in injection mode.
- Make sure that the servers are on a recent build of Exchange Server and that they have the Exchange Server service pack rollups installed.
Note If your servers are on a recent build of Exchange Server and have the current Exchange Server service pack rollups installed, you do not have to apply the hotfix that is described in the following Knowledge Base article. If you cannot install the most recent Exchange Server service pack rollups, apply the hotfix that is described in the following Knowledge Base article:
Deleted routing groups are listed in the WinRoute tool; fix requires Exchange 2000 SP3
- Restart all Exchange Server services and Windows Management Instrumentation (WMI) services on all Exchange Server computers in the organization. This resolution is effective only if all servers are restarted at the same time.
Note Contact Microsoft Product Support Services for information about restarting all servers at the same time.
- Make sure that the account that is logged on to the server has sufficient permissions. To do this, run Winroute.exe under the System account.
Note The lack of sufficient read permissions may cause Winroute.exe to incorrectly report [object_not_found_in_DS].
Connectors are not reported to be marked as "DOWN"
When you use the Winroute.exe tool to view the Exchange routing topology, you may see that connectors that are unavailable are reported as being available (they are marked as "UP"). This behavior may occur for the following connectors:
- Connectors that use DNS to route. For example, this behavior may occur for SMTP connectors that use DNS instead of a smart host.
- Microsoft Exchange 5.5 Server connectors or Exchange Development Kit (EDK) connectors. These connectors do not use link state routing.
- Routing group connectors with source bridgeheads of the "any" type.
- Any connectors where one bridgehead is an Exchange 5.5 Server computer.
- Connectors that use smart host settings and recently changed smart hosts.
Link state oscillations: connectors are repeatedly marked as "UP" and then as "DOWN"
This common scenario involves connectors being marked as "UP" and then as "DOWN" repeatedly. It causes excessive link state updates between servers. These excessive link state updates cause a very expensive and frequent recalculation of routes within the server. This is also indicated by Event 4005 Reset Routes. This issue may occur in the following scenarios:
- Network problems. Use a network trace to diagnose this scenario.
- A reaction to link status notification calls from underlying protocol services, such as SMTP/AQ and message transfer agent (MTA). This behavior is caused by interference on the X.400 protocol levels or on the SMTP protocol levels by third-party applications.
In this scenario, only a network monitor capture can reveal the issues that are involved. Additionally, if you notice very frequent changes of the major versions, of the minor versions, and of the user versions in the WinRoute tool, this may also indicate a link state problem (see the WinRoute routing version changes section).
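If you record connector states over time, counting the state changes in a recent time window is one simple way to tell an oscillating connector from a connector that changed state once. The following Python sketch is only an illustration. It assumes that you already have a list of (timestamp in seconds, state) samples from your own monitoring. The function names, the one-hour window, and the threshold of four transitions are hypothetical and are not taken from Exchange Server.

```python
def count_transitions(samples, window_seconds, now):
    """Count UP/DOWN changes among samples inside the trailing window.

    samples: list of (timestamp_seconds, state) tuples sorted by timestamp,
             where state is "UP" or "DOWN".
    """
    recent = [state for ts, state in samples
              if now - window_seconds <= ts <= now]
    return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)


def is_oscillating(samples, window_seconds=3600, now=None, max_transitions=4):
    """True if the connector changed state more often than the threshold."""
    if not samples:
        return False
    if now is None:
        now = samples[-1][0]  # evaluate the window ending at the last sample
    return count_transitions(samples, window_seconds, now) > max_transitions
```

A connector that is flagged in this way is a candidate for the network trace that is described earlier in this section.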
To reduce link state oscillations, apply the hotfix that is described in the following article in the Microsoft Knowledge Base:
Link state traffic saturates slow links between servers
After the hotfix has been applied, you must set the AttachedTimeout registry value to make sure that the hotfix works as expected.
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
How to back up and restore the registry in Windows
To enable the AttachedTimeout registry value, follow these steps:
- Click Start, click Run, type regedit, and then click OK.
- Locate the
- Right-click the Parameters subkey, point to New, and then click DWORD value.
- Name the new value AttachedTimeout.
- Double-click AttachedTimeout, click Decimal under Base, and then type a value from 1 to 604800.
Note The AttachedTimeout value represents time in seconds. The valid range for this value is 1 second to 604,800 seconds (7 days).
- Click OK, and then quit Registry Editor.
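Because the AttachedTimeout value is expressed in seconds, it can help to compute the value and to check the range before you type it in Registry Editor. The following Python sketch only does the arithmetic and the range check that are described in the steps above. It does not read or write the registry, and the function and constant names are hypothetical.

```python
ATTACHED_TIMEOUT_MIN_SECONDS = 1
ATTACHED_TIMEOUT_MAX_SECONDS = 604800  # 7 days * 24 hours * 3600 seconds


def attached_timeout_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds and check it against the valid range."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    if not ATTACHED_TIMEOUT_MIN_SECONDS <= total <= ATTACHED_TIMEOUT_MAX_SECONDS:
        raise ValueError(
            f"{total} seconds is outside the valid range "
            f"{ATTACHED_TIMEOUT_MIN_SECONDS} to {ATTACHED_TIMEOUT_MAX_SECONDS}"
        )
    return total
```

For example, attached_timeout_seconds(minutes=10) returns 600, and attached_timeout_seconds(days=7) returns the maximum value of 604800.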
Contact Microsoft Product Support Services for more information about the AttachedTimeout registry value.
How connector states affect link states
A connector can be located anywhere in any routing group in the Exchange organization. A specific connector that is frequently marked as "UP" and as "DOWN" may seriously affect the possible routes that a message can take through the organization. Such a connector may even lead to mail loops.
Exchange routing chooses the optimal path, based on variables such as cost, message type, and restrictions. Exchange routing locates the next server for a message to make the next hop to, and then Exchange routing gives the name of the next server to Message Queuing. Because the oscillating state of a connector causes link state changes, Exchange has to repeatedly recalculate the optimal path. This recalculation process involves queries to the directory service.
How link states affect connector states
When Message Queuing detects that a link to the bridgehead server on a connector failed, it calls into routing by using a method that is named LinkStateNotify(). Routing then suppresses this information for up to 10 minutes to prevent connector state fluctuation, and then routing relays this information to the routing group master. If routing decides to mark the connector as "DOWN," this change is propagated to all computers in the organization, including the computer where the original failure occurred. This behavior leads to a very expensive process that is named "reset routes." Thereafter, the routing engine no longer recommends that the Advanced Queuing engine (AQ) connect to the "failed" next-hop computer. The reverse is true for a connector that is marked as "UP."
WinRoute routing version changes
The WinRoute tool reports routing versions in the following format: "RoutingGroup (d5.2.3)." The three numbers that are separated by periods that follow the routing group name are the major version, the minor version, and the user version.
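If you copy version labels from the WinRoute tool over time, splitting each label into its three numbers makes it easier to see which number changes. The following Python sketch is only an illustration based on the single example format shown above. It assumes that one optional letter (the "d" in the example) precedes the numbers, and the names RoutingVersion, parse_routing_version, and changed_components are hypothetical.

```python
import re
from typing import NamedTuple


class RoutingVersion(NamedTuple):
    routing_group: str
    major: int
    minor: int
    user: int


# Matches labels such as "RoutingGroup (d5.2.3)"; the optional leading
# letter inside the parentheses is an assumption based on that one example.
_VERSION_PATTERN = re.compile(
    r"^\s*(?P<name>.+?)\s*"
    r"\([A-Za-z]?(?P<major>\d+)\.(?P<minor>\d+)\.(?P<user>\d+)\)\s*$"
)


def parse_routing_version(label):
    """Split a version label into the routing group name and three numbers."""
    match = _VERSION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized version label: {label!r}")
    return RoutingVersion(
        match["name"],
        int(match["major"]),
        int(match["minor"]),
        int(match["user"]),
    )


def changed_components(old, new):
    """Return the names of the version numbers that differ."""
    return [field for field in ("major", "minor", "user")
            if getattr(old, field) != getattr(new, field)]
```

Comparing two labels that were captured a few minutes apart with changed_components shows whether the major version, the minor version, or the user version is the one that changes.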
Major version changes are typically changes in the directory service that involve routing and connectors. If the major version changes frequently, monitor it by using the Remonitor.exe tool, and then investigate the probable root cause. For example, an administrator may make significant changes in the directory service. A major version of zero is shown for isolated routing groups with no routing and no link state exchange with other nodes. Additionally, a major version of zero is shown for Microsoft Exchange 5.5 Server-based sites because they do not use link state information.
A minor version change may indicate changes to the state of a connector. Frequent changes may be caused by faulty links or by links that fluctuate between the "UP" state and the "DOWN" state. AQ tries to send a message over a connector. If AQ fails, it sends a notification to routing to mark the connector as "DOWN." Then, AQ initiates retry pings to the connector. After AQ detects that the connector is up, AQ notifies routing by calling the LinkStateNotify() method.
User version changes may occur in the following situations:
- Servers attach to or detach from master nodes.
- WMI services send data to the routing group master.
- There is callback registration by routing clients such as MTA or SMTP.
- There are routing group membership changes.
- You rename the routing group.
- A new master node is elected.
Base-level callbacks
Routing base-level callbacks are updates that occur after a routing group object is modified. These updates are then propagated throughout the organization. The Winroute.exe major version changes may be triggered by the following events:
- Renaming a routing group
- Electing a new routing group master
- Removing a routing group member
- Adding a routing group member
One-level callbacks
One-level callbacks are typically updates to routing when changes that are one level below the routing group object are detected. Some examples of this are deleting a connector in the routing group and adding a connector to the routing group.
DNS
Incorrect configuration of Domain Name System (DNS) may cause several routing issues. These issues are addressed in the following sections.
The DNS Resolver sink event on the SMTP virtual server
The DNS Resolver sink event is primarily for resolving external SMTP domains. Your internal Active Directory servers and DNS servers still have to be able to resolve all Exchange Server computers internally.
The SMTP virtual server DNS Resolver sink event is synchronous and can affect performance on a heavily used server. To slightly improve the situation, increase the number of threads that are used for DNS lookups.
The DNS Resolver sink event is used only when a server is not in the Exchange organization. Exchange Server determines this by querying the Active Directory directory service.
Windows 2000 DNS API
If you use the DNS Resolver tool for name resolution, the lookups that are created by this tool are asynchronous and are much faster than using the default settings of the external DNS Resolver sink event.
Exchange DNS that uses the Windows DNS API or the Exchange DNS Resolver sink event has to be able to resolve an Internet Protocol address (IP address) in the following ways:
- mail exchanger resource record (MX record)-to-IP address
- MX record-to-A record-to-IP address
- MX record-to-CNAME record-to-A record-to-IP address
- CNAME record-to-A record-to-IP address
- A record-to-IP address
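The resolution paths in the list above can be pictured as a short chain of lookups. The following Python sketch walks those chains over a small in-memory table instead of querying a real DNS server. The table layout, the host names, and the function names are all hypothetical, and the addresses come from the documentation range 192.0.2.0/24. The sketch follows at most one CNAME record by default, which matches the paths in the list and does not model chained CNAME records.

```python
import ipaddress

# Hypothetical records, keyed by (name, record type).
EXAMPLE_RECORDS = {
    ("example.com", "MX"): ["mail.example.com"],
    ("mail.example.com", "CNAME"): ["mx1.example.com"],
    ("mx1.example.com", "A"): ["192.0.2.10"],
}


def _is_ip_address(text):
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def resolve_host(name, records, max_cname_hops=1):
    """Follow name -> (CNAME ->) A record and return the IP addresses."""
    hops = 0
    while True:
        addresses = records.get((name, "A"))
        if addresses:
            return list(addresses)
        aliases = records.get((name, "CNAME"))
        if not aliases or hops >= max_cname_hops:
            return []  # no usable record, or the CNAME chain is too long
        name = aliases[0]
        hops += 1


def resolve_mail_domain(domain, records):
    """Return IP addresses through MX records, or treat the name as a host."""
    exchangers = records.get((domain, "MX"))
    if not exchangers:
        return resolve_host(domain, records)
    addresses = []
    for target in exchangers:
        if _is_ip_address(target):
            addresses.append(target)  # MX record-to-IP address
        else:
            addresses.extend(resolve_host(target, records))
    return addresses
```

With the example table, resolve_mail_domain("example.com", EXAMPLE_RECORDS) follows the MX record, then one CNAME record, and then the A record. A table in which a CNAME record points to another CNAME record returns no addresses under the default limit of one hop.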
DNS records that are incorrectly configured, especially MX records and CNAME records, may seriously affect mail flow.
Note Although Microsoft Exchange Server 2003 does provide limited support for chained CNAME records, we do not recommend implementing this configuration.
In Microsoft Exchange Server 2003, the external DNS Resolver sink event has been improved. Additionally, you can use the DNS Diagnostic utility (DNSdiag.exe) from the Windows Server 2003 Resource Kit to troubleshoot DNS issues that involve the external SMTP resolver and the Windows TCP/IP DNS. DNSdiag.exe shows the asynchronous queries and the synchronous queries to Global DNS servers or to the DNS servers that are called by the DNS sink event. Additionally, DNSdiag.exe shows any corresponding failures or errors.
Note The DNS Diagnostic utility is also known as the DNS Resolver tool. They are the same file, DNSdiag.exe. The file is available for download from the Microsoft Download Center.