When you configure a Host Integration Server (HIS) to connect to an IBM mainframe or iSeries (AS/400) system, you install and configure one or more link services. The link services provide the communication link between the SNA Server service and device drivers, to support SNA communication over the various link types.
Once the link services are installed and configured, you need to define the actual host connections over which the SNA Server service will communicate with the IBM host systems to access the business applications. Connections as defined in the HIS configuration provide the SNA link level connection used to communicate with the IBM host system.
If the underlying link service detects a problem with the physical link to the IBM host system, an outage code is generated. This outage code is returned to the SNA Server service. When notified that an outage on the connection has occurred, the SNA Server service takes the following actions:
- Logs an Event 23 warning in the application event log
- Cleans up the active SNA sessions on the affected connection.
- Issues a Close(LINK) Request message to the link service to close the connection
When a connection outage occurs, all of the active SNA sessions (LUs) defined on this connection are closed which means that the end users lose their connectivity to the backend IBM applications to which they were interacting with. Event 23 Details
The Event 23 indicates that a host connection failed. The details included in the event message provide more specifics on the reason for the failure. The format of the event message is shown here:Event ID: 23
Source: SNA Server
Connection = connection name
Link Service = link service name
Qualifier = outage code Explanation
A link service reported a connection failure to the node. See below for more information on the outage code that caused the connection to fail.TOKEN RING, ETHERNET OR FDDI0029
The remote system is not responding to the Host Integration Server computer's attempt to activate an SNA connection.00AE
The connection has failed due to either (1) a connection timeout, due to slow response from the remote system, or (2) the remote system has deactivated the connection by sending Host Integration Server SNA Service a DISC(connect) or DM frame.
If slow response from the remote system is suspected, then the Host Integration Server SNA connection t1 and ti timers should be increased.
If you are connecting to an AS/400 and there are no active user sessions, the AS/400 will drop the connection if the AS/400 APPC controller "Switched Disconnect" setting is set to YES.
To isolate the problem, a Microsoft Network Monitor trace (or similar utility) must be used to capture an occurrence of the failure.00AF
The connection to the remote system has been lost. This could occur due to a media error on your LAN, a failure of an intermediate bridge or router, or if the remote system has stopped responding.00AC
The Windows NT DLC driver has detected a DLC protocol error in a frame received by the remote system and has sent a Frame Reject (FRMR) causing the connection to end. Event 228 should be logged when this error occurs. See the FRMR code in Event 228 for more information describing why the FRMR occurred.00AD
The remote system has detected a DLC protocol error in a frame sent by Windows NT, and has sent a Frame Reject (FRMR) causing the connection to end. Event 227 should be logged when this error occurs. Note related to IP-DLC link services
: Connections using an IP-DLC link service also log Event warnings when the connection fails. In these instances, the two outage codes that can be logged are 0029 and 00AE which have basically the same meaning as they do for connections using an 802.2 DLC link service. In the case of a 00AE outage for IP-DLC, the information regarding t1 and ti DLC timers don't apply as these are DLC specific settings.
The Event 23 message when view in the Application Event log also includes outage codes for SDLC link services. SDLC outages are not discussed here because the use of SDLC link services is rare.Troubleshooting Event 23 Connection Failures
In general, Host Integration Server connection failures that result in the logging of Event 23 warnings occur because of conditions external to the HIS Server system. Therefore, troubleshooting steps usually require actions to be taken external to the HIS Server.
Event 23 connection outages are most common in Host Integration Server environments that use the 802.2 DLC link service to connection to IBM host systems via the DLC protocol. There used to be a fair number of Event 23 connection outages with SNA Server and Host Integration Server environments using SDLC or X.25 link services, but the use of SDLC and X.25 protocols is rare these days. The use of IP-DLC link services is growing rapidly, however we don't see Event 23 warnings logged all that often in these environments. The IP-DLC link service does log a number of its own specific event messages when it has problems with the underlying TCP/IP connections that it uses for communication. An Event 23 is likely to occur for an extended outage that prevents the IP-DLC link service from establishing its required TCP/IP sessions to the Network Node Server (NNS).
For Host Integration Server environments using 802.2 DLC and IP-DLC link services, the following traces are usually required to help determine the root cause of the connection failures that result in the logging of Event 23 warnings:
- Host Integration Server traces
o These traces are captured using the SNA Trace Utility (snatrace.exe)
- Network traces
o Network capture utilities such as Microsoft Network Monitor, Wireshark, Ethereal are common used.
The basic idea is to enable the HIS and network traces prior to a connection failure. Once a connection failure occurs, you will need to stop the traces as soon as possible to prevent the trace data that contains the failure from being overwritten.Host Integration Server Traces
The following steps outline the Host Integration Server traces to capture for a connection outage:
1. Run snatrace.exe.
2. Click the Tracing Global Properties
3. Change the Trace File Flip Length (bytes)
setting from 20000000 to 200000000
(add a 0 to the default length).
4. Enable the Allow HIS Administrators to perform tracing
5. Click Apply
6. Click the Trace Items
7. Highlight the link service name
(SNAIP1, SNADLC1, etc.) specified in the Event 23 warning and then click Properties
8. Click the Message Trace
9. Enable Level 2 Messages
and then click OK
10. Highlight SnaServer
in the List of Trace Items
and click Properties
11. Click the Message Trace
12. Enable Data Link Control
and then click OK
13. You can minimize the SNA Trace Utility at this point as traces are enabled. Note
: If you close the SNA Trace Utility, you will be prompted with a warning dialog indicating that leaving traces enabled may impact server performance. You can click OK and close the window if you choose.
14. Once the connection failure occurs, restore (or re-open) the SNA Trace Utility window and click the Clear All Traces
button to turn off tracing. You want to do this as soon as possible because the HIS traces are written to circular traces files. The data will be overwritten if the traces continue to run. On busy servers, the traces can wrap quickly.Network Traces
The steps to enable a network trace depend on the network capture utility that you choose to use. Therefore, specific network capture steps are not included here. Instead, an overview of what needs to be captures is provided:
1. The network capture needs to include the data that flows between the Host Integration Server system and the IBM host system (mainframe or iSeries). If you are using the 802.2 DLC link service, you want to capture the DLC network traffic. If there are multiple network adapters in the Host Integration Server system, you need to make sure that the network capture utility is tracing the correct network adapter if the network trace is being captured on the Host Integration Server system.
You can setup a filter to limit the amount of traffic being captured. If you use a filter, one option is to capture all the network traffic to/from the MAC address (if using DLC) or the TCP/IP address (if using IP-DLC) of the remote IBM host system.
The risk with using filters is that it is possible that the filter will not capture some traffic that is relevant to the connection problems. This is most likely to occur with TCP/IP connections. A filter may miss ICMP Redirect or Destination unreachable frames. These kinds of frames may occur when the network path between the HIS server and IBM Host system is having a problem or when a router between does not have the correct setup or has lost connectivity.
These kinds of frames are sent from a different TCP/IP address (e.g. the router's address) and may get lost when you use a filter setup with HIS TCP/IP Address and IBM Host system TCP/IP Address.
2. If the Host Integration Server system and the IBM host system are separated across a WAN (e.g. routers, switches, bridges between the systems), the recommendation is to perform concurrent network traces at least on the Host Integration Server network segment and the IBM host system's network segment. This gives visibility into the network traffic on each network segment. This allows you to see if network packets are making across the WAN. Having a network trace on only one network segment does not give a complete picture because you cannot determine with certainty if a missing packet reached the other network segment or if a response was lost on the way back.
3. If the connection failures are random, the traces may need to run for an extended period. In these cases, you may want to setup the network capture utility to save network traces at certain intervals in case the traces cannot be stopped immediately when an outage occurs. This helps to make sure that the relevant data is captured.
4. When a connection failure occurs, you want to stop and save the network trace as soon as possible.Trace Analysis
The Microsoft Host Integration Server support team uses the resulting trace data to isolate the root cause of the connection failures. In most cases, the problem is caused by one of the following:
- Configuration problem
o Incorrect network addresses
o Incorrect Service Access Points (SAPs)
o Incorrect firewall settings (IP-DLC)
- Network Problems
o Transient network issues
o Router/Switch/Packet Shaper issues
- IBM Host System outages
o Normal maintenance windows
o Unexpected outages on mainframe or iSeries systems
Generally, the network traces are the main troubleshooting tools for these types of problems because they are normally external to Host Integration Server. The HIS traces are useful to see the data flow leading up to the outage. The link service Level 2 message traces also include DLC return codes (for 802.2 DLC link services) that can also be helpful. The following Knowledge Base article describes how the DLC return codes are interpreted using these traces:INFO: Interpreting 802.2 DLC Outage Code from SNA Server Level 2 Trace
For 802.2 DLC connection failures, the outage code in the Event 23 tells you why the connection dropped. If you refer to the specific outage descriptions above, you will see get an idea as to what to look for in the traces.
The traces help verify or prove the underlying cause as noted by the outage code.
For example, an outage code of 0029 (Remote system not responding) usually indicates that the 802.2 DLC link service is not getting a response to the DLC TEST or XID commands that it is sending to the remote system to establish the DLC connection. In these cases, you should also see an Event 230 warning logged in the Application Event log that indicates if the problem occurs because of a TEST or XID command. The network traces in this case are used to verify that the TEST and/or XID command is sent on the wire by the 802.2 DLC link service. If concurrent network traces were captured, you can see if the TEST or XID commands reached the remote network segment. If so, then you should see a TEST or XID response being sent back. If you see the response in the remote network trace but not the other network trace, then the packets were dropped/lost while traversing the WAN. At this point, troubleshooting continues with your network support team.
For the outage codes that indicate a Frame Reject (FRMR) being sent or received, you would search through the network trace for the FRMR and then work backwards to see the sequence that led to the Frame Reject being sent. In these cases, typical causes are DLC frames being sent out of sequence, incorrect sequence numbers, incorrect DLC frame size. Additional details will be included in a corresponding Event 227 warning, which includes a frame reject code that specifies the reason the frame reject was sent.