Description of SQL Virtual Server client connections
SQL Virtual Server client behaviorMicrosoft Cluster Server (MSCS) provides a reliable and robust platform on which you can build mission-critical SQL Server applications. You do not need to modify most server applications to use them with MSCS. However, transaction-based applications (for example, database servers, such as Microsoft SQL Server) usually require additional modification so that if the server fails, failover support properly prevents the loss of transactional integrity. Developing a client application to operate with MSCS is relatively straightforward. You must design applications with database recovery and error checking in mind.
Even without the use of clusters, a SQL Server server automatically recovers all databases when the server is restarted. To ensure that a database is recovered in a consistent application state, use database transactions so that failover occurs in the database correctly and in a consistent state. Any transactions that are incomplete when failover occurs should be rolled back, while the effects of all committed transactions should be preserved.
During failover, client applications lose their connection to the SQL Server server and must reconnect to continue processing. If the client connection to the server is stateless, (for example, applications that are developed by using Microsoft Internet Information Server [IIS] are stateless) the client reconnects to the server and continues processing. Unless the client and server have a common state (for example, open cursors, session variables, Transact-SQL global variables, or data in tempdb), failover is not transparent to the client. In these cases, you should design the client application to inform the user that the connection was either lost, or reset or have the application automatically reestablish its connection to the server. Any transaction that has not been committed when a failover occurs is rolled back.
The discussion of how clients deal with server failures is standard for any SQL Server client application, even without the use of clusters and virtual servers. The error checking process is quite similar for a client database application for a cluster. When the cluster begins failover, the client program receives an error message on the database connection. The error messages encountered depend upon what the client program is attempting to do at that time.
If a SQL Server server is failed over by the cluster admin, TCP reset packets are not sent. If the SQL Server process is terminated by the operating system (by Kill.exe), the reset packets are sent.
This may affect the client application if the application does not specify a query timeout parameter or a query timeout of zero (0).
If the application does not have a query timeout value then open connections will be left in the ESTABLISHED state after a failover occurs. The fact that the open connections are not closed and that no further TCP packets are sent from those connections indicates that those connections are completely idle. Because the failover did not send any TCP reset packets to the client application, those open connections wait for the query results indefinitely (assuming an infinite query timeout), and potentially cause the connection to stop responding (hang).
To address this issue from a client application perspective, change the query timeout to a finite number.
Virtual database failure behaviorWhen a virtual database server fails, a connection link failed error message is returned to the waiting client. The database on the failed node of the cluster is shut down and restarted on the same node per the parameters you set up in:
Start\Programs\Administrative Tools (Common)\Cluster Administrator\Group\Failover\PropertiesThe Group Failover default threshold is 10 restarts in a 6-hour period before a failover occurs to the remaining node. However, the SQL Server restart threshold, which can be verified through the SQL Server properties on the SQL Server cluster resource, has a default threshold of three restarts on that SQL Server in 900 seconds and by default does affect the group. If a client attempts to connect to the server while a database is being recovered, the client receives a waiting for database recovery error message and should retry after a short pause.
SQL Server 6.5 and SQL Server 7.0 considerationsSQL Server 6.5 and SQL Server 7.0 act exactly as described in the previous "Virtual Database Failure Behavior" section.
When SQL Server 7.0 runs as a virtual server SQL Server 7.0 supports only one IP address but might listen on additional ports as configured by the customer. This is described in the "Multiple Listen-On TCP/IP Ports" topic in the following Microsoft Knowledge Base article:
Microsoft SQL Server 2000 considerationsSQL Server 2000 has some differences in behavior from the SQL Server 6.5 and SQL Server 7.0 versions.
SQL Server 2000 port usage
By default, a named instance listens on a dynamic port. The first time the server starts with a port set to zero (0), the server requests a free port number from the operating system and then the server listens on that port. The server records this to the registry and then uses the same port every time.
If a server is configured to listen on a dynamic port and the server fails to listen on the dynamic port on startup, then the server chooses another port.
If you configured a static port either during setup or after setup by using the server network utility it fails to listen on TCP/IP if this port is in use.
Clients detect the port number to connect to in the case of a named instance or one with a non-default port number.
The connection information is written to the "LastConnect" cache in this registry key:
The client attempts to re-use the connection information on each connection unless it fails and then re-negotiates the new information. This might happen if the port number has changed because someone changed it or if it was a dynamic port that was re-assigned due to a port being in use.
Broken connectionsThere are three ways a connection can be broken:
- The server fails; the process terminates by being killed (system server process ID [spid] kill) or an access violation (AV) or something else causes the operating system or required service to fail.
- Machine hardware failure or loss of power.
- Server shutdown.
- In the case where a server fails, the client receives a connection broken error message immediately. You can simulate this behavior by connecting with OSQL, running a long query and then use KILL to terminate your SQL Server process. The client exits with an ODBC error message.
- A machine failure is more complicated. The behavior can change slightly based on how the connection loss is detected.
If the client is in the middle of reading information, the connection loss can be detected immediately because the data stops.
If the client is just waiting for results, the behavior is slightly different. The behavior depends upon the Keep Alive configuration of the client computer.
On Microsoft Windows 2000 Keep Alive is set by the client code on a per connection basis. By default, Keep Alive is set to 30 seconds. This means that if the socket dies it is detected within 30 seconds and the client receives an error message. On Microsoft Windows NT 4.0, Keep Alive cannot be set on a per connection basis. Keep Alive must be set for the whole computer, thus affecting all applications on the server.
The registry keys that are being referred to are:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters KeepAliveTime\REG_DWORD 30000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters KeepAliveInterval\REG_DWORD 1000
- When you initiate a server shutdown, the server waits a while for the clients to finish. However, if the client is still running the server kills the threads inside the server. Killing the threads may also cause different error messages on the client. Error messages can include a connection broken error; however, most of the time you see this error message: "An unknown error occurred, connection may have been terminated by the server".The ODBC native error code is set to zero (0) in this case but it is returned as an error message to the client.
Article ID: 273673 - Last Review: 12/04/2007 03:44:52 - Revision: 7.3
- kbhowto kbsql2005cluster kbclientserver kbinfo KB273673