PRB: Parsing HTML on Server Using Internet Explorer Components

Article ID: 244085
This article was previously published under Q244085
This article has been archived. It is offered "as is" and will no longer be updated.

Symptoms

It may be desirable to parse HTML files inside a Web server process in response to a browser page request. However, the WebBrowser control, the DHTML Editing Control, MSHTML, and other Internet Explorer components may not function properly in an Active Server Pages (ASP) page or in any other application that runs inside a Web server process.

Cause

Internet Explorer and its associated components were not designed or tested for use within the constraints of a Web server process, which runs as a high-performance service in a secure user context.

Resolution

Microsoft does not support the use of the WebBrowser control, the DHTML Editing Control (DHTMLED), or MSHTML from inside a Web server (IIS) process. Applications that experience problems with these components should be redesigned to use alternative technologies.

Status

This behavior is by design.

More information

Most Web applications that attempt to programmatically parse HTML on the server must accomplish two steps: retrieve the HTML from a remote server, and then parse it.

Retrieving HTML from Server

Internet Explorer components should not be used to retrieve HTML from another server. All of the mechanisms that Internet Explorer uses, or that use Internet Explorer components (the WebBrowser control, the Internet Transfer Control, and so on), ultimately rely on a low-level client module called WININET to make requests to other Web servers. WININET is not supported in a server context and has a number of known performance problems in this environment. Thus, any server-side application that needs to parse HTML must either store all necessary HTML locally or use a lower-level networking component or technology, such as the WinSock API or the Visual Basic WinSock Control, to retrieve the HTML file before attempting to parse it. For additional information, click the article number below to view the article in the Microsoft Knowledge Base:
238425 INFO: WinInet Not Supported for Use in Services
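
As a rough illustration of retrieving HTML at the socket level instead of through WININET, the following is a minimal sketch in Python. Python is an assumption made for this example only; the article itself names the WinSock API and the Visual Basic WinSock Control. The function names (build_request, split_response, fetch_html) are hypothetical, and the sketch handles only plain HTTP/1.0 over port 80, with no redirects, TLS, or error recovery.

```python
import socket
from typing import Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> Tuple[int, bytes]:
    """Split a raw HTTP response into (status code, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0]
    status = int(status_line.split()[1])
    return status, body


def fetch_html(host: str, path: str = "/", port: int = 80,
               timeout: float = 5.0) -> Tuple[int, bytes]:
    """Fetch a page over a plain TCP socket and return (status, body)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response complete
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))
```

The timeout matters in a server context: without it, a slow remote host could hold a request-handling thread indefinitely.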

In general, downloading data from another Web server adds a delay to every request that depends on it, which is undesirable in a typical Web application. High-performance server applications should use an alternative design that avoids this delay.

Parsing HTML

Like retrieving HTML from other Web servers, though to a lesser extent, parsing HTML is a time-consuming operation. Web application developers should consider their design very carefully before creating an application that needs to parse HTML on a per-request basis.

Microsoft offers a number of components that developers can re-use to parse HTML in their own applications, either with a user interface (UI) for editing or with a simple parser without a user interface. The DHTML Editing Control is probably the best choice for this job.

However, none of the HTML parsing technologies that Microsoft offers today has been designed or tested to work in a high-performance server context. There may be a number of performance concerns, especially for high-use Web servers. Developers who experience problems with these technologies in these environments should consider writing custom HTML parsing code that is optimized for just the information that the application needs to retrieve from the HTML. Such a narrowly targeted parser generally yields the best performance.
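
To make the idea of a narrowly targeted parser concrete, here is a minimal sketch that extracts only the page title and stops as soon as it has been read. It is written in Python with the standard html.parser module, which is an assumption made for illustration and not a technology named in this article. The names TitleFinder, _Done, and extract_title are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _Done(Exception):
    """Raised internally to stop parsing once the title is complete."""


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element and nothing else."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            raise _Done  # skip the rest of the document


def extract_title(html: str) -> Optional[str]:
    """Return the first <title> text, or None if the page has none."""
    finder = TitleFinder()
    try:
        finder.feed(html)
        finder.close()
    except _Done:
        pass
    title = "".join(finder.parts).strip()
    return title or None
```

Stopping at the closing title tag means the body of a large page is never scanned, which is the kind of saving that a general-purpose component cannot offer.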

Properties

Article ID: 244085 - Last Review: October 26, 2013 - Revision: 2.0
Applies to
  • Microsoft Internet Explorer 4.01 Service Pack 1
Keywords: kbnosurvey kbarchive kbDSupport kbfaq kbprb KB244085
