Article ID: 188340 - Last Review: July 16, 1999 - Revision: 1.0

Search HTML Filter Ignores UTF-8 Character Encoding

This article was previously published under Q188340

SYMPTOMS

Search does not index the text of HTML pages that are encoded in UTF-8.

CAUSE

The HTML filter that ships with Site Server 3.0 cannot handle UTF-8 character encoding.

RESOLUTION

To resolve this problem, apply the latest Site Server 3.0 service pack.

STATUS

Microsoft has confirmed this to be a problem in Site Server version 3.0. This problem has been corrected in the latest U.S. service pack for Microsoft Site Server version 3.0. For information about obtaining the service pack, query on the following word in the Microsoft Knowledge Base (without the spaces):
S E R V P A C K

MORE INFORMATION

The service pack updates the HTML filter to support UTF-8 encoding. It also updates the language and code page tables.

UTF-8 is not automatically detected. Only documents explicitly tagged with:
<meta http-equiv=content-type content="text/html; charset=utf8">
are interpreted as UTF-8.

UTF-7 and UTF-16 are not supported.
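
Because the filter relies on the explicit tag rather than on detection, it can be useful to check which pages actually declare a character set before they are crawled. The following Python sketch is one way to do that. It is not part of Site Server and does not reproduce the filter's own logic; the names MetaCharsetFinder and declared_charset are hypothetical, and the parsing rules are an assumption based only on the tag format shown above.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset named in the first <meta http-equiv=content-type> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None  # stays None when no explicit tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # Attribute names arrive lowercased; values may be None when omitted.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # The content attribute looks like: text/html; charset=utf8
        for part in attr.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset":
                self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the lowercased charset from the meta tag, or None if absent."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


if __name__ == "__main__":
    page = '<meta http-equiv=content-type content="text/html; charset=utf8">'
    print(declared_charset(page))
```

A page for which declared_charset returns None carries no explicit tag and, per the behavior described above, would not be interpreted as UTF-8. The helper only reports the declared value; which spellings the filter accepts beyond the one shown above is not stated in this article.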


APPLIES TO
  • MSN Search, when used with:
    • Microsoft Site Server 3.0 Standard Edition
Keywords: 
kbbug kbfix kbsiteserv300sp1fix KB188340
Retired KB Content Disclaimer
This article was written about products for which Microsoft no longer offers support. Therefore, this article is offered "as is" and will no longer be updated.
 
