You obtain incorrect results when you run a full-text search query that uses a thesaurus file in SQL Server 2005

Extended support for SQL Server 2005 ended on April 12, 2016

If you are still running SQL Server 2005, you will no longer receive security updates and technical support. We recommend upgrading to SQL Server 2014 and Azure SQL Database to achieve breakthrough performance, maintain security and compliance, and optimize your data platform infrastructure. Learn more about the options for upgrading from SQL Server 2005 to a supported version here.

This article has been archived. It is offered "as is" and will no longer be updated.
SYMPTOMS
Consider the following scenario:
  • You define a term in a thesaurus file in Microsoft SQL Server 2005.
  • The term contains a special character. For example, the term contains one of the following characters:
    • $ (dollar sign)
    • & (ampersand)
    • # (number sign)
    • ' (single quotation mark)
    • - (hyphen)
  • You run a full-text search query that uses the thesaurus file.
In this scenario, you obtain incorrect results.

Additionally, if you use a hyphen in the thesaurus file, you will find that an error message is logged in the event log. For example, the thesaurus file contains the follow tag:
        <expansion>            <sub>pa</sub>            <sub>posterior anterior</sub>            <sub>posterior-anterior</sub>        </expansion>
You receive the following event log when you run a full-text search query that uses this thesaurus file:


Event Type: Error
Event Source: MSFTESQL
Event Category: MSFTESQL Service
Event ID: 4154
Date: Date
Time: Time
User: N/A
Computer: ComputerName
Description:
The thesaurus file C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml for lcid 1033 has invalid format. Please edit the file and correct it. Use retail tracing for detailed error description. See user documentation on fulltext retail tracing.Component: MicrosoftIndexer


Note The special characters that are listed here are only examples. Other special characters may also cause the problem.
WORKAROUND
To work around this problem, avoid using special characters in thesaurus file terms.
STATUS
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
MORE INFORMATION
When the word breaker encounters a word-breaking character in a term, the word breaker parses the character as a white space character. For example, if the word breaker encounters the term "look&out," the word breaker parses the term as the two words "look" and "out."

Note Word-breaking characters include the following:
  • $ (dollar sign)
  • , (comma)
  • & (ampersand)
  • # (number sign)
When the word breaker encounters a single quotation mark (') in a term, the word breaker correctly parses the term. However, the full-text thesaurus component keeps only the characters that follow the single quotation mark in the term. For example, if the original term is "L'Mary," the term appears as "Mary" in the thesaurus file.

When the word breaker encounters a hyphen (-) in a term, the word breaker correctly parses the term. However, the full-text thesaurus component treats the characters that are connected by the hyphen together with the hyphen itself as empty characters. For example, if the original term is "well-known celebrity," the term appears as "celebrity" in the thesaurus file.

The problem that is described in the "Symptoms" section may also cause a duplicate entry or an empty entry for a term in the thesaurus file. The full-text thesaurus component cannot load a thesaurus file if the file contains a duplicate entry or an empty entry.

Note In SQL Server 2005, if the full-text thesaurus component finds an error in the thesaurus file, the full-text thesaurus component does not load the thesaurus file.

For example, you define the following terms in the thesaurus file:
<expansion>	<sub>Windows</sub>	<sub>Windows&you</sub><expansion>
In this scenario, the ampersand will be parsed as a white space character. By default, the word "you" is listed in the noise file and is ignored. Therefore, the word breaker parses the term "Windows&you" as "Windows." Because the thesaurus file already contains the term "Windows," the thesaurus file now contains duplicate terms. Therefore, the full-text thesaurus component does not load the thesaurus file.
Properties

Article ID: 923317 - Last Review: 12/09/2015 06:37:30 - Revision: 2.1

Microsoft SQL Server 2005 Standard Edition, Microsoft SQL Server 2005 Workgroup Edition, Microsoft SQL Server 2005 Developer Edition, Microsoft SQL Server 2005 Enterprise Edition

  • kbnosurvey kbarchive kbsql2005fts kbexpertiseadvanced kbtshoot kbprb KB923317
Feedback