You receive the following event log when you run a full-text search query that uses this thesaurus file:
Event Type: Error Event Source: MSFTESQL Event Category: MSFTESQL Service Event ID: 4154 Date: Date Time: Time User: N/A Computer: ComputerName Description: The thesaurus file C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml for lcid 1033 has invalid format. Please edit the file and correct it. Use retail tracing for detailed error description. See user documentation on fulltext retail tracing.Component: MicrosoftIndexer
Note The special characters that are listed here are only examples. Other special characters may also cause the problem.
To work around this problem, avoid using special characters in thesaurus file terms.
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
When the word breaker encounters a word-breaking character in a term, the word breaker parses the character as a white space character. For example, if the word breaker encounters the term "look&out," the word breaker parses the term as the two words "look" and "out."
Note Word-breaking characters include the following:
$ (dollar sign)
# (number sign)
When the word breaker encounters a single quotation mark (') in a term, the word breaker correctly parses the term. However, the full-text thesaurus component keeps only the characters that follow the single quotation mark in the term. For example, if the original term is "L'Mary," the term appears as "Mary" in the thesaurus file.
When the word breaker encounters a hyphen (-) in a term, the word breaker correctly parses the term. However, the full-text thesaurus component treats the characters that are connected by the hyphen together with the hyphen itself as empty characters. For example, if the original term is "well-known celebrity," the term appears as "celebrity" in the thesaurus file.
The problem that is described in the "Symptoms" section may also cause a duplicate entry or an empty entry for a term in the thesaurus file. The full-text thesaurus component cannot load a thesaurus file if the file contains a duplicate entry or an empty entry.
Note In SQL Server 2005, if the full-text thesaurus component finds an error in the thesaurus file, the full-text thesaurus component does not load the thesaurus file.
For example, you define the following terms in the thesaurus file:
In this scenario, the ampersand will be parsed as a white space character. By default, the word "you" is listed in the noise file and is ignored. Therefore, the word breaker parses the term "Windows&you" as "Windows." Because the thesaurus file already contains the term "Windows," the thesaurus file now contains duplicate terms. Therefore, the full-text thesaurus component does not load the thesaurus file.