This article describes how to troubleshoot unexpected search results. It explains what to look for when you cannot locate content with a query, and when you do not understand why a particular document was returned in the search results.
When SharePoint search data is indexed, an algorithm is applied to detect language and metadata. At index time, documents are tagged with their likely language. With foreign languages, SharePoint is limited in its ability to perform word breaking and word stemming.
For more information about how search works together with word breaking and word stemming, download the following Microsoft white paper:
Note The exact query that is being used can affect search results when multiple languages are involved. Unexpected results can occur if the language that you used to index the document is different from the language in which the query is run.
Make sure that you understand how queries are run in the dashboard site. When you enter search keywords in the search window, they are run as FREETEXT queries. This means that an implicit OR is placed between search terms, which is intended to provide the best possible search results from the search engine. Because of the implicit OR, the search results may contain documents that contain only one of the search keywords. For example, if you enter the following keywords in a query, a document that contains only the word Portal will be returned in the search results:
SharePoint Portal Server
However, exact matches are ranked higher in the search results. If you want to find only exact phrase matches, enclose your search keywords in quotation marks.
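The difference between an implicit-OR query and an exact phrase query can be sketched in a few lines of Python. This is only an illustration of the matching behavior described above, not the search engine's implementation. The function names and the whitespace tokenizer are assumptions made for the example, and ranking is not modeled.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real word breaking."""
    return text.lower().split()


def matches_freetext(query, document):
    """Implicit OR: the document matches if it contains any query keyword."""
    doc_words = set(tokenize(document))
    return any(word in doc_words for word in tokenize(query))


def matches_phrase(query, document):
    """Exact phrase: the keywords must appear consecutively and in order."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


query = "SharePoint Portal Server"
doc = "The Portal is unavailable"
print(matches_freetext(query, doc))  # True: "Portal" alone is enough
print(matches_phrase(query, doc))    # False: the full phrase is absent
```

A document that contains only the word Portal satisfies the implicit-OR check but not the phrase check, which is the behavior that quotation marks select.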
Words are also passed through stemmers and thesauruses, both of which might modify the query on its way to the index and return documents that do not contain the exact keyword that you searched for.
Understand that search results are returned for variants of the keywords.
Search keywords are passed through a stemmer, which generates inflected forms of a word. For example, if you search for the word run, you get documents that also contain the words ran and running in the search results.
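As a rough illustration of why a search for run also returns documents that contain ran or running, the following Python sketch expands a keyword into its inflected forms before matching. The INFLECTIONS table and the function names are hypothetical; a real stemmer generates these forms from language-specific rules rather than from a fixed table.

```python
# Hypothetical inflection table; a real stemmer derives these forms per language.
INFLECTIONS = {
    "run": {"run", "runs", "ran", "running"},
}


def expand_with_inflections(keyword):
    """Return the keyword plus any inflected forms known for it."""
    key = keyword.lower()
    return INFLECTIONS.get(key, {key})


def document_matches(keyword, document):
    """Match if the document contains any inflected form of the keyword."""
    forms = expand_with_inflections(keyword)
    return any(word in forms for word in document.lower().split())


print(document_matches("run", "She ran the report"))  # True
```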
Check the thesaurus for references to your search keywords.
Thesauruses are customizable, language-specific text files. You can use the thesaurus to specify synonyms and replacement terms for words. If synonyms are specified, search results might contain documents that do not contain the search keywords. For example, if you specify in the thesaurus that the word NT is a synonym for the word Windows, documents that contain either word are returned in the search results when you search for just one of the words. If you specify a replacement set, words are replaced with other words in a query. If you specify the word Windows as a replacement for the word NT, when you search for NT, you do not get documents that contain NT in the search results; you get only documents that contain Windows. Never use noise words, which are described in the following step, as synonyms or replacement terms. The noise word list is checked after the thesaurus is checked, so the words are removed from the query. See the "Advanced Topics" topic in Administrator's Help for more information.
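The two thesaurus behaviors, synonyms and replacement sets, can be contrasted with a short Python sketch. It models only the effect on the query terms as described above. The data structures and function names are assumptions made for the example and do not reflect the actual thesaurus file format.

```python
def expand(term, synonym_groups):
    """Synonyms: search for the term and every other word in its group."""
    term = term.lower()
    for group in synonym_groups:
        if term in group:
            return set(group)
    return {term}


def replace(term, replacements):
    """Replacement set: the original term is dropped from the query."""
    term = term.lower()
    return set(replacements.get(term, {term}))


# NT and Windows as synonyms: either word finds documents with both.
print(sorted(expand("NT", [{"nt", "windows"}])))      # ['nt', 'windows']

# Windows as a replacement for NT: only Windows is searched.
print(sorted(replace("NT", {"nt": {"windows"}})))     # ['windows']
```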
Make sure that the keywords that you enter do not include noise words. Noise words are words that do not add value to a query and are filtered out, such as the words a, and, and the. Noise word lists are customizable, language-specific text files. See the "Advanced Topics" topic in Administrator's Help for more information.
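The effect of noise word filtering on a query can be sketched as follows. The noise word set here contains only the three examples named above; the real lists are language-specific files, and the function name is an assumption made for the example.

```python
# Only the examples named in the text; real noise word lists are longer.
NOISE_WORDS = {"a", "and", "the"}


def remove_noise_words(query, noise_words=NOISE_WORDS):
    """Drop every keyword that appears in the noise word list."""
    return [word for word in query.split() if word.lower() not in noise_words]


print(remove_noise_words("the portal and the server"))  # ['portal', 'server']
```

A query that consists only of noise words is reduced to an empty list of keywords in this sketch.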
Ensure that the user who is searching has permission to view the document. If a user does not have permission to view a document, it is not returned in search results. If the document resides in a Lotus Notes content source, and you have chosen to honor Lotus Notes security settings on the database, ensure that you have configured security mapping correctly. If the document resides in a file share content source, ensure that the document is not secured with a local machine group, or a domain local group, if the file server resides in a different domain.
Be aware that the query language can affect search results when multiple languages are involved. During indexing, the language of the document is detected, and appropriate language resources are used. The queries that you make in the portal use the language setting of the Web browser. Unexpected results can occur if the language that you used to index the document is different from the language in which the query is run. For example, if a document is indexed with Japanese-language settings, you might not find that document if you run a query from Internet Explorer with English-language settings. For best results, make sure that the language setting in Internet Explorer matches the language setting of the document that you are searching for.
If you can locate a document, but not certain words or properties that are associated with that document, check the output of the index filter. During indexing, an index filter is the component that opens the document in its native format and extracts text and properties. Only the output of the index filter is written to the index. To view the output of the index filter, copy the document to your server and use the Filtdump.exe utility. Filtdump is provided on the Microsoft SharePoint Portal Server 2001 CD-ROM or on the Microsoft Office SharePoint Portal Server 2003 CD-ROM in the Support\Tools folder. See the Toolshowto.txt file for instructions on how to install the utility. To view the output of a document, type the following at a command prompt:
If you cannot find a document in the search results, check the search gatherer logs to ensure that the document was successfully indexed. If you were previously able to locate a document, but cannot anymore, you can also check the gatherer log to ensure that the document was not removed from the index in a recent update. This removal can occur if the document was not accessible during the last update (for example, if a file server was down during the update).
To check the gatherer logs, select the content source for the document, and then click the Click here for detailed log link. You can use the gatherer log viewer to search for the document Uniform Resource Locator (URL) and ensure that there were no errors. By default, the logs contain only errors. If you want to verify that the item was successfully indexed, you need to enable success logging in the SharePoint Portal Server Administration Console, or use the Gthrlog.vbs utility in the Support\Tools folder on the SharePoint Portal Server CD-ROM.
Understand that thesaurus and noise word entries are affected by the fact that the index is case-sensitive and accent-sensitive. Entries are matched exactly as written, because words in the index are not stored with case or accent variations. For example, if you add the word windows to the thesaurus, and someone searches for WINDOWS, the search does not find a match. To get the best results, add thesaurus and noise word list entries for all common case variations of a word.
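One way to avoid missing a case variation is to generate the common forms of each word before you add them to the thesaurus or noise word list. The following Python sketch is a hypothetical helper for that purpose. It covers only lowercase, uppercase, and capitalized forms, and it does not generate accent variations or write to the thesaurus files.

```python
def case_variants(word):
    """Return common case variations: lowercase, UPPERCASE, Capitalized."""
    variants = [word.lower(), word.upper(), word.capitalize()]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(variants))


print(case_variants("windows"))  # ['windows', 'WINDOWS', 'Windows']
```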