Wildcard query cutoff behavior in FAST ESP

Symptom

When you run a wildcard search, your search results are limited in one of two ways, depending on your ESP configuration:
  1. Soft cutoff: Wildcard queries such as "quick b* fox" return fewer results than expected.
  2. Hard cutoff: The query fails with Error 1017 - "Wildcard term b* matches more than the allowed maximum of 500 words"

Cause

A wildcard expression in your query expands to more words in a single index partition than the limit configured by hardwildcardcutoff and softwildcardcutoff in fsearch.addon. The system default is a hard wildcard cutoff of 500, with the soft cutoff disabled. The limit can be changed, but you may reach it in the following scenarios:

Scenario: Query too broad
Example: field:b*
Explanation: Can expand to over 500 unique words in a given field and index partition very quickly. This scenario can also interact with the two scenarios below to increase occurrences of the described symptoms.

Scenario: Wildcard term in phrase evaluated independently
Example: field:"quick * fox"
Explanation: "*" in this query expands to all indexed words for the field, which are then compared positionally to "quick" and "fox".

Scenario: Word boundaries lead to separate wildcard term
Example: field:"http://www.contoso.com/*/en-us"
Explanation: "*" in this query expands to all indexed words for the field, due to tokenization on the non-word characters performed by the search engine.
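
The third scenario can be illustrated with a rough Python sketch. It assumes, for illustration only, that the engine breaks text on every non-word character; the actual tokenizer configuration in your installation may differ, and tokenize_phrase is a hypothetical helper, not part of ESP.

```python
import re


def tokenize_phrase(phrase: str) -> list[str]:
    """Split a phrase into terms the way a simple word-boundary tokenizer might.

    Assumption: every non-word character is a separator. A "*" attached to
    the end of a word stays with that word (b*), while a "*" surrounded by
    separators becomes a term of its own.
    """
    return re.findall(r"\w+\*?|\*", phrase)


print(tokenize_phrase("http://www.contoso.com/*/en-us"))
# ['http', 'www', 'contoso', 'com', '*', 'en', 'us']
```

Under this assumption the "*" between the two slashes is a standalone term, so it has to be expanded against every indexed word for the field.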

Resolution

The following table describes the possible solutions to the wildcard errors in increasing order of latency.

Recommendation: Narrow the wildcard term
Example: field:b* -> field:brow*
Considerations: Expands to fewer words, avoids cutoff

Recommendation: Use a proximity operator instead of wildcard
Example: field:"quick * fox" -> field:onear("quick","fox",n="1")
Considerations: Returns all documents where "fox" follows within 0-1 words of "quick". If "quick fox" is not a desired search outcome, increase wildcard cutoff.

Recommendation: Use boundary matching operators instead of wildcard
Example: field:"http://www.contoso.com/*/en-us" -> field:and(starts-with("http","www","contoso","com"),ends-with("en","us"))
Considerations: Returns all documents where the field begins with "http://www.contoso.com/" and ends with "en-us". If boundary-match="yes" is not set for the field in the index-profile, these operators cannot be used.

Recommendation: Increase wildcard cutoff (steps below)
Example: field:b*
Considerations: High cutoff values (over several thousand) will result in progressively worse query performance
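
If queries are built by an application, the proximity rewrite can be applied before the query is sent. The sketch below handles only the exact pattern shown above (two plain words with one standalone "*" between them) and emits the same onear form. phrase_to_onear is a hypothetical helper name, and phrases with any other shape are rejected rather than guessed at.

```python
def phrase_to_onear(field: str, phrase: str) -> str:
    """Rewrite 'word * word' into the onear form used in this article.

    Only the single-wildcard, two-word case is supported; anything else
    raises ValueError so the caller can fall back to another strategy.
    """
    words = phrase.split()
    if len(words) != 3 or words[1] != "*":
        raise ValueError("expected exactly: <word> * <word>")
    first, _, last = words
    if "*" in first or "*" in last:
        raise ValueError("outer terms must not contain wildcards")
    return f'{field}:onear("{first}","{last}",n="1")'


print(phrase_to_onear("field", "quick * fox"))
# field:onear("quick","fox",n="1")
```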

Increasing wildcard cutoff

1. On your admin node, open %FASTSEARCH%\etc\config_data\RTSearch\webcluster\fsearch.addon
2. Locate hardwildcardcutoff and softwildcardcutoff in the file, if they are present. They are generally at the end of the file.
3. Change or add the cutoff value for the behavior you want, for example "hardwildcardcutoff 10000". Choose a hard cutoff to get an error when full expansion cannot be done, or a soft cutoff to get query results based on an incomplete expansion.
4. Save the file
5. On each search node, run “nctrl restart search-1”
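
If you prefer to script steps 2 and 3, the following sketch shows one way to change or add a cutoff line in the text of the file. It assumes that each directive sits on its own line as "name value", matching the format shown for the defaults in this article; set_cutoff is a hypothetical helper, and you still need to save the file and restart the search nodes as described above.

```python
def set_cutoff(addon_text: str, name: str, value: int) -> str:
    """Return addon_text with the given cutoff directive changed or appended.

    Assumption: directives are whitespace-separated 'name value' pairs,
    one per line.
    """
    if name not in ("hardwildcardcutoff", "softwildcardcutoff"):
        raise ValueError(f"unexpected directive: {name}")

    lines = addon_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        parts = line.split()
        if parts and parts[0] == name:
            lines[i] = f"{name} {value}"  # change the existing directive
            replaced = True
    if not replaced:
        lines.append(f"{name} {value}")  # add it at the end of the file
    return "\n".join(lines) + "\n"
```

Working on the file contents as a string keeps the sketch independent of file paths and encodings; read the file, pass its text through the function, and write the result back with whatever tooling you normally use, after taking a backup.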

More Information

What is term expansion? 

Term expansion refers to the unique words in an index partition that match a wildcard search: for example, the wildcard term brow* will expand to the words brow, brown, browser, and browning. 
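
As a rough illustration of the idea (not of how the engine implements it), the sketch below expands a wildcard term against a small in-memory vocabulary using Python's standard fnmatch module. The vocabulary and the expand_wildcard name are made up for the example.

```python
from fnmatch import fnmatchcase


def expand_wildcard(term: str, vocabulary: set[str]) -> list[str]:
    """Return the unique vocabulary words that match a wildcard term."""
    return sorted(word for word in vocabulary if fnmatchcase(word, term))


vocabulary = {"brow", "brown", "browser", "browning", "fox", "quick"}
print(expand_wildcard("brow*", vocabulary))
# ['brow', 'brown', 'browning', 'browser']
```

The length of the returned list, not the number of documents that contain those words, is what counts against the cutoff.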

Hard vs. soft cutoffs

The default values (not present in the addon file) for fsearch are:

softwildcardcutoff -1
hardwildcardcutoff 500

A value of -1 disables the cutoff. With the default settings, a query that exceeds the expansion cutoff returns an error. Your system may be configured differently.

Soft cutoffs are evaluated first in the configuration, so any non-negative value for a soft cutoff will prevent the hard cutoff from being reached.
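
The precedence described above can be modeled in a few lines of Python. This is a simplified sketch of the documented behavior, not the engine's code: apply_cutoffs and WildcardCutoffError are hypothetical names, and the list of matches stands in for the expansion of one wildcard term in one partition.

```python
class WildcardCutoffError(Exception):
    """Stands in for the error returned when the hard cutoff is exceeded."""


def apply_cutoffs(matches: list[str], soft: int = -1, hard: int = 500) -> list[str]:
    """Apply soft and hard cutoffs to an expansion, soft cutoff first.

    A value of -1 disables the corresponding cutoff.
    """
    if soft >= 0:
        # Soft cutoff: silently keep an incomplete expansion.
        # The hard cutoff is never evaluated in this case.
        return matches[:soft]
    if hard >= 0 and len(matches) > hard:
        # Hard cutoff: refuse to run the query on a partial expansion.
        raise WildcardCutoffError(
            f"Wildcard term matches more than the allowed maximum of {hard} words"
        )
    return matches
```

With the default arguments, an expansion of 1,000 words raises the error; with soft=500 the same expansion is truncated to 500 words and no error is raised, whatever the hard cutoff is set to.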

How cutoff values are reached

The term expansion cutoff for a wildcard search is calculated for each partition individually. The expansion cutoff is based on the number of unique words, not the number of matching documents. For example, in a search for "b*", an index partition may contain 1,000 words beginning with "b"; with a soft cutoff of 500, the search returns only the documents containing the first 500 of those unique words, regardless of how often the words occur in any given document. Because partitions may contain different numbers of unique terms, the overall result set may be inconsistent, or the query may produce errors for some partitions and not others.
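
To see why the outcome can differ between partitions, consider the following sketch, which counts the unique words matching a prefix in each partition and reports the partitions that exceed a cutoff. The partition names and contents are invented for the example, and partitions_over_cutoff is a hypothetical helper.

```python
def partitions_over_cutoff(
    partitions: dict[str, set[str]], prefix: str, cutoff: int = 500
) -> dict[str, int]:
    """Return {partition name: unique match count} for partitions over the cutoff."""
    counts = {
        name: sum(1 for word in words if word.startswith(prefix))
        for name, words in partitions.items()
    }
    return {name: count for name, count in counts.items() if count > cutoff}


partitions = {
    "partition-0": {f"b{i}" for i in range(1000)},  # 1,000 unique b-words
    "partition-1": {f"b{i}" for i in range(100)},   # only 100 unique b-words
}
print(partitions_over_cutoff(partitions, "b"))
# {'partition-0': 1000}
```

Here the same query "b*" exceeds the cutoff in one partition and stays under it in the other, which is the situation that produces inconsistent results or partition-specific errors.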

Properties

Article ID: 2569627 – Last Review: October 31, 2011 – Revision: 1
