Missing rank contribution from phrases

Symptom

When searching for a phrase, matching documents show up with a lower rank than expected.

Cause

Phrases (double-quoted strings containing more than one word) are transformed during query processing into the proximity operator ONEAR (ordered near). ONEAR can only return rank for simple phrases, and certain configurations of ESP's linguistic features can expand a simple expression into a complex one, specifically by introducing the ANY operator. This expansion can happen in two ways:

Term expansion            Enabled by                                            Initial & Final Query
Query-side lemmatization  qtf_lemmatize=true (or a search profile)              "red car" -> ONEAR("red",ANY("car","cars"), n="0")
Query-side synonyms       qtf_synonym:querysynonyms=true (or a search profile)  "red car" -> ONEAR("red",ANY("car","vehicle"), n="0")

Note: Synonym expansion happens before lemmatization during query processing, and turning off lemmatization (qtf_lemmatize=false) turns off both types of processing.
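
The difference between a simple and a complex phrase can be illustrated with a small model. The following is a minimal sketch in Python, not ESP code: the functions expand_phrase and is_simple are hypothetical names introduced here, and the expansion dictionary stands in for whatever lemmas or synonyms the query-side linguistics would actually add.

```python
# Illustrative model only: this is not ESP code and does not call any ESP API.
# expand_phrase and is_simple are hypothetical helpers made up for this sketch.

def expand_phrase(phrase, expansions):
    """Return an ONEAR expression string for a quoted phrase.

    `expansions` maps a word to the extra forms (lemmas or synonyms)
    that query-side linguistics is assumed to add for that word.
    """
    operands = []
    for word in phrase.split():
        variants = [word] + list(expansions.get(word, []))
        if len(variants) > 1:
            # An expanded word becomes an ANY(...) group, which makes
            # the whole phrase complex.
            quoted = ",".join('"%s"' % v for v in variants)
            operands.append("ANY(%s)" % quoted)
        else:
            operands.append('"%s"' % word)
    return 'ONEAR(%s, n="0")' % ",".join(operands)


def is_simple(expression):
    """Treat an expression as simple when it contains no ANY operator."""
    return "ANY(" not in expression


plain = expand_phrase("red car", {})
lemmatized = expand_phrase("red car", {"car": ["cars"]})
print(plain, is_simple(plain))
print(lemmatized, is_simple(lemmatized))
```

In this model only the unexpanded phrase stays simple; as soon as one word gains an alternative form, the expression contains ANY and, per the Cause described above, would no longer contribute rank.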

Resolution

The lemmatization strategy can be changed to lemmatization by reduction. This keeps phrases simple by reducing words to their base forms instead of expanding them to all of their forms. For a cost-benefit and technical overview of the different configuration options, see the “Lemmatization strategies” section of the ESP Advanced Linguistics Guide.

A complete overview of the configuration steps is available in the Advanced Linguistics Guide and is summarized below:
  1. On each ESP node, open %FASTSEARCH%\etc\LemmatizationConfig.xml
  2. Set the mode parameter (and highlighting_mode, for consistent teaser highlighting) to "reduction"
  3. Set the optimization parameter to "qps"
  4. On each ESP node, restart all procservers and the qrserver
  5. Refeed / recrawl from all content sources. This is required to index the base forms of each word.

There is no direct resolution when complex phrases result from synonym expansion. The issue can potentially be addressed by combining query terms with AND (e.g. AND("red","car")) instead of using phrase expressions. If queries can be altered in this way, relevant documents receive rank contributions from the AND expression that they would not have received from a complex phrase.
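
If the application builds its queries programmatically, the phrase-to-AND rewrite could be done before the query is submitted. The following is a minimal sketch in Python; phrase_to_and is a hypothetical helper, and it assumes terms are separated by whitespace and contain no double quotes.

```python
# Hypothetical helper for illustration; not part of ESP.

def phrase_to_and(phrase):
    """Rewrite a multi-word phrase as an AND expression over its terms."""
    terms = phrase.split()
    if len(terms) < 2:
        raise ValueError("a phrase needs at least two words")
    if any('"' in term for term in terms):
        raise ValueError("terms containing double quotes are not handled")
    return "AND(%s)" % ",".join('"%s"' % term for term in terms)


print(phrase_to_and("red car"))
```

Note that AND only requires both terms to occur in a document, so this rewrite gives up the ordering and adjacency that the phrase expressed and may match more documents.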

More Information

The scenario discussed above can also occur when the defined synonyms expand single-word terms into multi-word terms. In this case, synonym expansion creates a simple phrase, and query-side lemmatization then turns it into a complex phrase that does not contribute to document relevancy. To restore these rank contributions, the lemmatization strategy can be changed to "reduction" as detailed in the Resolution section.

If desired, it is also possible to disable linguistic processing on specific string expressions - e.g. string("red car", linguistics="off"). In this case only the phrase "red car" is searched for, and no terms are modified. More information on FQL operators and syntax can be found in the ESP Query Language & Parameters Guide.
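
For applications that assemble queries in code, the expression above could be generated with a small helper. This is a minimal sketch in Python; literal_phrase is a hypothetical name, and the sketch deliberately rejects input containing double quotes rather than guess at FQL escaping rules.

```python
# Hypothetical helper for illustration; not part of ESP.

def literal_phrase(phrase):
    """Wrap a phrase in a string() expression with linguistics turned off."""
    if '"' in phrase:
        # Escaping is left out on purpose; consult the query language guide.
        raise ValueError("phrases containing double quotes are not handled")
    return 'string("%s", linguistics="off")' % phrase


print(literal_phrase("red car"))
```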

Properties

Article ID: 2502579. Last reviewed: November 9, 2011. Revision: 1
