How to correctly parse quotation marks in Full Text Search queries

Summary

SQL Server 7.0 or later Full Text Search (FTS) provides a rich set of search functionality. You can use a CONTAINS clause to perform a variety of searches, such as:

  • A word or phrase.
  • The prefix of a word or phrase.
  • A word that is near another word.
  • A word that is inflectionally generated from another. For example, the word "drive" is the inflectional stem of: drives, drove, driving, and driven.
  • A word that has a higher designated weighting than another word.
The CONTAINS syntax requires double quotation marks for prefix ("choc*") and for multiple word or phrase searches ("chocolate chip" or "oatmeal"). This can be problematic in terms of UI design, because users are not accustomed to having to properly quote their searches and are prone to making mistakes when they do. A simple function is needed that developers can use to properly quote a query string before passing it to SQL Server.

More Information

Following is a review of the basics of full-text query syntax with the CONTAINS keyword in the WHERE clause (for more complete syntax, see the SQL Server 7.0 Books Online):

  • Simple term:
    WHERE CONTAINS(QuantityPerUnit, 'bottles')
  • Phrase in simple term:
    WHERE CONTAINS(ProductName, ' "Sasquatch ale" OR "steeleye stout" ')
  • Prefix term:
    WHERE CONTAINS(ProductName, ' "choc*" ')
  • OR with prefix term:
    WHERE CONTAINS(Description, '"sea*" OR "bread*"')
  • Proximity term:
    WHERE CONTAINS(ProductName, 'spread NEAR Boysenberry')
  • Generation term:
    WHERE CONTAINS(ProductName, ' FORMSOF (INFLECTIONAL, dry) ')
  • Weighted term:
    WHERE CONTAINS(Description, 'ISABOUT (spread weight (.8), 1)
Notice that proximity, generation, and weighted terms do not use double quotation marks. Also notice that:

  • Individual words and phrases can be surrounded with double quotation marks (except in proximity, generation, and weighted terms).
  • Prefix term searches must be surrounded with double quotation marks.
  • Multiple term and phrase searches must have each word and/or phrase surrounded with double quotation marks.
With the preceding simplified rules in mind, you can create a simple parser using regular expressions to correctly place quotation marks around search strings. The algorithm to use is:

Replace all double quotes (clears the text and any improper quotations)
If the text string contains one of the key words "NEAR", "FORMSOF", or
"ISABOUT", the parsing is complete
Else
Surround any instances of 'and' or 'and not' with quotes
Surround any instances of 'or' or 'or not' with quotes
Surround the entire string with quotes
The following is a JavaScript version:

function fxnParseIt() {
// Note: sInputString code for demo purposes only, and should be
// replaced with user's code for getting in string value.

var sInputString = 'asp and database';

sText = sInputString;
sText = sText.replace(/"/g,"");
if (sText.search(/(formsof|near|isabout)/i) == -1) {
sText = sText.replace(/ (and not|and) /gi,'" $1 "');
sText = sText.replace(/ (or not|or) /gi,'" $1 "');
sText = '"' + sText + '"';
}

sInputString = sText;
}
The following is a VBScript version:

Sub ParseIt()
'// Note: sInputString code for demo purposes only, and should be
'/replaced with user's code for getting in string value.
'// Note: this code could also be easily re-written as a function,
'/ and is written as a sub for demo purposes only.

Dim strIn, RegEx, sInput

sInputString = "asp and database"

strIn = sInput
Set RegEx = New RegExp

If Len(strIn) < 1 Then
MsgBox ("You must enter a search string")
Else
strIn = Replace(strIn, Chr(34), "")
If (InStr(strIn, "formsof") > 0) Or (InStr(strIn, "near") > 0) Or (InStr(strIn, "isabout") > 0) Then
ParseMe.value = strIn
Else
RegEx.IgnoreCase = True
RegEx.Global = True
RegEx.Pattern = "( and not | and )"
strIn = RegEx.Replace(strIn, Chr(34) & "$1" & Chr(34))
RegEx.Pattern = "( or not | or )"
strIn = RegEx.Replace(strIn, Chr(34) & "$1" & Chr(34))
strIn = Chr(34) & strIn & Chr(34)
ParseMe.value = strIn
End If
End If

End Sub
Svojstva

ID članka: 246800 - posljednja izmjena: 10. srp 2008. - verzija: 1

Povratne informacije