The number of successfully crawled documents does not match the number of documents that are in a content collection in FAST Search Server 2010 for SharePoint

Article ID: 981295

SYMPTOMS

After you create a content source in the FAST Content Search Service Application (SSA) and start a full crawl, you may find that the number of documents successfully crawled and submitted to Microsoft FAST Search Server 2010 for SharePoint is greater than the number of documents that FAST indexes and makes searchable.

CAUSE

This issue occurs because the FAST document processing pipeline may drop documents that contain META tags with directives such as "noindex."
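
To find source pages that might be affected before you crawl them, you can scan their HTML for a robots META tag. The following is a minimal Python sketch, not part of FAST Search Server. The names RobotsMetaParser and has_noindex are hypothetical, and the sketch only checks for the "noindex" directive in a tag of the form <meta name="robots" content="...">. The FAST pipeline may apply other rules as well.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example, <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html_text):
    """Return True if the HTML declares a robots META tag that includes "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

For example, has_noindex('<meta name="robots" content="noindex, nofollow">') returns True, while a page with no robots META tag returns False.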

MORE INFORMATION

The Windows PowerShell command Get-FASTSearchContentCollection shows all content collections and the number of documents that are in each collection. This count represents all successfully indexed documents from all content sources in the FAST Content SSA and from any other FAST Search specific connectors.


You can use the document processing tools to check which documents are dropped. To do this, follow these steps:
  1. From the FAST Search management console, use the following commands to enable debug and trace output from the document processors:

    psctrl debug on
    psctrl doctrace on
  2. Re-crawl the content, and then use the following command to see the detailed output for each processed document:

    doclog -a

    The output lists both successfully processed documents and dropped documents, with detailed information for each.
  3. Use the following commands to turn off debug and trace:

    psctrl debug off
    psctrl doctrace off
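
If you run these steps often, you can script them so that debug and trace are always turned off afterward, even when the crawl fails. The following is a minimal Python sketch under stated assumptions: psctrl and doclog are available on the PATH of the account that runs the script, and recrawl is a hypothetical function that you supply to start the crawl and wait for it to finish. The helper names run_tool and trace_crawl are also hypothetical.

```python
import subprocess


def run_tool(args):
    """Run one command-line tool and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def trace_crawl(recrawl, run=run_tool):
    """Enable debug and trace, re-crawl, return the doclog output, then disable both.

    recrawl is a caller-supplied (hypothetical) function that starts the
    crawl and returns when the crawl has finished.
    """
    try:
        # Step 1: enable debug and trace on the document processors.
        run(["psctrl", "debug", "on"])
        run(["psctrl", "doctrace", "on"])
        # Step 2: re-crawl, then collect the per-document log.
        recrawl()
        return run(["doclog", "-a"])
    finally:
        # Step 3: always turn debug and trace off, even after a failure.
        run(["psctrl", "debug", "off"])
        run(["psctrl", "doctrace", "off"])
```

The run parameter exists so that the command runner can be replaced, for example with a function that only records the commands during a dry run.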

Properties

Article ID: 981295 - Last Review: September 22, 2011 - Revision: 7.0
APPLIES TO
  • Microsoft FAST Search Server 2010 for SharePoint
  • Microsoft FAST Search Server 2010 for SharePoint Internet Sites
Keywords: 
kbprb kbexpertiseinter kbsurveynew KB981295
