FAST Patch: sharepointconnector.1.0.SP2.complete.patch03.Win32

Summary

ID    : sharepointconnector.1.0.SP2.complete.patch03.Win32
Product   : SharePointConnector 1.0.SP2 on Win32
Category   : Recommended fix
Date released : 22-Oct-10
 

Overall Description
*******************
This patch addresses issues reported by various connector customers.
 
  o Robustness improvements - The connector will log SharePoint content errors and exceptions, and continue to crawl the available content.
  o Retrying of failed content - The connector will save failed sites, lists and documents (download failures) in the state database and will later retry this failed content. Instead of retrying within the same connector run, the failed content is retried in later runs of the connector.
  o If a document has been intentionally dropped from the pipeline, the connector will no longer report this as a failed callback.
  o Callbacks have been changed to report the document URL instead of the document ID, making the log easier to understand and act upon.
  o Various improvements to error log messages so users can understand exactly what went wrong.

Configuring the new retry mechanism
===================================
At the end of each connector run, the connector will retry failed content once.
If there are still failures, the connector will retry these failures in the next run.
By default, the connector will retry this failed content for up to 5 connector runs.
After the connector has reached this limit, it will no longer retry that failed content.
 
However, if you want the connector to keep retrying failed content beyond this limit, you can increase the number of retry runs.
To do so, add a new configuration option NumberOfRetryCycles to the General section of your connector configuration file and set it to the desired number of retry runs (e.g. 10), as sketched below.
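 
The snippet below is a minimal sketch of that change. It assumes an XML-style configuration file in which General is a section element containing one element per option; the exact layout is an assumption, so match it to your existing connector configuration file rather than copying it verbatim:
 
   <General>
      <!-- Assumed layout: retry previously failed content for up to 10 connector runs -->
      <NumberOfRetryCycles>10</NumberOfRetryCycles>
   </General>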
 
A note about how to read the connector logs
===========================================
When the connector has completed a successful initial or incremental crawl, it will perform one sweep across all previously failed documents (whether they failed in the current run or in a previous run) and retry them before it terminates. As a result, the summary at the end of the log is the summary of the retry sweep, not of the initial or incremental crawl. To find the summary of the initial or incremental crawl, look further up in the log.

   Build version: 1.0.44.0

Pre-requisites
**************
SharePoint Connector 1.0 SP2, with or without any patches

Why would you apply this fix
**************************** 
  o The connector fails a crawl and terminates prematurely. This version catches more errors, logs them, and moves on.
  o The connector spends a lot of time retrying downloads of content that is temporarily unavailable, or unavailable because of incorrectly configured security.
  o You drop documents intentionally in the ESP pipeline and the connector log is spammed with error messages about this.
  o When failures occur, the connector does not provide sufficient or user-friendly log messages to understand the exact problem.
  o Some data is never made searchable because the connector cannot download the content; after retrying a certain number of times, the connector gives up and never attempts to download that data again.

Known impacts of applying this fix
**********************************
There is one known issue in this version of the connector that also exists in all earlier versions of the connector and has not been fixed:
 
When the connector fails to fetch all changes from SharePoint during the initial phase of an incremental run, it will not proceed to process any changes.
 
This is due to a known issue in SharePoint:
 
If some data in SharePoint has badly configured permissions, the call to the Web Service method GetChanges() will fail with "Unauthorized".
 
The consequence of this is that the connector will fail to run a successful incremental crawl until the "bad" change ID has expired. Unfortunately, by that time the saved change token (from the full crawl or from the last successful incremental crawl) will also have expired. Hence the only resolution is to go back to a full crawl.
 
NOTE: This issue occurs very seldom and should not occur when SharePoint is administered in a "normal" way.
 
This error can be recognized by the following error in the connector log:
 
[2010-10-12 1:22:49 PM] FATAL    : SharePointConnector : Error running connector SharePointConnector. Will terminate. Error: SharePoint connector failed to retrieve content though SharePoint Web Services
[2010-10-12 1:22:49 PM] DEBUG    : SharePointConnector :    at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSConnectorExceptionHelper.ThrowSiteDataException(Exception exception, SiteDataParameters param)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSCrawler.GetChangesResponse(ObjectType objectType, String contentDBId, String& lastChangeId, String& currentChangeId, Int32 timeOut, Boolean& moreChanges)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSCrawler.GetChanges(ObjectType objectType, String contentDBId, String& lastChangeId, String& currentChangeId, Int32 timeout, Boolean& moreChanges)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSSubSiteCrawler.ProcessChanges(String& lastChangeToken, String& currentChangeToken, Boolean& hasMoreChanges, String objectId, ObjectType objType, Boolean isFirstIteration)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSSubSiteCrawler.FindSitesAndListsFromGetChanges()
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSTaskSplitter.GetStartSitesAndLists(IFastConfig configuration, Boolean addIncludeUrltoTheTaskList, Boolean isSubTask)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSTaskSplitter.Split(IFastConfig configuration, Boolean addIncludeUrltoTheTaskList, Boolean isSubTask)
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.MOSSTaskSplitter.Split(IFastConfig configuration)
   at com.fastsearch.esp.cctk.framework.ANPConnectorManager.Start()
   at Microsoft.SharePoint.Search.Extended.Content.MOSSConnector.ANPMOSSConnectorManager.Start()
   at com.fastsearch.esp.cctk.framework.ConnectorManager.RunCommandLineWithReturnValue(String[] argv)

 
The only workaround for this error is to manually fix the "bad" permissions in SharePoint. To be able to locate where in SharePoint the "bad" permissions exist, you will need help from FAST services.
 
NOTE that the permissions will need to be fixed within the time it takes for the SharePoint changelog to recycle. This is by default 15 days in SharePoint 2007 and 90 days in SharePoint 2010.

Install procedure
*****************
1) Make a backup of your current installation by copying the installation folder (typically C:\Program Files (x86)\FAST\FAST SharePoint Connector), for example as shown below.
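 
For example, the folder can be copied from a Command Prompt with the robocopy tool that ships with recent versions of Windows (the destination path below is only an example):
 
   robocopy "C:\Program Files (x86)\FAST\FAST SharePoint Connector" "C:\Backup\FAST SharePoint Connector" /E
 
The /E switch copies all subdirectories, including empty ones.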
 
2) Unzip the archive called sharepointconnector.1.0.SP2.complete.patch03.Win32.zip on top of the installation folder (typically C:\Program Files (x86)\FAST\FAST SharePoint Connector). This will replace all the binaries in the bin folder and a few other files. When prompted whether you want to overwrite files, select "Yes". Your configuration files will not be replaced and can be reused with the new version.
 
This release does not require a full re-crawl to be operational, but a full re-crawl is highly recommended. Errors in the previous version may have caused document loss, and these missing documents will only be included again by performing a full crawl.
 
To trigger a full crawl with this release using your existing configuration, you need to either delete and re-create your state database or, if you're not worried about space in your state database, set the configuration parameter Database/TableNamePrefix to a new unique value. This will cause a whole new set of tables to be created.
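 
As an illustration only, and assuming the same XML-style section/parameter layout as in the retry sketch above (Database and TableNamePrefix are the section and parameter named in this note; the layout and the example value are assumptions):
 
   <Database>
      <!-- Assumed layout: a new, unique prefix makes the connector create a fresh set of state tables -->
      <TableNamePrefix>spconnector_patch03_full</TableNamePrefix>
   </Database>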
 
If you do not want to perform a full crawl but instead go directly to incremental crawls, you will need to migrate your state database. This version makes use of a few new tables, and it also expects some of the URLs in the state database to contain trailing slashes.
 
To migrate your state database, follow the instructions in the file Migration-ReadMe.txt, which resides in the etc directory.

This is a fix for issues
************************ 
 o 850934 : Fail to crawl resources in Incremental / Restart mode due to null exception
 o 850976 : SharepointConnector 1.0 does not retry content aggregation if it fails while downloading content.
 o 850989 : Errors in callbacks are impossible to relate to the relevant content
 o 850990 : SharePoint connector should not sleep / retry on "401 Unauthorized"
 o 851005 : FAST SharePoint .NET connector failures in incremental crawls
 o 851105 : callbackhandler should not treat intentionally dropped documents as failed
 o 851204 : Documents are getting missed in the Restart mode
 o 851237 : List items with attachments failed to index with an "object reference not set to an instance of an object"

Issue Reproduction
******************
Some data is never made searchable due to the connector not being able to download the content 
 
EXPECTED RESULTS
The connector should keep information about failed content and try to index it later.
 
ACTUAL RESULTS
The connector retries fetching the content immediately, but never again later.
 
HOW TO REPRODUCE 
Explicitly deny the crawl user access to some content and run the connector to crawl that content. You should see ERROR entries in the connector log showing that it fails to download the content and that it retries (for a certain amount of time). After that time has elapsed, the connector moves on and never tries to fetch that content again.
 
The connector log is spammed with ERROR messages about documents that are intentionally dropped in the ESP pipeline
 
EXPECTED RESULTS 
The connector should not consider the intentionally dropped documents as failed documents. It should treat them as successfully submitted documents.
 
ACTUAL RESULTS 
The connector reports the intentionally dropped documents as failed documents. 
 
HOW TO REPRODUCE
Add a document processor stage to the ESP pipeline that returns ProcessorStatus.Completed from the Process() method, as in the sketch below. This results in the document being dropped from the pipeline (it will not be indexed and made searchable). Observe that the connector log contains ERROR entries about this.
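 
The following Python sketch illustrates such a stage. Only the Process() method name and the ProcessorStatus.Completed return value are taken from the description above; the import path, base class and argument list are assumptions that vary with the ESP version, so follow the ESP document processing documentation for the exact scaffolding and for registering the stage in the pipeline.
 
   # Hypothetical ESP document processor stage that drops every document it sees.
   # NOTE: the import path, base class and argument list are assumptions; only
   # Process() and ProcessorStatus.Completed come from the description above.
   from processors import Processor, ProcessorStatus
 
   class DropAllDocuments(Processor):
       def Process(self, docid, document):
           # Returning Completed ends processing for this document, so it is
           # dropped from the pipeline and will not be indexed or made searchable.
           return ProcessorStatus.Completed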
 
When there are errors in the ESP pipeline for a document, the ERROR entry in the connector log contains a GUID-based document ID instead of the URL of the document
 
EXPECTED RESULTS
The callback handler should provide a document URL, making the log easier to understand and act upon, e.g. by going to the relevant content and fixing it, or by excluding it from crawling.
 
ACTUAL RESULTS
The callback handler provides document IDs, which are practically impossible to relate to the failed document in SharePoint.
 
HOW TO REPRODUCE
Add a file of an unsupported file type (e.g. .exe) to a document library and remove this file type from the configuration parameter Filters/ExcludeExtensions so that the connector will not drop it. You should see an ERROR entry in the connector log that says "Unsupported mime-type format: application/octet-stream". Observe that the ID of the failed document is a GUID-based identifier. It is hard to correlate this to the relevant document in SharePoint.

More Information

This patch can be downloaded from MSConnect at the following links: 

Win32: 

https://connect.microsoft.com/fastsearch/Downloads/DownloadDetails.aspx?DownloadID=32037 

If you do not have access to MSConnect and will be downloading software for your organization, please submit a request to fastcsrv@microsoft.com. Be sure to include your complete contact details and the contract identifier.
Properties

Article ID: 2647861 - Last Review: 05-Jan-2012 - Revision: 1
