FAST Search: search-1 stops responding

Symptom

In Microsoft FAST Search Server 2010 for SharePoint, Service Pack 2 (October 2013 CU), the customer sees the search-1 process become unresponsive. As a result, search is unavailable for all users.

Search is unavailable:

[[date] 13:23:36.307] VERBOSE fdispatch All new engines up after 109 ms
[[date] 13:24:45.571] WARNING fdispatch Search node localhost:13056 down
[[date] 13:24:45.587] DEBUG fdispatch Lost Node: localhost:13056
[[date] 13:24:46.195] WARNING fdispatch Search node localhost:13070 down
[[date] 13:24:46.195] DEBUG fdispatch Lost Node: localhost:13070
[[date] 13:24:46.336] WARNING fdispatch Search node localhost:13078 down
[[date] 13:24:46.336] DEBUG fdispatch Lost Node: localhost:13078
[[date] 13:24:47.210] WARNING fdispatch Search node localhost:13088 down
[[date] 13:24:47.210] DEBUG fdispatch Lost Node: localhost:13088
[[date] 13:24:51.484] WARNING fdispatch Search node localhost:13098 down
[[date] 13:24:51.484] DEBUG fdispatch Lost Node: localhost:13098
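
When triaging a larger fdispatch log, it can help to list every node that was reported down. The following is a minimal sketch that assumes the log lines follow the format of the excerpt above; the function name find_down_nodes and the regular expression are illustrative and not part of the product.

```python
import re

# Matches WARNING lines such as:
#   "... WARNING fdispatch Search node localhost:13056 down"
DOWN_RE = re.compile(r"WARNING\s+fdispatch\s+Search node (\S+):(\d+) down")


def find_down_nodes(lines):
    """Return (host, port) pairs for each 'Search node ... down' warning, in order."""
    nodes = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            nodes.append((match.group(1), int(match.group(2))))
    return nodes


# A few lines copied from the excerpt above.
sample = [
    "[[date] 13:23:36.307] VERBOSE fdispatch All new engines up after 109 ms",
    "[[date] 13:24:45.571] WARNING fdispatch Search node localhost:13056 down",
    "[[date] 13:24:45.587] DEBUG fdispatch Lost Node: localhost:13056",
    "[[date] 13:24:46.195] WARNING fdispatch Search node localhost:13070 down",
]
down = find_down_nodes(sample)
print(down)
```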

Prior to the outage, we see many connectivity errors like the following between search-1/fdispatch and the master indexer:
searchctrl-search fdispatch (13052): Exception when retrieving index generations: WinHttpReceiveResponse failed. 'http://[servername]:13390/rtsearch::search_master/5.13/1390503596000000010/get_index_id_set' Error:'12002'  

Also just prior to the outage, the master indexer logs a failure to activate the index:
VERBOSE indexer searchmaster_servant: Timed out waiting for active index set
DEBUG indexer ft::sequence_storage: Closing data file %FASTSEARCH%\data/ftStorage\sequences\storage_16f68.data

On the system running search, we see the following inconsistency. Eventually, the search-1 process stops responding:
DEBUG searchctrl-search fdispatch (13052): We don't have the correct exclusionlist. We have 1113402 exclusionlisted, while master has 457468 exclusionlisted docs
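
To gauge how far the search node has drifted from the master, the two counts in this message can be extracted and compared. This is a rough sketch that assumes the message wording is exactly as shown above; exclusionlist_gap is a hypothetical helper name.

```python
import re

# Captures the local and master counts from the DEBUG message shown above.
MISMATCH_RE = re.compile(
    r"We have (\d+) exclusionlisted, while master has (\d+) exclusionlisted docs"
)


def exclusionlist_gap(line):
    """Return local count minus master count, or None if the line does not match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    local_count = int(match.group(1))
    master_count = int(match.group(2))
    return local_count - master_count


line = (
    "DEBUG searchctrl-search fdispatch (13052): We don't have the correct "
    "exclusionlist. We have 1113402 exclusionlisted, while master has 457468 "
    "exclusionlisted docs"
)
gap = exclusionlist_gap(line)
print(gap)
```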

Cause

The customer's network had load, latency, and connectivity issues, resulting in the following scenario:

search-1 begins activating a new index and, 795 seconds into the process, receives an update from the indexer with another index to activate. This second update leaves the search process in an inconsistent state, where it neither completes the first activation nor acknowledges the second. The next (third) time an index activation message arrives, search is unable to recover from this inconsistent state.

Resolution

To address the behavior, we increased the activationTimeout setting from 300 to 900 seconds, which completely avoids the scenario. The value could be increased further if index sizes keep growing and lengthen activation times.
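
The numbers in this article can be compared directly. The sketch below rests on an assumed reading of the behavior: that an update arriving after activationTimeout has elapsed is what leads to the inconsistent state. The article gives the three values but does not describe the mechanism in that detail, so treat the function as illustrative only.

```python
# Values taken from this article.
OLD_TIMEOUT_S = 300        # activationTimeout before the change
NEW_TIMEOUT_S = 900        # activationTimeout after the change
SECOND_UPDATE_AT_S = 795   # when the second activation message arrived


def arrives_after_timeout(elapsed_s, timeout_s):
    """Assumed model: True if the update arrives after the timeout has elapsed."""
    return elapsed_s > timeout_s


old_case = arrives_after_timeout(SECOND_UPDATE_AT_S, OLD_TIMEOUT_S)
new_case = arrives_after_timeout(SECOND_UPDATE_AT_S, NEW_TIMEOUT_S)
print(old_case, new_case)
```

Under that assumption, an activation that takes longer than 900 seconds would be exposed again, which is why the value may need to grow with index size.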


Actual changes:

1) On the admin node, edit these two files:

%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template

and

%FASTSEARCH%\etc\config_data\RTSearch\webcluster\rtsearchrc.xml

In both files, add this parameter:

activationTimeout = "900"

2) On each of the index/search nodes, edit %FASTSEARCH%\etc\searchrc-1.xml to add or edit this parameter:

searchtimeout = "6000"

After making these changes, restart the indexer and search-1 processes on all servers.
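
Before restarting, it may be worth confirming that each edited file carries the intended value. The following is a minimal sketch that assumes each setting appears in the file text as name = "value", as written in the steps above; the helper names and sample strings are hypothetical, and the real files contain more than these fragments.

```python
import re


def read_setting(config_text, name):
    """Return the integer value of name = "value" in config_text, or None if absent."""
    pattern = r"\b" + re.escape(name) + r'\s*=\s*"(\d+)"'
    match = re.search(pattern, config_text)
    return int(match.group(1)) if match else None


def check_settings(config_text, expected):
    """Map each expected setting name to True if the text carries the expected value."""
    return {
        name: read_setting(config_text, name) == value
        for name, value in expected.items()
    }


# Hypothetical stand-ins for the contents of the edited files.
sample_rtsearchrc = 'activationTimeout = "900"'
sample_searchrc = 'searchtimeout = "6000"'

rtsearch_ok = check_settings(sample_rtsearchrc, {"activationTimeout": 900})
searchrc_ok = check_settings(sample_searchrc, {"searchtimeout": 6000})
print(rtsearch_ok, searchrc_ok)
```

In practice the text would be read from the files named in steps 1 and 2 on each server, rather than from inline strings.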

Properties

Article ID: 2975591 - Last Review: 2 Oct 2014 - Revision: 1
