Search is unavailable:
[[date] 13:23:36.307] VERBOSE fdispatch All new engines up after 109 ms
[[date] 13:24:45.571] WARNING fdispatch Search node localhost:13056 down
[[date] 13:24:45.587] DEBUG fdispatch Lost Node: localhost:13056
[[date] 13:24:46.195] WARNING fdispatch Search node localhost:13070 down
[[date] 13:24:46.195] DEBUG fdispatch Lost Node: localhost:13070
[[date] 13:24:46.336] WARNING fdispatch Search node localhost:13078 down
[[date] 13:24:46.336] DEBUG fdispatch Lost Node: localhost:13078
[[date] 13:24:47.210] WARNING fdispatch Search node localhost:13088 down
[[date] 13:24:47.210] DEBUG fdispatch Lost Node: localhost:13088
[[date] 13:24:51.484] WARNING fdispatch Search node localhost:13098 down
[[date] 13:24:51.484] DEBUG fdispatch Lost Node: localhost:13098
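To confirm the same pattern in a larger fdispatch log, the "Search node ... down" warnings can be pulled out with a short script. This is a minimal sketch that assumes the log lines look like the excerpt above; the function name find_down_nodes and the sample list are illustrative only and not part of the product.

```python
import re

# Matches WARNING lines shaped like the excerpt above, e.g.
# "[[date] 13:24:45.571] WARNING fdispatch Search node localhost:13056 down"
DOWN_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3})\]\s+WARNING\s+fdispatch\s+Search node (\S+) down"
)

def find_down_nodes(lines):
    """Return (time, node) pairs for every 'Search node ... down' warning."""
    hits = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits

# Sample lines copied from the log excerpt (assumed format).
sample = [
    "[[date] 13:23:36.307] VERBOSE fdispatch All new engines up after 109 ms",
    "[[date] 13:24:45.571] WARNING fdispatch Search node localhost:13056 down",
    "[[date] 13:24:45.587] DEBUG fdispatch Lost Node: localhost:13056",
    "[[date] 13:24:46.195] WARNING fdispatch Search node localhost:13070 down",
]

for when, node in find_down_nodes(sample):
    print(when, node)
```

Several nodes dropping within a few seconds of each other, as in the excerpt, is the signature to look for.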
Prior to the outage, we see many connectivity errors like this one between search-1/fdispatch and the master indexer:
searchctrl-search fdispatch (13052): Exception when retrieving index generations: WinHttpReceiveResponse failed. 'http://[servername]:13390/rtsearch::search_master/5.13/1390503596000000010/get_index_id_set' Error:'12002'
Also, just prior to the outage, the master indexer fails to activate the index:
VERBOSE indexer searchmaster_servant: Timed out waiting for active index set
DEBUG indexer ft::sequence_storage: Closing data file %FASTSEARCH%\data/ftStorage\sequences\storage_16f68.data
On the system running search, we see the following exclusion list inconsistency. Eventually the search-1 process stops responding altogether:
DEBUG searchctrl-search fdispatch (13052): We don't have the correct exclusionlist. We have 1113402 exclusionlisted, while master has 457468 exclusionlisted docs
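The size of the mismatch can be read straight off this message. The sketch below assumes the message wording shown above and uses a hypothetical helper, exclusion_gap, to extract both counts and their difference.

```python
import re

# Wording taken from the DEBUG message above (assumed to be stable).
MISMATCH_RE = re.compile(
    r"We have (\d+) exclusionlisted, while master has (\d+) exclusionlisted docs"
)

def exclusion_gap(line):
    """Return (local, master, local - master), or None if the line does not match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    local, master = int(match.group(1)), int(match.group(2))
    return local, master, local - master

line = (
    "DEBUG searchctrl-search fdispatch (13052): We don't have the correct "
    "exclusionlist. We have 1113402 exclusionlisted, while master has 457468 "
    "exclusionlisted docs"
)
print(exclusion_gap(line))  # (1113402, 457468, 655934)
```

In this case the search node holds 655934 more exclusion-listed documents than the master reports.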
search-1 begins activating a new index and, 795 seconds into that activation, receives an update from the indexer with another index to activate. This second update leaves the search process in an inconsistent state: it neither completes the first activation nor acknowledges the second. When the next (third) index activation message arrives, search is unable to recover from this state.
1) On the admin node, edit these two files:
%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template
%FASTSEARCH%\etc\config_data\RTSearch\webcluster\rtsearchrc.xml
to add this parameter:
activationTimeout = "900"
2) On each of the index/search nodes, edit %FASTSEARCH%\etc\searchrc-1.xml to add or edit this parameter:
searchtimeout = "6000"
After making these changes, restart the indexer and search-1 processes on all servers.
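Before restarting, it can help to confirm that each edited file really contains the new values. The following is a minimal sketch that does a plain text check rather than parsing the XML, since the surrounding file structure is not shown here. The helper has_setting is hypothetical, and the sample strings stand in for the contents of the real files.

```python
import re

def has_setting(config_text, name, value):
    """True if the text contains name = "value", with optional spaces around '='."""
    pattern = r"\b" + re.escape(name) + r'\s*=\s*"' + re.escape(value) + '"'
    return re.search(pattern, config_text) is not None

# Placeholder text; in practice, read the contents of the edited file instead.
rtsearchrc_text = 'activationTimeout = "900"'
searchrc_text = 'searchtimeout = "6000"'

print(has_setting(rtsearchrc_text, "activationTimeout", "900"))
print(has_setting(searchrc_text, "searchtimeout", "6000"))
```

A False result for any file means the parameter is missing or still has an old value, so the edit should be rechecked before the processes are restarted.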