FS4SP - The server is unavailable and could not be accessed

Symptoms

During a full crawl, a single site collection may generate so many errors that the crawler treats the whole server as unavailable. As a result, the crawl is repeatedly delayed and takes a long time to finish.

For example, you crawl one web application that contains many site collections:
http://server1/sites/sc1...sc200

In this example, the ULS logs contain events reporting that the server is unavailable and could not be accessed.

Cause

The crawler receives availability information about web applications (hosts), not about individual site collections. A report that the server is unavailable usually indicates connectivity issues. Therefore, when the crawler receives this report, it interprets the problem as a server issue instead of a content issue, even if only one site collection is responsible for the errors.

A server is marked as unavailable when the crawler process receives 32 consecutive unexpected errors. An unexpected error is one that occurs at the connectivity level, such as a time-out, and not at the HTTP level. For example, a 404 error does not qualify as an unexpected error.

After 32 consecutive unexpected errors, the crawler marks the server as unavailable for 10 minutes. Instead of putting all documents in a failure state, the crawler delays the crawl of that server for 10 minutes. This gives time for the potential connectivity problem to be resolved. After 10 minutes, the crawler again tries to download one resource from the server.
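The behavior described above can be modeled with a small sketch. This is only an illustration of the counting and back-off logic, not the crawler's actual implementation. The names HostState and host_key are hypothetical, and the sketch assumes that any result other than an unexpected error (a success or an HTTP-level error such as 404) breaks the streak. The thresholds of 32 errors and 10 minutes come from this article.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlparse

ERROR_THRESHOLD = 32        # consecutive unexpected errors (from the article)
BACKOFF_SECONDS = 10 * 60   # 10-minute delay (from the article)


@dataclass
class HostState:
    """Hypothetical per-host bookkeeping; not the crawler's real data structure."""
    consecutive_errors: int = 0
    unavailable_until: Optional[float] = None

    def can_crawl(self, now: float) -> bool:
        # The host is skipped while the back-off window is still open.
        return self.unavailable_until is None or now >= self.unavailable_until

    def record(self, unexpected_error: bool, now: float) -> None:
        if not unexpected_error:
            # Assumption: a success or an HTTP-level error (for example, 404)
            # resets the streak of connectivity-level errors.
            self.consecutive_errors = 0
            self.unavailable_until = None
            return
        self.consecutive_errors += 1
        if self.consecutive_errors >= ERROR_THRESHOLD:
            self.unavailable_until = now + BACKOFF_SECONDS


def host_key(url: str) -> str:
    # State is tracked per host, so every site collection on the same
    # web application shares one counter.
    return urlparse(url).netloc


states: Dict[str, HostState] = {}

# 32 time-outs from one site collection ...
for _ in range(ERROR_THRESHOLD):
    key = host_key("http://server1/sites/sc1/page.aspx")
    states.setdefault(key, HostState()).record(unexpected_error=True, now=0.0)

# ... delay every other site collection on the same host as well.
other = states[host_key("http://server1/sites/sc200/page.aspx")]
blocked_now = not other.can_crawl(now=0.0)
retry_allowed_later = other.can_crawl(now=float(BACKOFF_SECONDS))
```

Because the state is keyed by host name only, errors from sc1 are enough to delay sc200, which is the symptom described in this article.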

Resolution

To resolve this issue, you must exclude the problematic site collection by using the crawl exclusion rules.

Crawl exclusion rules take effect immediately after creation, and documents in progress that match the exclusion rules will be ignored. Documents that are already crawled will not be removed until a full crawl is completed. (This is not an issue in this particular case.)

However, there is an issue with this approach. The match is performed on the access URL (that is, the URL that starts with "sts4:") and not on the HTTP URL. Specifically, the match is performed on the access URL when document processing starts and on the display URL when the processing is complete. Because the match on the display URL occurs after the document is downloaded, it also occurs after the crawl errors have already been generated.

Workaround

To work around this issue, create two rules. For example, if you want to exclude http://server1/sites/sc1/* , create one rule for that URL and another rule for the corresponding sts4 URL, sts4://server1/siteurl=sites/sc1/* .
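As a convenience, the two rule paths can be derived from a site collection URL. The following is a minimal sketch. The function name exclusion_patterns is hypothetical, and the sts4 format is inferred only from the single example in this article, so verify the generated access URL against the URLs that appear in your crawl log before you create the rules.

```python
from typing import Tuple
from urllib.parse import urlparse


def exclusion_patterns(site_collection_url: str) -> Tuple[str, str]:
    """Return (display URL rule, access URL rule) for one site collection.

    Assumption: the access URL follows the pattern shown in this article,
    sts4://<host>/siteurl=<site collection path>/*
    """
    parsed = urlparse(site_collection_url)
    path = parsed.path.strip("/")  # for example, "sites/sc1"
    display_rule = f"{parsed.scheme}://{parsed.netloc}/{path}/*"
    access_rule = f"sts4://{parsed.netloc}/siteurl={path}/*"
    return display_rule, access_rule


display_rule, access_rule = exclusion_patterns("http://server1/sites/sc1")
print(display_rule)
print(access_rule)
```

Both paths must be added as exclusion rules. The access URL rule is the one that prevents the download and therefore the errors.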

Properties

Article ID: 2952468 - Last Review: Apr 14, 2014 - Revision: 1