SharePoint - Site collections under managed paths are not crawled
You configure a SharePoint web application as follows:
- The Default zone uses one of the available authentication modes
- A managed path (wildcard or explicit inclusion) is added, or the default 'Sites' managed path is used, to create new site collections
- Client Integration is disabled for the zone
You create the following site collections:
- The first site collection at the root (e.g. http://myserver)
- The second site collection under the 'Sites' managed path (e.g. http://myserver/sites/secondsc)
A full crawl of the content sources is performed.
In this scenario, you observe the following behavior:
- Data stored in the root site collection (http://myserver) is searchable. However, no results are returned for data stored in the http://myserver/sites/secondsc site collection
- No error message is written to the SharePoint ULS logs
With Client Integration enabled, the HTTP response headers include the 'MicrosoftSharePointTeamServices' entry.
After Client Integration is disabled, the HTTP response headers no longer contain that entry.
What happens during a crawl:
When SharePoint crawls a start address of a SharePoint-type content source, it receives a response from the SharePoint server and inspects the HTTP response headers.
If it does not find the 'MicrosoftSharePointTeamServices' entry in the response headers, which is the case when Client Integration is disabled, it uses the HTTP Web Site protocol handler instead of the SharePoint Site protocol handler.
The HTTP Web Site protocol handler only follows links that it finds on the pages it crawls. So if the crawled root page contains no link to the site collections under the managed path, SharePoint does not crawl them.
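The behavior described above can be modeled in a few lines. The following is a simplified sketch in Python, not the crawler's actual implementation: the function names, the handler labels "sharepoint" and "http", and the link graph are all hypothetical and chosen only for illustration. It shows that once the header is missing and the crawl falls back to following links, a site collection that is not linked from the root page is never reached.

```python
from collections import deque

SHAREPOINT_HEADER = "MicrosoftSharePointTeamServices"


def choose_handler(response_headers):
    """Pick a handler label based on the presence of the SharePoint header.

    Header names are compared case-insensitively. The labels are
    illustrative only and do not correspond to real product identifiers.
    """
    names = {name.lower() for name in response_headers}
    return "sharepoint" if SHAREPOINT_HEADER.lower() in names else "http"


def crawl_by_links(start_url, links):
    """Breadth-first crawl that only follows links found on crawled pages.

    `links` maps a page URL to the list of URLs linked from that page.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: the root page does not link to the second
# site collection under the 'Sites' managed path.
links = {
    "http://myserver": ["http://myserver/Lists/Announcements"],
    "http://myserver/sites/secondsc": [],
}

# Response headers without the SharePoint entry (Client Integration disabled).
handler = choose_handler({"Content-Type": "text/html"})
reached = crawl_by_links("http://myserver", links)
```

In this model, `handler` falls back to the link-following handler and `reached` does not contain http://myserver/sites/secondsc, which matches the missing search results described in the scenario.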
To work around this behavior, use one of the following options:
- Extend the existing web application to another zone, keep Client Integration enabled for that zone, and change the start address of the content source to the URL of the new zone.
- Enable the Client Integration option for the zone that is being crawled.
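To check whether the start address of a zone returns the header that the crawler looks for, the response headers can be inspected from a script. The following is a minimal sketch that assumes Python with only the standard library; the function names are hypothetical, and the sketch does not send credentials, so it relies on the assumption that the server includes the header even on an error response such as 401.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

SHAREPOINT_HEADER = "MicrosoftSharePointTeamServices"


def has_sharepoint_header(header_names):
    """Return True if the SharePoint header name is present (case-insensitive)."""
    wanted = SHAREPOINT_HEADER.lower()
    return any(name.lower() == wanted for name in header_names)


def fetch_header_names(url):
    """Send a HEAD request and return the names of the response headers."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return list(response.headers.keys())
    except HTTPError as error:
        # Error responses (for example 401) still carry response headers.
        return list(error.headers.keys())
```

For example, `has_sharepoint_header(fetch_header_names("http://myserver"))` would be expected to return False while Client Integration is disabled for that zone, and True for the URL of a zone where it is enabled.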
Article ID: 2728313 - Last Review: 11/20/2012 08:59:00 - Revision: 4.1