2011/02/20

Understanding the SharePoint Index server and crawling of data



Within a SharePoint farm, one server typically performs the index role for the farm: it manages the crawling of searchable content and builds the index file. I was always under the impression that this Index server performed all of the crawling itself, but I recently discovered that this is not the case. The Index server manages the crawl of all content and performs the crawls for content external to the SharePoint farm, but it sends requests to the available Query (Search) servers to crawl the content internal to the farm. The Query servers in turn return the crawled results to the Index server, which adds them to the index file; that index file is then propagated back out to the Query servers.
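To make the division of labor concrete, here is a rough Python model of the flow described above. It is only a sketch, not how SharePoint is implemented: the class names (ContentSource, QueryServer, IndexServer) are hypothetical, and the round-robin choice of Query server is an assumption made for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ContentSource:
    name: str
    internal: bool  # True if the content lives inside the SharePoint farm


@dataclass
class QueryServer:
    name: str
    index_copy: list = field(default_factory=list)

    def crawl(self, source: ContentSource) -> list:
        # Stand-in for a real crawl: return a label for the crawled content
        # so the index server can add it to the index.
        return [f"{source.name} (crawled by {self.name})"]


@dataclass
class IndexServer:
    query_servers: list
    index: list = field(default_factory=list)

    def crawl_all(self, sources: list) -> None:
        # Hypothetical round-robin selection among the available query servers.
        servers = itertools.cycle(self.query_servers)
        for source in sources:
            if source.internal:
                # Internal farm content: ask a query server to crawl it.
                self.index.extend(next(servers).crawl(source))
            else:
                # External content (e.g. a file share): crawled by the index server itself.
                self.index.append(f"{source.name} (crawled by index server)")
        self.propagate()

    def propagate(self) -> None:
        # Copy the finished index back out to every query server.
        for server in self.query_servers:
            server.index_copy = list(self.index)


query_servers = [QueryServer("query1"), QueryServer("query2")]
index_server = IndexServer(query_servers)
index_server.crawl_all([
    ContentSource("Intranet site collection", internal=True),
    ContentSource("File share", internal=False),
])
print(index_server.index)
# ['Intranet site collection (crawled by query1)', 'File share (crawled by index server)']
```

The point of the model is the branch in crawl_all: the Index server only does the fetching itself for external sources, while everything inside the farm is handed off to a Query server before the results come back for indexing.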

Since crawl operations can affect the performance of your SharePoint environment, Microsoft recommends as a best practice having a dedicated Query server available for crawling. This server should not be part of the (possibly load-balanced) front-end servers that serve web pages to end users; it should be responsible only for crawling the SharePoint content internal to the farm.

Now, some of you are probably thinking that adding another server to your farm merely for crawling is overkill, and in many scenarios it probably is. Microsoft has indicated that crawling less than 500 GB of content (keep in mind this is data internal to your SharePoint farm) will not cause performance issues even without a dedicated Query server to perform the crawls. Once your content approaches this level, however, you may want to consider setting up a dedicated Query server to crawl it. Placing the Index role on this crawl-dedicated Query server can also help alleviate potential performance issues, because the crawled data no longer has to travel across your network to reach the Index server. Your Index server will still have to pass the index file and incremental files to all of the Query/Search servers, but you'll at least eliminate some network chattiness by putting the Index server on the same box as the crawl-dedicated Query server.
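As a back-of-the-envelope aid, the sizing rule and the network trade-off above can be written down as two small Python functions. This is a hedged sketch: the 500 GB threshold is the figure quoted in this post and should be treated as a rule of thumb, and the function names and the "transfer" labels are hypothetical, chosen only to count which copies cross the network.

```python
from typing import List

# Figure quoted in this post; treat it as a rule of thumb, not a hard limit.
DEDICATED_CRAWL_THRESHOLD_GB = 500


def needs_dedicated_crawl_server(internal_content_gb: float) -> bool:
    """True once internal farm content reaches the ~500 GB guideline."""
    return internal_content_gb >= DEDICATED_CRAWL_THRESHOLD_GB


def network_transfers(index_colocated: bool, other_query_servers: List[str]) -> List[str]:
    """List the cross-network copies made in one crawl-and-propagate cycle.

    index_colocated: True if the Index role sits on the crawl-dedicated Query server.
    other_query_servers: the remaining Query/Search servers that receive the index.
    """
    transfers = []
    if not index_colocated:
        # Crawled data must travel from the crawl server to a separate Index server...
        transfers.append("crawl server -> index server: crawled content")
        # ...and the crawl-dedicated Query server still needs its own index copy.
        transfers.append("index server -> crawl server: index and incremental files")
    for server in other_query_servers:
        # Every other Query/Search server always receives the index over the network.
        transfers.append(f"index server -> {server}: index and incremental files")
    return transfers


print(needs_dedicated_crawl_server(120))  # False
print(needs_dedicated_crawl_server(650))  # True
print(len(network_transfers(False, ["query1", "query2"])))  # 4
print(len(network_transfers(True, ["query1", "query2"])))   # 2
```

Co-locating the Index role removes the two transfers between the crawl server and the Index server; the propagation to the remaining Query servers is unchanged, which matches the "some network chattiness" saving described above.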

2010/01/02

To hide unapproved content in a query

To hide unapproved content when querying with an SPQuery object, set its ViewAttributes property in the code-behind as follows:

// query is an SPQuery: recurse into subfolders and hide items that have not been approved
query.ViewAttributes = "Scope='Recursive' ModerationType='HideUnapproved'";
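If it helps to picture what those two view attributes do, here is a simplified Python model (not SharePoint code). Folder, Item and run_query are hypothetical names; the status names mirror SharePoint's SPModerationStatusType enumeration, and the sketch assumes HideUnapproved simply drops every item whose status is not Approved. SharePoint's actual behavior (for example with versioning and drafts) may be more nuanced.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class ModerationStatus(Enum):
    # Names mirror SharePoint's SPModerationStatusType values.
    APPROVED = auto()
    DENIED = auto()
    PENDING = auto()
    DRAFT = auto()
    SCHEDULED = auto()


@dataclass
class Item:
    title: str
    status: ModerationStatus


@dataclass
class Folder:
    name: str
    items: list = field(default_factory=list)
    subfolders: list = field(default_factory=list)


def run_query(folder: Folder, recursive: bool = True, hide_unapproved: bool = True) -> List[str]:
    """Simplified model of Scope='Recursive' ModerationType='HideUnapproved'."""
    results = []
    for item in folder.items:
        # HideUnapproved (simplified): skip anything that is not approved.
        if hide_unapproved and item.status is not ModerationStatus.APPROVED:
            continue
        results.append(item.title)
    if recursive:
        # Scope='Recursive': descend into subfolders and return their items too.
        for sub in folder.subfolders:
            results.extend(run_query(sub, recursive, hide_unapproved))
    return results


root = Folder(
    "Pages",
    items=[Item("Home", ModerationStatus.APPROVED), Item("News draft", ModerationStatus.PENDING)],
    subfolders=[Folder("Archive", items=[Item("2009 report", ModerationStatus.APPROVED),
                                         Item("Rejected page", ModerationStatus.DENIED)])],
)
print(run_query(root))  # ['Home', '2009 report']
```

Dropping Scope='Recursive' (recursive=False here) would return only the top-level approved item, 'Home', which is why both attributes usually go together when a list uses folders.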