Wednesday, December 9, 2009

Googlebot

Googlebot is the search bot (web crawler) software used by Google. It collects documents from the web to build a searchable index for the Google search engine.

If webmasters wish to restrict the information on their site that is available to Googlebot or another search engine spider, they can do so with the appropriate directives in a robots.txt file or by adding meta tags to the web page. Googlebot's requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".
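As a rough illustration of these controls, the Python sketch below uses the standard library's `urllib.robotparser` to evaluate a sample robots.txt. The rules and the `example.com` URLs are made up for illustration. It also defines a hypothetical helper, `looks_like_googlebot`, that applies the two identification signals mentioned above. Two caveats: `urllib.robotparser` applies the first matching rule, which may differ from how Google resolves conflicting Allow/Disallow rules; and a user-agent string can be forged, so the host check is only meaningful if the host name came from a reverse DNS lookup of the requesting IP address that is then confirmed with a forward lookup.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (illustrative only): keep Googlebot out of /private/
# and block every other crawler from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def looks_like_googlebot(user_agent: str, hostname: str) -> bool:
    """Hypothetical heuristic using the two signals described in the text.

    `hostname` is assumed to come from a reverse DNS lookup of the client IP.
    Neither value on its own proves that the request really came from Google.
    """
    host = hostname.lower().rstrip(".")
    return "googlebot" in user_agent.lower() and (
        host == "googlebot.com" or host.endswith(".googlebot.com")
    )


if __name__ == "__main__":
    print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
    print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
    print(parser.can_fetch("OtherBot", "https://example.com/public/page.html"))    # False
    print(looks_like_googlebot(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "crawl-66-249-66-1.googlebot.com",
    ))  # True
```

The suffix check requires a leading dot, so a host name such as `googlebot.com.evil.example` is not mistaken for a Google host.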

Currently, Googlebot follows only HREF links and SRC links. It discovers pages by harvesting all of the links on every page it finds, so a new web page must be linked from other known pages on the web in order to be crawled and indexed.
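The discovery process described above can be modelled with a toy breadth-first crawler. This is a simplified sketch under stated assumptions, not Google's implementation: it collects `href` and `src` attributes with Python's `html.parser`, and it uses a hypothetical `fetch` callable backed by an in-memory dictionary instead of real HTTP requests. The page that nothing links to is never discovered.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collects absolute URLs from every href and src attribute in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(base_url: str, html: str) -> list[str]:
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    harvester.close()
    return harvester.links


def crawl(seed: str, fetch) -> set[str]:
    """Breadth-first discovery: a URL is found only if a known page links to it."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetcher: returns HTML text or None
        if html is None:
            continue
        for link in harvest_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# A tiny in-memory "web" standing in for real HTTP fetches.
WEB = {
    "https://example.com/": '<a href="/about">About</a> <img src="/logo.png">',
    "https://example.com/about": '<a href="https://example.com/">Home</a>',
    "https://example.com/orphan": "<p>No other page links here.</p>",
}

found = crawl("https://example.com/", WEB.get)
print(sorted(found))
# ['https://example.com/', 'https://example.com/about', 'https://example.com/logo.png']
```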

A problem webmasters have often noted with Googlebot is that it can take up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limits and be taken down temporarily, which is especially troublesome for mirror sites that host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.
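To see why the crawl rate matters, a back-of-the-envelope estimate multiplies the request rate by the average response size and the number of seconds in a day. The figures below (10 requests per second, 50 KB per response) are hypothetical and chosen only to show the arithmetic. Real crawl rates and page sizes vary.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_crawl_bytes(requests_per_second: float, avg_response_bytes: int) -> float:
    """Bytes transferred in one day at a sustained crawl rate."""
    return requests_per_second * avg_response_bytes * SECONDS_PER_DAY


def max_rate_for_budget(daily_budget_bytes: float, avg_response_bytes: int) -> float:
    """Highest sustained request rate that stays within a daily byte budget."""
    return daily_budget_bytes / (avg_response_bytes * SECONDS_PER_DAY)


# Hypothetical figures: 50 KB (decimal) average response size.
unthrottled = daily_crawl_bytes(10, 50_000)  # 43,200,000,000 bytes = 43.2 GB/day
throttled = daily_crawl_bytes(1, 50_000)     # 4,320,000,000 bytes = 4.32 GB/day
print(f"{unthrottled / 1e9:.2f} GB/day vs {throttled / 1e9:.2f} GB/day")
print(max_rate_for_budget(4.32e9, 50_000))  # 1.0
```

Because transfer scales linearly with the request rate, lowering the crawl rate reduces daily bandwidth in proportion, assuming response sizes stay the same.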
