How to set up robots.txt?

robots.txt is a text file that tells search robots (crawlers) which parts of your website they may visit and index.

Here are some examples of what can be done with robots.txt:

  • prevent certain pages from being indexed;
  • prohibit indexing by specific robots, or block your entire website from being indexed;
  • set the minimum interval between consecutive requests from a search robot.

Configuration of robots.txt

robots.txt has to be located in the root directory of your website. If there is no such file yet, you can simply create it.
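
For example, if your website is served from https://example.com, the file must be reachable at https://example.com/robots.txt (example.com is a placeholder here). A minimal robots.txt that allows all robots to index the whole website looks like this:

# empty ‘Disallow’ means nothing is blocked
User-agent: *
Disallow: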

How to specify the crawl delay for search robots?

You can set the minimum delay between two consecutive requests from a search robot with the ‘Crawl-delay’ directive:

# 40-second crawl delay for Googlebot only
User-agent: Googlebot
Crawl-delay: 40

# 40-second crawl delay for bingbot only
User-agent: bingbot
Crawl-delay: 40

# 40-second crawl delay for all robots
User-agent: *
Crawl-delay: 40

The ‘User-agent’ directive determines which robot a rule applies to. You can name particular robots or use ‘*’ to create rules that apply to all robots.
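
Note that major crawlers such as Googlebot obey only the most specific ‘User-agent’ group that matches them and ignore the remaining groups. A sketch with hypothetical directory names:

# Googlebot follows only this group
User-agent: Googlebot
Disallow: /drafts/

# all other robots follow this group; Googlebot ignores it,
# so ‘/private/’ is not blocked for Googlebot
User-agent: *
Disallow: /private/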

How to prevent a directory or a URL from being indexed?

# prevent Googlebot from indexing the vip.html page:
User-agent: Googlebot
Disallow: /vip.html

# prevent all robots from indexing the ‘private’ directory:
User-agent: *
Disallow: /private/

# allow bingbot access only to paths starting with ‘/shared’:
User-agent: bingbot
Disallow: /
Allow: /shared
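
When ‘Allow’ and ‘Disallow’ rules overlap, major crawlers apply the rule with the longest matching path. This makes it possible to block a directory while keeping a single page inside it accessible (the file name below is hypothetical):

# block the ‘private’ directory except for one page
User-agent: *
Disallow: /private/
Allow: /private/public.html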

How to completely prevent the website from indexing?

To prohibit all search robots from indexing your website, add the following lines to robots.txt:

User-agent: *
Disallow: /
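
A related pattern is to block every robot except one. Since a robot obeys only its own ‘User-agent’ group, a dedicated group with an empty ‘Disallow’ grants that robot full access (Googlebot is used here only as an example):

# Googlebot may index everything
User-agent: Googlebot
Disallow:

# all other robots are blocked completely
User-agent: *
Disallow: /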

NOTE

Not all robots follow the rules set in robots.txt. For instance, Googlebot complies with restriction rules (‘Disallow’), but ignores the ‘Crawl-delay’ directive. If you want to limit Googlebot’s crawl rate, you have to use Google’s Webmaster Tools (now Google Search Console).

Google Help: About robots.txt