How to create a robots.txt file
--
A robots.txt file tells crawlers which URLs on your site they may access. The file is placed at the root of the website. For example, for the website www.google.com, the robots.txt file is at www.google.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. It consists of one or more rules. Each rule blocks or allows access for a given crawler to a specified file path on that website.
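Because the file always sits at the root of the site, its location can be derived from any page URL. The following is a minimal sketch in Python; the helper name robots_txt_url is made up for illustration and is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.google.com/search?q=robots"))
# https://www.google.com/robots.txt
```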
Example robots.txt file:
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: AdsBot-Google-Mobile
Disallow: /desktop/

User-agent: *
Allow: /
Here’s what that robots.txt file means:
- Googlebot is not allowed to crawl any URL that starts with http://google.com/nogooglebot/
- AdsBot-Google-Mobile cannot crawl any URL that starts with http://google.com/desktop/
- All other user agents are allowed to crawl the entire site.
- The last rule could be omitted with the same result: the default behaviour is that user agents are allowed to crawl the entire site.
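One way to check this reading of the example file is to feed it to Python's standard urllib.robotparser module. This is only a sketch: Python's parser is not Google's crawler and may resolve more complex rule sets differently, and ExampleBot is a made-up crawler name standing in for "all other user agents".

```python
from urllib.robotparser import RobotFileParser

# The example file from above, one group per crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: AdsBot-Google-Mobile
Disallow: /desktop/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether agent may fetch path on the example site."""
    return parser.can_fetch(agent, "http://google.com" + path)


print(allowed("Googlebot", "/nogooglebot/page.html"))         # False
print(allowed("Googlebot", "/desktop/page.html"))             # True
print(allowed("AdsBot-Google-Mobile", "/desktop/page.html"))  # False
print(allowed("ExampleBot", "/nogooglebot/page.html"))        # True (hypothetical crawler)
```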
More examples:
# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and AdsBot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
User-agent: *
Disallow: /
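As these examples show, a group is just an optional comment, one or more User-agent lines, and then the rules that apply to them. If you generate the file from code, a group could be rendered roughly like this. The render_group function is a hypothetical helper written for this post, not an existing API.

```python
def render_group(user_agents, rules, comment=None):
    """Render one robots.txt group as text.

    Hypothetical helper: user_agents is a list of crawler names and
    rules is a list of (directive, path) pairs such as ("Disallow", "/").
    """
    lines = []
    if comment:
        lines.append(f"# {comment}")
    # Several User-agent lines may share the same set of rules.
    lines += [f"User-agent: {agent}" for agent in user_agents]
    lines += [f"{directive}: {path}" for directive, path in rules]
    return "\n".join(lines)


print(render_group(
    ["Googlebot", "AdsBot-Google"],
    [("Disallow", "/")],
    comment="Block Googlebot and AdsBot",
))
```

Separating groups with a blank line when joining them keeps the generated file easy to read.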
Important points to remember:
- Your site can have only one robots.txt file.
- The file must be named robots.txt.
- A robots.txt file must be a UTF-8 encoded text file.
- The # character marks the beginning of a comment.
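The points above can be sketched in a few lines of Python: write a single file named robots.txt with UTF-8 encoding, and treat everything after # as a comment when reading it back. The function names and the temporary directory standing in for the site root are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def write_robots_txt(site_root: Path, content: str) -> Path:
    """Write content to <site_root>/robots.txt as UTF-8 (hypothetical helper)."""
    target = site_root / "robots.txt"  # the file name must be exactly robots.txt
    target.write_text(content, encoding="utf-8")
    return target


def strip_comment(line: str) -> str:
    """Drop everything from the first # onward, then trim whitespace."""
    return line.split("#", 1)[0].strip()


content = "# Block only Googlebot\nUser-agent: Googlebot\nDisallow: /\n"

# A temporary directory stands in for the real web root in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots_txt(Path(tmp), content)
    lines = path.read_text(encoding="utf-8").splitlines()

# Keep only the lines that still contain something after removing comments.
rules = [stripped for stripped in map(strip_comment, lines) if stripped]
print(rules)  # ['User-agent: Googlebot', 'Disallow: /']
```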
Cheers!!