In this tutorial we’ll talk about robots.txt and how to stop search engines from crawling your website’s folders or files. If you have no idea what robots.txt is, first read this post – What is robots.txt and why it is needed. Suppose you have hosted some private files on your server and you don’t want search engines to index those files; then you must use the following User-agent rules in your robots.txt file. A robots.txt file also helps you block bad bots that crawl your website and drain your server resources. You can instruct search engines on how they should crawl a website by using a robots.txt file: when a search engine crawls a website, it requests the robots.txt file first and then follows the rules within.
First, create a robots.txt file in your server’s root folder; it should be accessible like this:
http://www.iamrohit.in/robots.txt
Use User-agent rules in your robots.txt file to define how each search engine should crawl your website. Search engine crawlers use a User-agent to identify themselves when crawling; see the examples below.
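For instance, a rule group that applies only to Googlebot looks like this (the /tmp/ path here is just a placeholder for illustration):

User-agent: Googlebot
Disallow: /tmp/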
Allow all search engines to crawl website –
By default, search engines crawl all pages of your website if you haven’t defined any rules in your robots.txt file.
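If you want to state this explicitly, the conventional way is a wildcard User-agent with an empty Disallow value, which allows everything:

User-agent: *
Disallow: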
Disallow all search engines from crawling website –
You can easily stop search engines from crawling and indexing any pages or files on your website:
User-agent: *
Disallow: /
Disallow all search engines from particular folders or files –
If you want to prevent selected folders or files from being crawled by any search engine, use the following rules in your robots.txt file. Here I am disallowing my private-image folder and my private.doc file:
User-agent: *
Disallow: /private-image/
Disallow: /doc/private.doc
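Some crawlers, including Googlebot and Bingbot, also support simple wildcard patterns (* and $), although these are not part of the original robots.txt standard. A sketch that blocks every .doc file, assuming the crawler supports wildcards:

User-agent: *
Disallow: /*.doc$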
Disallow bad bots from crawling your website –
You can disallow crawling by default and then explicitly allow trusted bots, which keeps bad bots away and reduces the load on your server:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Slurp
Allow: /

User-agent: Yandex
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: Yeti
Allow: /

User-agent: Naverbot
Allow: /

User-agent: msnbot
Disallow:
Note that an empty Disallow: and Allow: / mean the same thing, so please don’t get confused.
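If you know the name a misbehaving crawler reports in its User-agent header, you can also block it individually. A minimal sketch, using the hypothetical name BadBot (replace it with the actual bot name from your server logs):

User-agent: BadBot
Disallow: /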