The Value of Robots.txt: What is it and how does it help my SEO?

Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Search engines are not required to follow robots.txt, but the major ones generally honor what it asks them not to do.

If webmasters tell search engine spiders to skip pages they do not consider significant enough to be crawled, they have a better chance of getting their most valuable pages featured in the search engine results pages. The robots.txt file is an easy way to help spiders return the most relevant search results.

The location of robots.txt is vital. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it. They do not search the whole site for a file named robots.txt. Instead, they look only in the main directory (like http://mydomain.com/robots.txt), and if they don’t find it there, they simply assume that the site does not have a robots.txt file and therefore crawl everything they find along the way. So, if you don’t put robots.txt in the right place, do not be surprised when search engines index your whole site.
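To make the location rule concrete, here is a minimal Python sketch that works out where a crawler would look for robots.txt, starting from any page address. The function name robots_txt_url is made up for this illustration, and the example URL simply reuses the placeholder domain from above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the one place a crawler looks for robots.txt on this host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path with
    # /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://mydomain.com/blog/2020/post.html?ref=home"))
# prints http://mydomain.com/robots.txt
```

Whatever page you start from, the answer is always the same file at the top of the host, which is why a robots.txt saved in a subfolder is never read.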

Why do you need robots.txt?

Robots.txt files control crawler access to particular areas of your site. While this can be very dangerous if you accidentally disallow Googlebot from crawling your entire site, there are some situations in which a robots.txt file can be very handy.
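The accidental "block everything" mistake usually comes down to a single character. The sketch below uses Python's standard urllib.robotparser module to compare two tiny robots.txt files. The helper name can_crawl and the page URL are assumptions for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A lone slash disallows every path on the site.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"

# An empty Disallow value disallows nothing at all.
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"

page = "http://mydomain.com/products/widget.html"
print(can_crawl(BLOCK_EVERYTHING, "Googlebot", page))  # False
print(can_crawl(BLOCK_NOTHING, "Googlebot", page))     # True
```

Running a quick check like this against your own file before uploading it is a cheap way to catch a stray slash.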

Some common use cases include:

Preventing duplicate content from appearing in SERPs (note that meta robots is often a superior choice for this)
Keeping entire sections of a website private (for instance, your engineering team’s staging site)
Keeping internal search results pages from showing up on a public SERP
Specifying the location of your sitemap
Preventing search engines from indexing certain files on your website (images, PDFs, etc.)
Specifying a crawl delay to keep your servers from being overloaded when crawlers load several pieces of content at once

If there are no areas on your site to which you want to control user-agent access, you may not need a robots.txt file at all.
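Several of the use cases above can live in one small file. The sketch below is an illustrative robots.txt, not a recommended configuration: the /staging/, /search/ and /downloads/private.pdf paths and the sitemap address are invented for the example. It is read back with Python's urllib.robotparser (the site_maps() call needs Python 3.8 or newer). Note that Crawl-delay is not honored by every crawler; Googlebot, for one, ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering a staging area, internal search pages,
# a single PDF, a crawl delay, and the sitemap location.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /search/
Disallow: /downloads/private.pdf
Crawl-delay: 10

Sitemap: http://mydomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# Show which example paths a crawler following these rules may visit.
for path in ("/", "/staging/index.html", "/search/results", "/downloads/private.pdf"):
    print(path, parser.can_fetch("*", "http://mydomain.com" + path))

print(parser.crawl_delay("*"))  # 10
print(parser.site_maps())       # ['http://mydomain.com/sitemap.xml']
```

Only the home page comes back as crawlable here; the three disallowed paths are reported as off limits.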

How does it work?

Before a search engine crawls your site, it looks at your robots.txt file for instructions on which parts of the site it is allowed to crawl (visit). Pages it is not allowed to crawl are generally not indexed (saved) for display in the search engine results.
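The check-before-crawl order can be sketched as a small planning step. Everything named here is hypothetical: FAKE_ROBOTS_FILES stands in for fetching each host's robots.txt over the network, ExampleBot is a made-up crawler name, and plan_crawl is only a rough model of what a rule-abiding crawler does, not how any particular search engine is built.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical stand-in for the network: robots.txt bodies keyed by host.
# A host missing from this dict models a site with no robots.txt file.
FAKE_ROBOTS_FILES = {
    "mydomain.com": "User-agent: *\nDisallow: /search/\n",
}


def plan_crawl(urls, agent="ExampleBot"):
    """Return the URLs a rule-abiding crawler would go on to fetch."""
    rules_by_host = {}  # each host's rules are read once, before its pages
    to_fetch = []
    for url in urls:
        host = urlsplit(url).netloc
        if host not in rules_by_host:
            parser = RobotFileParser()
            # No robots.txt means an empty rule set, so everything is allowed.
            parser.parse(FAKE_ROBOTS_FILES.get(host, "").splitlines())
            rules_by_host[host] = parser
        if rules_by_host[host].can_fetch(agent, url):
            to_fetch.append(url)
    return to_fetch


print(plan_crawl([
    "http://mydomain.com/about.html",
    "http://mydomain.com/search/results",
    "http://other.example/search/results",
]))
# ['http://mydomain.com/about.html', 'http://other.example/search/results']
```

The same /search/ path is skipped on the host that disallows it and crawled on the host that has no robots.txt at all.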

Robots.txt files are useful:

If you want search engines to ignore any duplicate pages on your website
If you don’t want search engines to index your internal search results pages
If you don’t want search engines to index certain areas of your website or a whole website
If you don’t want search engines to index certain files on your website (images, PDFs, etc.)
If you wish to tell search engines where your sitemap is located