The Power of the Robots.txt File for SEO [Search Engine Optimization]

Magazine 22 Sep, 2013


Hi all, are you still unaware of the vast importance that the robots.txt file holds for any website?

Obviously, there are many resources available on the internet about Robots.txt, but most of them offer ready-made templates of Robots.txt that you can just copy and paste for your own site. Easy stuff, isn't it? But it may not work the way you want it to, as it has not been built with your particular site in mind.

So, in this article I will guide you through how the Robots.txt file actually works and how you can create your own perfectly optimized Robots.txt.

SEO consists of hundreds of elements, and one of the essential parts of SEO is Robots.txt. This small text file sitting at the root of your website can contribute seriously to the optimization of your website. Most webmasters tend to avoid editing the Robots.txt file, but it is not as hard as it looks. Anyone with basic knowledge can create and edit their Robots file, and if you are new to this, this post is perfect for your needs.

If your website doesn't have a Robots.txt file yet, learn here how to create one. If your blog or website does have a Robots.txt file but it is not optimized, then follow this post and optimize your Robots.txt file.

What is the impact of Robots.txt?

Robots.txt is simply a text file containing certain rules that control the way search engines behave on your site. It can be created using any text editor, such as Notepad, and must be present in the root directory of your site.
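For example, if your site lives at www.example.com, the file should be reachable at http://www.example.com/robots.txt.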

In WordPress, there is also a provision for a virtual Robots.txt file, which is a bit different from a physical Robots.txt. We will get into the details of it in a later section.

The Need for Robots.txt on Your Site

There are three major reasons for having a Robots.txt file for your site:

i) Search Engine Optimization

You are not going to get your site ranked better in the search engines simply by letting the bots crawl and index more and more pages of your site. Allowing the bots to index anything and everything on your site can actually harm your site's rankings.

ii) Security

There are some strong security reasons, too, for restricting the bots from accessing each and every corner of your site. To simplify things, there are two types of robots – good robots and bad robots. If you give unrestricted access to your site, then not only the search engines but also various bad bots will get a fine chance to access and steal confidential information from your site.
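As a sketch, assuming a hypothetical scraper that identifies itself with the user-agent "BadBot", you could refuse it access to everything (bear in mind that a crawler has to choose to honor these rules):

User-agent: BadBot

Disallow: /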

iii) Server Performance

Giving unrestricted access to your site can waste a huge amount of your site's bandwidth and can slow down your site for real users. There are various pages on your site that really do not need to be indexed at all. You may argue that your web host gives you "unlimited bandwidth", but technically speaking, there is nothing called "unlimited", as everything has its own limits. There are many cases where webmasters got suspended by their web hosts only because of this problem.
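A related lever is the non-standard Crawl-delay directive, which asks a bot to wait a number of seconds between requests. Some crawlers (for example Bing and Yandex) honor it, while Googlebot ignores it, so treat the value below purely as an illustration:

User-agent: *

Crawl-delay: 10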

How to make a Robots.txt file?

As I mentioned earlier, Robots.txt is a plain text file. So, if you don't have this file on your website, open any text editor you like (for example, Notepad) and create a Robots.txt file made up of one or more records. Every record carries important information for a search engine. Example:

User-agent: googlebot

Disallow: /cgi-bin

If these lines are written in the Robots.txt file, they allow Googlebot to index every page of your site, but the cgi-bin folder in the root directory is not allowed to be indexed. That means Googlebot won't index the cgi-bin folder.

By using the Disallow option you can restrict any search bot or spider from indexing any page or folder. Many sites disallow their archive folder or pages so that they do not create duplicate content.
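For example, here is a minimal sketch that keeps every bot out of a hypothetical /archives/ section (adjust the path to whatever your site actually uses):

User-agent: *

Disallow: /archives/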

Where Can You Get the Names of Search Bots?

You can find them in your website's logs, but if you want lots of visitors from the search engines you should allow every search bot. That means every search bot will index your site. You can write User-agent: * to address every search bot. Example:

User-agent: *

Disallow: /cgi-bin

This way, every search bot will index your website.

What You Shouldn't Do

1. Avoid unnecessary comments in the Robots.txt file; lines starting with # are treated as comments and are ignored by the bots, but too many of them just clutter the file.

2. Don't put a space at the beginning of a line, and don't insert stray spaces inside the directives. Example:

Bad Practice:

   User-agent: *

Dis allow: /support

Good Practice:

User-agent: *

Disallow: /support

3. Don't change the order of the directives.

Bad Practice:

Disallow: /support

User-agent: *

Good Practice:

User-agent: *

Disallow: /support

4. If you want to disallow more than one directory or page, don't write their names together on one line:

Bad Practice:

User-agent: *

Disallow: /support /cgi-bin /images/

Good Practice:

User-agent: *

Disallow: /support

Disallow: /cgi-bin

Disallow: /images

5. Use capital and small letters properly – paths in Robots.txt are case-sensitive. For example, if you want to disallow a "Download" directory but write "download" in the Robots.txt file, the search bot will misunderstand the rule.
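A sketch, assuming the directory on your server is actually named "Download":

Bad Practice:

User-agent: *

Disallow: /download

Good Practice:

User-agent: *

Disallow: /Download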

6. If you want every page and directory of your site to be indexed, write:

User-agent: *

Disallow:

7. But if you don't want any page or directory of your site to be indexed, write:

User-agent: *

Disallow: /

After editing the Robots.txt file, upload it via any FTP software to the root or home directory of your site.
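Once the file is uploaded, it is worth checking that the rules behave as you expect. Below is a minimal sketch using Python's standard urllib.robotparser module; the domain www.example.com and the disallowed /cgi-bin/ path are assumptions you should replace with your own site's details:

import urllib.robotparser

# Point the parser at the live robots.txt on your site
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# Ask whether a generic crawler ("*") may fetch specific URLs
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/test.cgi"))  # expected: False
print(rp.can_fetch("*", "http://www.example.com/about/"))            # expected: True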

 

Robots.txt for WordPress:

You can edit your WordPress Robots.txt file either by logging into your server's FTP account or by using a plugin like Robots Meta to edit the robots.txt file from the WordPress dashboard. There are a few things you should add to your robots.txt file along with your sitemap URL. Adding the sitemap URL helps search engine bots find your sitemap file, which leads to faster indexing of your pages.

sitemap: http://www.codesupport.info/sitemap.xml

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /archives/
Disallow: /author
Disallow: /comments/feed/
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
 

Allowing Specific User Agents

You won't always want to block bots from accessing your site. Sometimes you may want specific bots to crawl and index your site openly and independently. The "Allow" directive is your friend in this case.

For example, if you have taken part in the Google AdSense program, then you would certainly want the AdSense bots to access your site fully and retrieve any information they need. There is simply no reason to block these bots from accessing your site, as you want to provide them with as much information as they may need.

User-agent: Mediapartners-Google

Disallow:

Allow: /

We have kept the "Disallow" part blank, which tells the bot that we don't want to restrict it from crawling anything. But that is not enough, as you also need to tell it that you want it to crawl everything, using the "Allow" directive.

If you want to generate the file automatically, you can try webmaster tools for that. Don't forget to subscribe to our e-mail newsletter to keep receiving more SEO tips.
