Increase Search Engine Rankings and prevent Duplicate Content using robots.txt

So, how do you increase your search engine rankings, bring in more search engine traffic and prevent duplicate content on your blog or website, all with a single robots.txt file?

What is a robots.txt file?

The robots.txt protocol, also called the Robots Exclusion Standard, is used to ask web spiders and other web robots, such as search engine robots, blog robots, splog robots and the AdSense robot, to stay out of all or part of a blog or website that is otherwise publicly viewable. Compliance is voluntary: well-behaved robots follow the file, but nothing forces a misbehaving robot to obey it.

By placing a robots.txt file on your blog or website, you tell the specified robots not to crawl the specified files or directories, which generally keeps that content out of their index.
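To make this concrete, here is a minimal Python sketch that reads a two-line robots.txt with the standard library's urllib.robotparser module and asks what a robot may fetch. The robot names ExampleBot and OtherBot and the /private/ directory are made up for illustration, and a real crawler may interpret the file somewhat differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule group aimed at a robot called ExampleBot.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` according to ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


print(allowed("ExampleBot", "/private/notes.html"))  # blocked by the rule
print(allowed("ExampleBot", "/public/index.html"))   # not covered by any rule
print(allowed("OtherBot", "/private/notes.html"))    # no group names this robot
```

Only the robot named in the group is kept out of /private/. Everything else stays open to it, and robots that are not named at all are not restricted by this file.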

Why is the robots.txt important for my blog?

Search engines penalize duplicate content by placing that particular post, or in some instances an entire blog or website, in their supplemental index, also known as Search Engine Hell. If your post goes into the supplemental index of a search engine, it might take 6-8 months for it to return to the regular index, and only after you have removed the duplicate content from that post.

Okay, so a robots.txt file might prevent duplicate content. How does it increase my search engine rankings?

Simple: a clean blog signals to search engines that you do not host spammy content, and if you write good content, the robots will keep coming back and index your site every day instead of once a week. Also, by preventing duplicate content, you keep articles out of the supplemental index, which means people can find those posts when they search on Google or any other search engine, which in turn means more traffic! Prominent blogger Neil Patel wrote on his Link Building blog that by using a robots.txt file, his website traffic went up by 11.3%.

What content should you block search engine robots from accessing with a robots.txt file?

a. We need to deny search engine bots and robots access to our WordPress folders or, if you have a Movable Type blog, your Movable Type folder. For example, in the picture below, you can see that Google is trying to access a file within my WordPress blog folder and getting a 403 (Forbidden) error. This is bad.

[Image: 403 error for a file in a WordPress folder]

b. Remove comment feeds from the search results. You should never block the robots from accessing comments on your posts, but block the search engines from accessing your comment feeds as this would result in duplicate content.

c. Keep trackback URLs from being indexed, as they may cause blank pages to be indexed.

d. Block any log-related or stats-related files.

e. Prevent the Internet Archive's Wayback Machine from accessing your blog and storing a snapshot of it.

SEO Optimized Robots.txt file for a WordPress Blog

User-agent: *
# disallow all files in these WordPress directories
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
# disallow all files in these directories
Disallow: /tag/
Disallow: /cgi-bin/
# disallow robots from parsing individual post feeds and trackbacks
Disallow: /feed/
Disallow: /trackback/
Disallow: */trackback*
# disallow any files that are stats related
Disallow: /stats*
# disallow legal pages, tag archives, docs, manuals and uncategorized posts
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*

# disallow files ending with the following extensions
# note: a crawler that matches this group follows only this group, not the * group above
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.php*
Allow: /wp-content/uploads/

# disallow the Wayback Machine archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
 
# allow adsense bot on entire site, except URLs with query strings
User-agent: Mediapartners-Google
Disallow: /*?*
Allow: /*
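
Several of the rules above rely on wildcard patterns, so it helps to see how such a pattern can be read. The following Python sketch assumes Google-style matching, where a rule is a prefix of the URL path, `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The helper name `rule_matches` and the sample paths are hypothetical, and this is not any crawler's actual code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow/Allow pattern matches a URL path.

    Assumption: Google-style matching. The pattern is matched from the
    start of the path, '*' matches any run of characters, and a trailing
    '$' means the path must end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the path (prefix semantics).
    return re.match(regex, path) is not None


print(rule_matches("/*.php$", "/index.php"))         # True: ends in .php
print(rule_matches("/*.php$", "/index.php?p=12"))    # False: '$' anchors the end
print(rule_matches("/*.php*", "/index.php?p=12"))    # True: trailing '*' allows more
print(rule_matches("/wp-", "/wp-admin/index.php"))   # True: plain prefix match
print(rule_matches("/feed/", "/my-post/feed/"))      # False: only the root feed
print(rule_matches("*/trackback*", "/my-post/trackback/"))  # True
```

Under these assumptions, `Disallow: /wp-` already covers every path that starts with /wp-, while `Disallow: /feed/` only covers the feed at the root of the site and not the feeds of individual posts.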

Also, if you host your images on your WordPress blog, you will not lose any Google image traffic, as we are allowing Googlebot-Image access to the entire site even though we blocked certain WordPress directories for other robots.
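
This works because a robot that finds a group naming it uses that group and ignores the general `*` group. Here is a rough way to check that with Python's standard urllib.robotparser module. It is only one parser, it does plain prefix matching without wildcards, and real crawlers may behave differently. The trimmed-down file and the bot name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down, hypothetical version of the file: a general group plus a
# group for Googlebot-Image with an empty Disallow (nothing is blocked).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/

User-agent: Googlebot-Image
Disallow:
"""


def can_fetch(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under SAMPLE_ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


image_path = "/wp-content/uploads/photo.jpg"
print(can_fetch("Googlebot-Image", image_path))    # True: uses its own group
print(can_fetch("SomeOtherBot", image_path))       # False: falls back to '*'
print(can_fetch("Googlebot-Image", "/wp-admin/"))  # True: '*' rules not inherited
```

The flip side shows up in the last line: a robot with its own group does not inherit the general rules either, so anything you want blocked for that robot has to be repeated inside its group.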

Remember, once you finish creating your robots.txt file, upload it to your site’s root directory. For example, TechCounter’s robots.txt file exists at http://www.techcounter.com/robots.txt. You can use this robots.txt file for your blog too: just copy and paste it into a robots.txt file for your WordPress blog.
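
Robots only look for the file at the root of the host, never in a subfolder. As a small sketch of that rule, the Python helper below derives the robots.txt address from any page URL. The function name `robots_url` and the sample post path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical post URL on the site mentioned above.
print(robots_url("http://www.techcounter.com/2009/some-post/?p=1"))
# http://www.techcounter.com/robots.txt
```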

Unfortunately, if you are a Blogger user, you cannot add or edit your robots.txt file. My recommendation: move from Blogger to WordPress.



23 Responses to “Increase Search Engine Rankings and prevent Duplicate Content using robots.txt”

  1. 1
    WordPress Security Tips to protect your WordPress Blog | Computer Security Says:

    [...] lists an excellent SEO Optimized robots.txt file for a WordPress blog. Simply copy the robots.txt file below which also includes a Disallow /wp-* which will prevent any [...]

  2. 2
    Ajit Gaddam Says:

    Hey Ryan, I just added the robots.txt to a wordpress security article

  3. 3
    Check Pagerank Says:

    I strongly recommend that you turn the No Follow off in your comment section. I’ll watch Google Webmaster Tools, and if the links don’t show up after a couple of weeks — I won’t go back to that blog again. Another suggestion: you should have a Top Commentator widget installed. Do Follow and Top Commentator will ensure that you have a successful blog with lots of readers!

  4. 4
    Computer Repair Says:

    Good article about increase serp. I will create robots.txt in my wordpress blog. Thanks for information.

  5. 5
    Ujjwol Says:

    will this allow search engine to take sitemap

  6. 6
    home renovation Says:

    I had the same problem a few weeks ago and i fixed it without installing anything. It’s a virus called “go.google”. I tried installing stuff and still didn’t work. The next steps worked for me so I guess this is the best way.

  7. 7
    Coffee Pods Says:

    Thanks this was a good help

  8. 8
    Carmelo Arlington Says:

    Hi just thought i would tell you something.. This is twice now i’ve landed on your blog in the last 3 weeks hunting for totally unrelated things. Spooky or what?

  9. 9
    seo tampa Says:

    Thank you for the tip.

  10. 10
    Jessica Says:

    Hi, thank you for this useful information.

    Best Regards,

    Jessica

  11. 11
    Phil Says:

    Thanks for the information I will add this to my blog.

  12. 12
    Sanalika Hileleri Says:

    Thank you for the tip…

  13. 13
    tires Says:

    Great use of Robots.txt never thought of using it this way!

    Till then,

    Jean

  14. 14
    apartments cannes Says:

    Hi just thought i would tell you something.. It is twice now i’ve landed in your weblog inside final three weeks hunting for entirely unrelated points. Spooky or what?

  15. 15
    Cannes rentals Says:

    Hi just believed i would tell you something.. This really is twice now i’ve landed on your weblog in the last 3 weeks hunting for absolutely unrelated items. Spooky or what?

  16. 16
    pj Says:

    Dude…

    1. Disallow: /wp- is meaningless. It’s blocking the crawling of directories or files called “wp-”, and there are none of those.

    Instead, Disallow: /wp-* will block spiders from any files or directories beginning with “wp-”. The whole first block can be replaced with this one rule.

    2. there should be NO linebreaks in the file. The way you’ve got it written, only the first 4 rules will actually do anything.

    3. You’re allowing duplicate content through your Categories.

    Disallow: /category/*/*

    will allow your category taxonomy to be indexed, but not the (duplicate) content to which it points, which is already being indexed through your is_home() page.

    4. Disallow: /feed/ isn’t doing what you think it is. All you’re blocking is your front page feed, and none of the others.

    Disallow: */feed is what you need.

    5. Without correct canonicals, you’re still in for infinite content penalties if anyone decides to run a link farm to your urls’ numeric suffixes.

  17. 17
    Generators in Gurgaon Says:

    There just believed i would tell you anything.. This really is twice now i’ve landed on your weblog from the last 3 weeks hunting for certainly unrelated products. Spooky or what?

  18. 18
    Zawad Iftikhar Says:

    I am no seo, but one thing for sure, there is nothing like Allow: in robots.txt, only disallow function. It is only used to disallow certain urls, files, and folders from being indexed so ……..

  19. 19
    Street Walkers Says:

    I have a problem with the overall premise of your post but I still think its really informative. I still really like your writing style. Keep up the good work.

  20. 20
    Seo company Kanpur Says:

    Ya!I will try Increase Search Engine Rankings and prevent Duplicate Content using robots.txt.It is a good way…
    Thanks
    Raj………..!

  21. 21
    blog geodezyjny Says:

    Please let me know if you’re looking for a writer for your site. You have some really great posts and I feel I would be a good asset. If you ever want to take some of the load off, I’d love to write some material for your blog in exchange for a link back to mine. Please shoot me an email if interested. Kudos!

  22. 22
    Rachel Blake Says:

    Blogs ou should be reading…

    [...]Here is a Great Blog You Might Find Interesting that we Encourage You[...]……

  23. 23
    Jody Boddorf Says:

    A big thank you for your post.Really thank you! Cool.
