Increase Search Engine Rankings and prevent Duplicate Content using robots.txt
So, how do you increase your search engine rankings, bring in more search engine traffic and prevent duplicate content on your blog or website, all with a single robots.txt file?
What is a robots.txt file?
The robots.txt protocol, also called the Robots Exclusion Standard, is read by web spiders and other web robots (search engine robots, blog robots, splog robots, the AdSense robot and so on) and asks them not to access all or part of a blog or website that is otherwise publicly viewable. It is a request rather than an access control: well-behaved robots follow it, badly behaved ones may not.
By placing a robots.txt file on your blog or website, you are telling the specified robots to skip the specified files or directories so that they are not indexed.
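To make that concrete, here is a minimal sketch of how a well-behaved robot might read such a file, using the robots.txt parser in Python's standard library. The rules, the robot name ExampleBot and the paths are made up for illustration, and real search engines use their own parsers, so treat this as an approximation rather than what any particular crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
"""


def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Return True if a robot that obeys these rules may fetch the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


# A path under a disallowed directory is off limits...
print(is_allowed(RULES, "ExampleBot", "/wp-admin/options.php"))  # False
# ...while an ordinary post is still fetchable.
print(is_allowed(RULES, "ExampleBot", "/2008/01/some-post/"))    # True
```

Note that the standard-library parser does plain prefix matching and does not understand the * and $ wildcards that some search engines accept.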
Why is the robots.txt important for my blog?
Search engines penalize duplicate content by placing that particular post, or in some instances an entire blog or website, in their supplemental index, also known as Search Engine Hell. If your post goes into the supplemental index of a search engine, it might take almost 6-8 months for it to come back into the regular index, and only after you have removed the duplicate content from that post.
Okay, so a robots.txt file might prevent duplicate content. How does it increase my search engine rankings?
Simple: having a clean blog makes search engines think that you do not host spammy content, and if you write good content, the robots will keep coming back and index your site every day instead of once a week. Also, by preventing duplicate content, you keep articles out of the supplemental index, which means people can find those posts when they search on Google or any other search engine, which in turn means you get more traffic! Prominent blogger Neil Patel wrote on his Link Building blog that by using a robots.txt file, his website traffic went up by 11.3%.
What content should you block search engine robots from accessing with a robots.txt file?
a. Deny search engine bots and robots access to your WordPress folders or, if you have a Movable Type blog, your Movable Type folder. For example, in the picture below, you can see that Google is trying to access a file within my WordPress blog folder and getting a 403 (Forbidden) error. This is bad.

b. Remove comment feeds from the search results. You should never block the robots from accessing comments on your posts, but do block search engines from accessing your comment feeds, as indexing them would result in duplicate content.
c. Keep trackback URLs from being indexed, as they may cause blank pages to be indexed.
d. Block any log-related or stats-related files.
e. Prevent the Internet Archive's Wayback Machine from accessing your blog and storing a snapshot of it.
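A checklist like the one above can be turned into a small audit that you rerun whenever you edit your rules. The sketch below uses Python's standard-library robots.txt parser. The rules, the robot name ExampleBot and the paths are assumptions chosen for illustration (your blog's URLs may differ), and the parser only does prefix matching, so it approximates rather than reproduces what a real search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules covering some of the items in the checklist.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /comments/feed/
Disallow: /trackback/

User-agent: ia_archiver
Disallow: /
"""

# (user agent, path, should the robot be allowed to fetch it?)
EXPECTATIONS = [
    ("ExampleBot", "/wp-admin/", False),          # item a: blog software folders
    ("ExampleBot", "/comments/feed/", False),     # item b: comment feed
    ("ExampleBot", "/trackback/", False),         # item c: trackback URL
    ("ExampleBot", "/2008/01/some-post/", True),  # posts stay crawlable
    ("ia_archiver", "/", False),                  # item e: archive robot
]


def audit(rules, expectations):
    """Return the expectations that the rules do not satisfy."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [
        (agent, path, expected)
        for agent, path, expected in expectations
        if parser.can_fetch(agent, path) != expected
    ]


failures = audit(RULES, EXPECTATIONS)
print("all expectations met" if not failures else failures)
```

An empty result means every path in the list is blocked or allowed as intended under these rules.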
SEO Optimized Robots.txt file for a WordPress Blog
User-agent: *
# disallow all files in these WordPress directories
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
# disallow all files in these directories
Disallow: /tag/
Disallow: /cgi-bin/
# disallow robots from parsing individual post feeds and trackbacks
Disallow: /feed/
Disallow: /trackback/
Disallow: */trackback*
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*

# disallow files ending with the following extensions
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.php*
Allow: /wp-content/uploads/

# disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow: /*?*
Allow: /*
Also, if you host your images on your WordPress blog, you will not lose any Google image traffic, because we are giving Googlebot-Image access to the entire site even though we blocked certain WordPress directories.
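One thing worth checking in a file laid out like the one above: a robot that finds a group addressed to it by name normally follows only that group and does not also apply the rules under "User-agent: *". The sketch below shows this with Python's standard-library parser. The rules are a shortened, hypothetical stand-in for the full file (wildcard rules are left out because this parser does not support them), and SomeOtherBot is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical rules with the same two-group layout.
RULES = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot
Disallow: /stats
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Ask the parsed rules whether this robot may fetch this path."""
    return parser.can_fetch(user_agent, path)


# A robot with no group of its own falls back to the "*" rules.
print(allowed("SomeOtherBot", "/wp-admin/"))  # False
# Googlebot has its own group, so the "*" rules are not applied to it.
print(allowed("Googlebot", "/wp-admin/"))     # True
# Only the rules inside the Googlebot group affect Googlebot.
print(allowed("Googlebot", "/stats/today"))   # False
```

If the intent is for the directory rules to apply to a named robot as well, they would need to be repeated inside that robot's own group.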
Remember, once you finish creating your robots.txt file, upload it to your site’s root directory. For example, TechCounter’s robots.txt file lives at http://www.techcounter.com/robots.txt. You can use this robots.txt file for your blog too: copy and paste it directly into the robots.txt file of your WordPress blog.
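Robots look for the file only at the root of the host, never inside a subfolder. As a small illustration, the helper below (robots_url is a hypothetical name, and the post URL is made up) works out where the file is expected to be for any page on a site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.techcounter.com/2008/01/some-post/?ref=home"))
# http://www.techcounter.com/robots.txt
```

If the file you uploaded does not open at the address this prints for your own blog, robots will not find it either.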
Also, unfortunately, if you are a Blogger user, you cannot add or edit your robots.txt file. My recommendation: move from Blogger to WordPress.


January 19th, 2008 at 12:17 pm
[...] lists an excellent SEO Optimized robots.txt file for a WordPress blog. Simply copy the robots.txt file below which also includes a Disallow /wp-* which will prevent any [...]
January 19th, 2008 at 2:31 pm
Hey Ryan, I just added the robots.txt to a wordpress security article
June 8th, 2009 at 2:31 pm
I strongly recommend that you turn the No Follow off in your comment section. I’ll watch Google Webmaster Tools, and if the links don’t show up after a couple of weeks — I won’t go back to that blog again. Another suggestion: you should have a Top Commentator widget installed. Do Follow and Top Commentator will ensure that you have a successful blog with lots of readers!
July 7th, 2009 at 6:13 am
Good article about improving SERP rankings. I will create a robots.txt file for my WordPress blog. Thanks for the information.
September 10th, 2009 at 1:06 pm
Will this allow search engines to fetch the sitemap?
December 26th, 2009 at 12:56 am
I had the same problem a few weeks ago and i fixed it without installing anything. It’s a virus called “go.google”. I tried installing stuff and still didn’t work. The next steps worked for me so I guess this is the best way.
January 31st, 2010 at 4:29 pm
Thanks this was a good help
February 12th, 2010 at 9:41 pm
Hi just thought i would tell you something.. This is twice now i’ve landed on your blog in the last 3 weeks hunting for totally unrelated things. Spooky or what?
April 3rd, 2010 at 3:42 am
Thank you for the tip.
May 21st, 2010 at 5:13 am
Hi, thank you for this useful information.
Best Regards,
Jessica
June 16th, 2010 at 9:37 pm
Thanks for the information I will add this to my blog.
July 12th, 2010 at 5:45 am
Thank you for the tip…
August 5th, 2010 at 3:00 pm
Great use of Robots.txt never thought of using it this way!
Till then,
Jean
September 3rd, 2010 at 8:32 am
Dude…
1. Disallow: /wp- is meaningless. It’s blocking the crawling of directories or files called “wp-”, and there are none of those.
Instead, Disallow: /wp-* will block spiders from any files or directories beginning with “wp-”. The whole first block can be replaced with this one rule.
2. there should be NO linebreaks in the file. The way you’ve got it written, only the first 4 rules will actually do anything.
3. You’re allowing duplicate content through your Categories.
Disallow: /category/*/*
will allow your category taxonomy to be indexed, but not the (duplicate) content to which it points, which is already being indexed through your is_home() page.
4. Disallow: /feed/ isn’t doing what you think it is. All you’re blocking is your front page feed, and none of the others.
Disallow: */feed is what you need.
5. Without correct canonicals, you’re still in for infinite content penalties if anyone decides to run a link farm to your urls’ numeric suffixes.
October 22nd, 2010 at 5:05 am
I am no seo, but one thing for sure, there is nothing like Allow: in robots.txt, only disallow function. It is only used to disallow certain urls, files, and folders from being indexed so ……..
January 25th, 2011 at 2:51 pm
I have a problem with the overall premise of your post but I still think its really informative. I still really like your writing style. Keep up the good work.
February 1st, 2011 at 1:01 pm
Ya! I will try Increase Search Engine Rankings and prevent Duplicate Content using robots.txt. It is a good way…
Thanks
Raj………..!
November 29th, 2011 at 3:52 pm
Please let me know if you’re looking for a writer for your site. You have some really great posts and I feel I would be a good asset. If you ever want to take some of the load off, I’d love to write some material for your blog in exchange for a link back to mine. Please shoot me an email if interested. Kudos!
January 26th, 2012 at 5:21 am
A big thank you for your post.Really thank you! Cool.