Increase Search Engine Rankings and Prevent Duplicate Content Using robots.txt
So, how do you increase your search engine rankings, bring in more search engine traffic, and prevent duplicate content on your blog or website, all with a single robots.txt file?
What is a robots.txt file?
The robots.txt protocol, also called the Robots Exclusion Standard, is read by web spiders and other web robots, such as search engine robots, blog robots, splog robots and the AdSense robot. It asks them not to access all or part of a blog or website that is otherwise publicly viewable. Compliance is voluntary: well-behaved robots honor the file, but it is not an access control.
By placing a robots.txt file on your blog or website, you tell the specified robots not to crawl, and therefore not to index the contents of, the specified files or directories.
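As a rough illustration of what a well-behaved robot does with this file, here is a small sketch using Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` name and the `example.com` domain are placeholders made up for the demonstration; they are not this blog's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
rules = """\
User-agent: *
Disallow: /wp-admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly, no network access needed


def allowed(path, agent="ExampleBot"):
    """Ask the parser whether `agent` may fetch `path` (placeholder domain)."""
    return parser.can_fetch(agent, "http://example.com" + path)


print(allowed("/wp-admin/options.php"))  # False: under a disallowed prefix
print(allowed("/2008/01/my-post/"))      # True: no rule matches
```

Note that this standard-library parser only does plain prefix matching, so it is a fair model of a basic robot but not of crawlers that understand wildcards.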
Why is robots.txt important for my blog?
Search engines penalize duplicate content by placing that particular post, or in some instances an entire blog or website, in their supplemental index. The supplemental index is also known as Search Engine Hell: if your post lands there, it might take 6-8 months to return to the regular index, and only after you have removed the duplicate content from that post.
Okay, so a robots.txt file might prevent duplicate content. How does it increase my search engine rankings?
Simple: a clean blog signals to search engines that you do not host spammy content, and if you write good content, the robots will keep coming back and index your site every day instead of once a week. Preventing duplicate content also keeps your articles out of the supplemental index, which means people can find those posts when they search on Google or any other search engine, which in turn means you get more traffic! Prominent blogger Neil Patel wrote on his Link Building blog that by using a robots.txt file, his website traffic went up by 11.3%.
What content should a robots.txt file keep search engine robots away from?
a. We need to deny search engine bots and robots access to our WordPress folders or, if you have a Movable Type blog, your Movable Type folder. For example, in the picture below, you can see that Google is trying to access a file within my WordPress blog folder and getting a 403 (Forbidden) error. This is bad.

b. Remove comment feeds from the search results. You should never block robots from accessing the comments on your posts, but do block search engines from accessing your comment feeds, as these would result in duplicate content.
c. Keep trackback URLs from being indexed, as they may cause blank pages to be indexed.
d. Block any log-related or stats-related files.
e. Prevent the Internet Archive's Wayback Machine from accessing your blog and storing a snapshot of it.
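To make the list concrete, here is a minimal sketch that turns those choices into robots.txt text. The `render_robots_txt` helper is hypothetical (it is not part of WordPress or any library), and the paths are simply a few of the ones used in the full file further down.

```python
def render_robots_txt(groups):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    lines = []
    for agent, paths in groups.items():
        lines.append(f"User-agent: {agent}")
        lines.extend(f"Disallow: {path}" for path in paths)
        lines.append("")  # a blank line separates one group from the next
    return "\n".join(lines).rstrip() + "\n"


# Items a-d apply to every robot; item e targets the archive's crawler only.
groups = {
    "*": ["/wp-admin/", "/wp-includes/", "/feed/", "/trackback/", "/stats"],
    "ia_archiver": ["/"],
}
print(render_robots_txt(groups))
```

Keeping the rules as data like this makes it easier to see which robot each rule is aimed at before you write the file by hand.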
SEO Optimized Robots.txt file for a WordPress Blog
User-agent: *
# disallow all files in these WordPress directories
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
# disallow all files in these directories
Disallow: /tag/
Disallow: /cgi-bin/
# disallow robots from parsing individual post feeds and trackbacks
Disallow: /feed/
Disallow: /trackback/
Disallow: */trackback*
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*

# disallow files ending with the following extensions
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.php*
Allow: /wp-content/uploads/

# disallow WayBack archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow: /*?*
Allow: /*
Also, if you host your images on your WordPress blog, you will not lose any Google image traffic, because we allow Googlebot-Image access to the entire site even though we blocked certain WordPress directories.
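If the `*` and `$` characters in the Googlebot rules look cryptic, the sketch below may help. It is a simplified model of the wildcard matching that Google documents for its crawlers: `*` stands for any run of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match. `rule_matches` is a hypothetical helper written for this post, not any crawler's real code, and it ignores how Allow and Disallow rules are prioritized against each other.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each '*' match any characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


print(rule_matches("/*.php$", "/index.php"))               # True
print(rule_matches("/*.php$", "/index.php?p=1"))           # False
print(rule_matches("/*.php*", "/index.php?p=1"))           # True
print(rule_matches("*/trackback*", "/my-post/trackback/"))  # True
print(rule_matches("/feed/", "/my-post/feed/"))            # False
```

Under these semantics `/feed/` only matches paths that begin with `/feed/`, so a per-post feed such as the made-up `/my-post/feed/` would need a wildcard rule of its own. Also worth knowing: Google documents that a crawler obeys only the most specific group that names it, so Googlebot would follow its own group rather than the `User-agent: *` rules.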
Remember, once you finish creating your robots.txt file, upload it to your site’s root directory. For example, TechCounter’s robots.txt file lives at http://www.techcounter.com/robots.txt. You can use this robots.txt file for your blog too; just copy and paste it into the robots.txt for your own WordPress blog.
Unfortunately, if you are a Blogger user, you cannot add or edit your robots.txt file. My recommendation: move from Blogger to WordPress.
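Robots only look for the file at the root of the host, never in a subfolder. As a small sketch of that rule, the hypothetical `robots_url` helper below works out where the file must live for any page on a site; the post path in the example is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.techcounter.com/some/post/?p=1"))
# http://www.techcounter.com/robots.txt
```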


January 19th, 2008 at 2:31 pm
Hey Ryan, I just added the robots.txt to a wordpress security article
June 8th, 2009 at 2:31 pm
I strongly recommend that you turn the No Follow off in your comment section. I’ll watch Google Webmaster Tools, and if the links don’t show up after a couple of weeks — I won’t go back to that blog again. Another suggestion: you should have a Top Commentator widget installed. Do Follow and Top Commentator will ensure that you have a successful blog with lots of readers!
July 7th, 2009 at 6:13 am
Good article about increase serp. I will create robots.txt in my wordpress blog. Thanks for information.
September 10th, 2009 at 1:06 pm
will this allow search engine to take sitemap
December 26th, 2009 at 12:56 am
I had the same problem a few weeks ago and i fixed it without installing anything. It’s a virus called “go.google”. I tried installing stuff and still didn’t work. The next steps worked for me so I guess this is the best way.
January 31st, 2010 at 4:29 pm
Thanks this was a good help