Create a robots.txt file

The robots.txt file tells search engine robots which pages on your website they may crawl and, consequently, index. Most websites have files and folders that are not relevant to search engines (like images or admin files), so creating a robots.txt file can improve how your website is indexed.

A robots.txt file is a plain text file that can be created with any text editor, such as Notepad. If you are using WordPress, a sample robots.txt file would be:

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/

“User-agent: *” means that all the search bots (from Google, Yahoo, MSN and so on) should use those instructions to crawl your website. Unless your website is complex you will not need to set different instructions for different spiders.

“Disallow: /wp-” makes sure that search engines will not crawl the WordPress files. The rule is matched as a prefix, so this line excludes every file and folder whose path starts with “/wp-” from crawling, which keeps duplicate content and admin files out of the index.
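
If you want to see which URLs the sample file blocks before uploading it, a quick check can be sketched with Python's standard-library urllib.robotparser. The www.example.com host, the sample paths, and the is_allowed helper are assumptions made up for illustration; real crawlers may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The sample WordPress rules from above.
RULES = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
"""


def is_allowed(path, rules=RULES, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # www.example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "http://www.example.com" + path)


# Hypothetical paths, chosen only to exercise each rule.
for path in ["/wp-admin/", "/wp-content/uploads/photo.jpg",
             "/feed/", "/2007/02/some-post/"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Everything under a path starting with /wp- is reported as blocked, including the uploads folder, while an ordinary post URL stays allowed.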

If you are not using WordPress, just replace the Disallow lines with the files or folders on your website that should not be crawled, for instance:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any other folder to be excluded/

After you have created the robots.txt file, upload it to the root directory of your site and you are done!


68 Responses to “Create a robots.txt file”

  1. Mac Utopia on February 14th, 2007 1:05 pm

    I agree that robots.txt files are important; however, I disagree with blocking the images folder. Some sites get pretty good traffic from Google image search, and blocking this directory will keep your images from showing up in those results.

  2. egon on February 14th, 2007 1:33 pm

    Same with WordPress. If you disallow “/wp-” then it’s not going to index any of your uploads, like images, since they are in your wp-content folder. I get quite a bit of traffic from Google Image Search.

  3. Daniel on February 14th, 2007 1:55 pm

    I am not sure if the Google Image bot tracks down images from the /images/ folder or directly from the posts where the images were inserted.

    In fact, I think the latter option is true, because the content of the page, the keywords and the alt tag play an important role in the image search algorithm.

    In that case Disallowing the image folder should not affect your incoming traffic from image searches.

  4. Dawud Miracle on February 14th, 2007 6:26 pm

    Nice post. Great reminder of how you can easily protect folders and files on your server.

  5. Thilak on February 15th, 2007 3:10 am

    I guess disallowing “wp-” will not stop the Google Image bot from crawling your images, because it crawls them from the post and not from the directory.

  6. Bes Z on February 15th, 2007 8:52 am

    Thilak, unless I am mistaken, wouldn’t the Google Image Bot be following the rules of the robots.txt file regardless of where it starts crawling from?

  7. Daniel on February 15th, 2007 9:48 am

    Bes, if I am not wrong, the Google Image Bot will not need to crawl your image folder at all. It will crawl your pages, and it will index all the images on those pages (i.e. posts).

  8. engtech on February 15th, 2007 1:45 pm

    One other thing to consider blocking is any duplicate content on your site. WordPress gives you about three thousand ways to access content (/page, /tag, direct links, etc). Blocking some of them might be a good idea.

    But I’m not an SEO and I have no idea what should be blocked.

  9. Ajay on February 19th, 2007 11:00 pm

    Daniel, that is not the case, rules of robots.txt are always followed no matter where the index is followed from.

    Excluding wp- and wp-content will block out indexing of your images.

    You will need to separately allow the folder you want.

    I suggest using Webmaster Tools to analyze your robots.txt to give you a proper understanding of what is allowed / blocked.

    Also, I am not sure if /trackback/ will disallow everything with /trackback/ in it. Can you get this confirmed?
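
One way to keep uploads crawlable while still blocking the rest of the WordPress files is an Allow line for the uploads folder. The sketch below checks such a file with Python's standard-library urllib.robotparser. It assumes uploads live in /wp-content/uploads/ and uses www.example.com as a placeholder host; the may_fetch helper is made up for illustration. Allow began as an extension to the original robots.txt convention, so support can vary between crawlers. Python's parser applies the first matching rule, which is why the Allow line is placed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: uploads are stored under /wp-content/uploads/.
RULES_WITH_ALLOW = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-
"""


def may_fetch(path, rules=RULES_WITH_ALLOW):
    """Return True if a generic crawler may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # Placeholder host; only the path is matched against the rules.
    return parser.can_fetch("*", "http://www.example.com" + path)


print(may_fetch("/wp-content/uploads/2007/02/photo.jpg"))  # True
print(may_fetch("/wp-admin/"))                              # False
print(may_fetch("/wp-content/plugins/some-plugin.php"))     # False
```

A tool such as Webmaster Tools remains the better judge of how a specific search engine reads the file.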

  10. Daniel on February 20th, 2007 3:33 am

    Ajay, /trackback/ will disallow all the trackback pages, and I also think we should disallow comments, but I am not sure if /comment/ is the right path for that.

    Secondly, regarding the indexing of images, it is true that with /wp- the images in your upload folder will not be crawled, but they do not have to be. I still think that Google indexes images through web pages, regardless of whether it crawls the image folder or not. I will research more about it though.

  11. Mr Apache on March 14th, 2007 11:15 am

    Just to let you know there is an example robots.txt for wordpress at http://www.askapache.com/2007/.....tstxt.html

    It's full of nice tips, like the fact that some robots can recognize wildcards like /wp-* but some can’t.

  12. SEO on May 12th, 2007 9:15 am

    Great article mate. I’m still unsure what needs to be excluded though. But there is some stuff that needs to be blocked to avoid duplicate content. Cheers

  13. John T. Pratt on June 8th, 2007 11:48 am

    Thanks for the post, very helpful. Is there any problem with doing it like this:
    Disallow: */feed/
    Disallow: */trackback/

    Also, when you do:
    Disallow: /wp-

    don’t you need a star, like:
    Disallow: wp-*

    or is it better to just do this:
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    Also, now with sitemap inclusion, you should consider updating this to have:
    Sitemap: http://www.site.com/sitemap.xml

    in there for those using the google sitemap plugin.

    thanks again!
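
For what it's worth, a Sitemap line sits alongside the crawl rules and is not tied to any User-agent group. A small sketch of a file that combines both, read back with Python's standard-library parser. It assumes Python 3.8 or later (where RobotFileParser.site_maps() is available) and reuses the placeholder www.site.com URL from the comment above.

```python
from urllib.robotparser import RobotFileParser

# Individual folder rules plus a Sitemap line (placeholder URL).
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: http://www.site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
admin_allowed = parser.can_fetch("*", "http://www.site.com/wp-admin/")

print(sitemaps)
print(admin_allowed)
```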

  14. Daniel on June 9th, 2007 4:43 am

    John, regarding the first question: not all search bots recognize the * wildcard. Some people argue that the Google Bot specifically does not interpret the * as a wildcard and just ignores it. That is why I avoid using it. Plus, it should not be necessary to add it before a folder like */feed/.

    Secondly, I am also not sure about the /wp- working for all folders that begin with that. In fact, on my latest robots.txt file I added all the folders that were to be excluded individually like /wp-admin/ and /wp-includes/.

  15. rtemp on June 13th, 2007 6:04 am

    How do I upload the robots.txt to the root directory using WordPress?

  16. John T. Pratt on June 13th, 2007 9:27 am

    You can’t upload robots.txt to the root directory using WordPress, since it doesn’t have an FTP function. The only way it would be possible is if someone registered “robots.txt” as a theme file, so that you could edit and save it in the “theme editor” under the “presentation” tab. You wouldn’t think it would be too difficult to associate a file with a theme by modifying a little code, but presently I don’t know how to do it.

    Otherwise, you have to do it with an FTP program.

  17. Scibiz on June 14th, 2007 9:07 am

    The particular robots.txt file is an important choice from the SEO point of view. Thanks for the original approach to the problem!

    I’m going to check some other txt…

  18. derek on June 19th, 2007 7:20 pm

    I was curious: my blog isn't installed in the root but in a folder on my site.

    so for instance instead of /wp-admin I would have /blogfolder/wp-admin

    Does that change things, should I set it up differently, could you give me a hand?

  19. Daniel on June 20th, 2007 7:26 am

    If I am not wrong, even if the blog is in a sub-folder, the robots.txt file should still go in the root directory, since it is the first thing a search bot will look for.

  20. Derek on June 21st, 2007 5:02 am

    Thanks Daniel I know that the robots.txt file should still go in the root, I just want to know if I have to change the robots file to look inside my new blog installation directory
    like
    disallow: /blogfolder/wp-admin

    for instance

  21. Daniel on June 21st, 2007 5:18 am

    Derek, I see what you are asking now.

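    I think you need the full path from the root:

    Disallow: /blogfolder/wp-admin/

    A rule like

    Disallow: /wp-admin/

    only matches URLs whose path starts with /wp-admin/, so it would not cover a blog installed in a sub-folder.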
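
A quick way to check how a Disallow rule applies to a sub-folder install is to test it with Python's standard-library urllib.robotparser, which matches rules against the start of the URL path. The www.example.com host, the blogfolder name, the options.php file, and the is_blocked helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(rule_path, url_path):
    """Return True if 'Disallow: rule_path' blocks `url_path` for all bots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule_path])
    # Placeholder host; only the path is compared with the rule.
    return not parser.can_fetch("*", "http://www.example.com" + url_path)


admin_page = "/blogfolder/wp-admin/options.php"

# Rule that includes the sub-folder: matches the admin page.
print(is_blocked("/blogfolder/wp-admin/", admin_page))  # True

# Rule written as if the blog were in the root: does not match.
print(is_blocked("/wp-admin/", admin_page))             # False
```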

  22. John Doe on June 26th, 2007 6:37 am

    So many people with so many ideas!!! I like the general idea of blocking robots to avoid duplicate content. But in my opinion duplicate content should be avoided at the URL level. Don't allow your CMS to generate more than one URL for the same post.

    I have posted my thought on my blog. Have a look.

  23. SEO Thailand on July 5th, 2007 1:52 am

    Yes Great Post
    But can you give me tips regarding how to use sitemap effectively

  24. Harsha on July 10th, 2007 1:02 am

    I think there is no harm in allowing the bots to access your image folders. It can bring traffic through Google image search.

  25. AskApache on August 9th, 2007 5:29 am

    It's nice that people are beginning to think about controlling robots with robots.txt. You may want to look at my updated WordPress robots.txt file on AskApache, especially regarding the digg mirror, way back archiver, etc.

    @ http://www.askapache.com/seo/u.....press.html

  26. karim on October 12th, 2007 1:49 am

    thanks for info

  27. Bollywood Actress on February 9th, 2008 3:26 pm

    Although this post didn’t discuss why and how a certain robots.txt file with some certain entries is best, you made sure people should understand what those codes mean. This is something I have never seen on any other site on my quest to find the best robots.txt file for better SEO. Thank you very much for that.

    I hope you will take the trouble to come up with a robots.txt file and explain why and how it is better, which would really help a lot of people like me. Thanks again.

  28. Jony on February 18th, 2008 4:35 pm

    Thanks for the post, very helpful.

  29. Jony on February 27th, 2008 4:15 pm

    Thanks, just what I needed.

  30. Rafi on February 29th, 2008 11:11 am

    I feel it is important for robots.txt to be placed in the root of the site directory.
    It has really increased my ranking.

    http://www.ehotbid.com
    gem and jewelry auction starting from $2

  31. Gr.Zhang on March 16th, 2008 7:55 am

    I like it. Something to borrow from and learn from.

  32. Sue Huss on March 25th, 2008 9:12 am

    I’m confused. Can someone just give me the correct text for a WordPress blog? Will uploading it from Notepad work then?

  33. Dough Roller on March 29th, 2008 3:04 am

    Sue, yes, you create the robots.txt file in Notepad and upload it to your server. The robots.txt is a simple text file. You can also create it with the text editing function that is probably available within your file manager on the server that hosts your site. As for the text, you can check out http://www.doughroller.net/robots.txt to see what I did (I run WordPress). I took this code from AskApache. Good luck.

  34. BusySphere on May 11th, 2008 3:42 am

    Thanks for the post. While uploading the robots.txt file, it has been kept under “/” and “/public_html” folders. Which is the correct root directory?
