Robots txt

Robots Text File (robots.txt)

It is always good practice to create a robots.txt file and place it in your root directory. The mechanism is correctly known as the robots exclusion protocol (REP), but we'll stick to calling it robots txt (just because I'm too lazy to keep writing "robots exclusion protocol" throughout this article!).

There are many conflicting views regarding robots text files (robots.txt), and whether a robots file should be added if it has no content. Following experimentation on various websites, we are convinced that every website should feature a robots txt file, even if it is an "empty" file.

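The robots.txt file is the first thing that a search engine robot looks for in your root directory, as it indicates which files and directories you do not want the robot to crawl. Strictly speaking, the file controls crawling rather than indexing: a page that robots are barred from crawling can still turn up in search results if other sites link to it.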
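
To make the "root directory" point concrete, here is a minimal Python sketch of how a robot might work out where to look for the file, whichever page of the site it starts from. The function name robots_url and the example.com address are made up for illustration; real crawlers have their own logic.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/articles/seo.htm?page=2"))
```

The point to take away is that the file only counts when it sits at the top level of the site, so a robots.txt uploaded into a sub-folder would not be found this way.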

Blank robots.txt file

If a search engine spider doesn't find a robots.txt file, it assumes that it can spider the whole site. So why bother putting a blank file into the root directory at all?

The answer is twofold. First of all, it is simply good practice. If the robot is going to look for the file, you may as well spend the one minute that it takes to create it.

Secondly, if the robot finds a blank robots.txt file, it knows that you have nothing to hide in your site. The spider knows it has your absolute permission to look at and index every page on your site, because you have left it the message with the robots.txt file. The spider knows that you are not trying to hide or cloak any pages and, in our experience, that is seen by certain robots as a good thing.

By a "blank" robots text file, do you mean "blank"?

It used to be the case that you could create a simple robots.txt file with nothing in it and put it into your root directory. However, to ensure that your robots exclusion protocol file validates correctly, take ten seconds more to add the text that lets the spider know that all robots are welcome and no pages are barred from being spidered.

Create a robots txt file (REP file)

To do this, simply open a new text document in Notepad (right-click on your desktop, go to "New" and then "Text Document") and type in the following:

User-agent: *
Disallow:

The user-agent line specifies the robot. You can have a user-agent line for each robot if you wish, but the wildcard symbol "*" lets the spider know that all robots are welcome. When you have completed this task, save the file as robots.txt and upload it to your root directory. It's as simple as that!

The disallow directive lets the robots know the files or directories that you do not wish to be spidered. By leaving the directive blank, in the robots.txt file, the spiders know that they can retrieve all of your website files and pages.
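
If you would like to see how a parser reads the "allow everything" file, Python's standard library includes one in urllib.robotparser. The sketch below is only an illustration: the helper name allowed and the robot name examplebot are our own inventions, and search engines may interpret edge cases differently from this particular parser.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse the robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The explicit "all robots welcome" file permits any page.
print(allowed(ALLOW_ALL, "examplebot", "/any/page.htm"))

# A completely empty file is treated the same way by this parser.
print(allowed("", "examplebot", "/any/page.htm"))
```

As far as this parser is concerned, the empty file and the two-line file mean the same thing, which matches the advice above: the extra lines are there to state your intentions explicitly and to keep validators happy.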

Preventing Robots from Spidering Files using robots txt

If you wish to stop all robots from spidering any of your files (if the site is under construction, for example), you simply create your robots.txt file and type the following:

User-agent: *
Disallow: /
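
The single slash matters: every path on a site starts with "/", so this one rule covers everything. Here is a small check using Python's standard urllib.robotparser, offered as a sketch only. The helper name allowed and the robot name examplebot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse the robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Both the home page and a deeper page are refused.
print(allowed(DISALLOW_ALL, "examplebot", "/"))
print(allowed(DISALLOW_ALL, "examplebot", "/products/widget.htm"))
```

Bear in mind that this only tells well-behaved robots to stay away. It is a request, not a lock on the door.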

If you want to prevent a specific robot from visiting a page of your site, place the following into your robots.txt file:

User-agent: googlebot
Disallow: /commonplace.htm

Or if it was a whole directory, rather than just a single page:

User-agent: googlebot
Disallow: /images/
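
The two googlebot examples can be tried out with Python's standard urllib.robotparser. In this sketch we have put both Disallow lines into one group purely to keep the example short, and the helper name allowed, the robot name otherbot and the file names logo.png and index.htm are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

GOOGLEBOT_RULES = """\
User-agent: googlebot
Disallow: /commonplace.htm
Disallow: /images/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse the robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The named robot is kept out of the page and the directory...
print(allowed(GOOGLEBOT_RULES, "googlebot", "/commonplace.htm"))
print(allowed(GOOGLEBOT_RULES, "googlebot", "/images/logo.png"))

# ...but may still fetch everything else.
print(allowed(GOOGLEBOT_RULES, "googlebot", "/index.htm"))

# Robots that are not named in the file are not restricted at all.
print(allowed(GOOGLEBOT_RULES, "otherbot", "/images/logo.png"))

# Rules match by prefix, so this similarly named page is blocked too.
print(allowed(GOOGLEBOT_RULES, "googlebot", "/commonplace.html"))
```

The last check is worth noting: a Disallow value is matched against the start of the path, so a rule for one page can catch other pages whose names begin the same way.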

If you are using your robots.txt file to block a particular bot from reading a single file, as in the commonplace.htm example above, it's good practice to put that group at the top of your robots.txt file, followed by a blank line and then the "allow all" information, as below:

User-agent: googlebot
Disallow: /commonplace.htm

User-agent: *
Disallow:
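
To show how the two groups work together, here is a sketch using Python's standard urllib.robotparser. The helper name allowed, the robot name otherbot and the page index.htm are hypothetical, and other parsers may differ in the finer details.

```python
from urllib.robotparser import RobotFileParser

COMBINED = """\
User-agent: googlebot
Disallow: /commonplace.htm

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse the robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# googlebot follows its own group: one page barred, the rest open.
print(allowed(COMBINED, "googlebot", "/commonplace.htm"))
print(allowed(COMBINED, "googlebot", "/index.htm"))

# Every other robot falls through to the wildcard group.
print(allowed(COMBINED, "otherbot", "/commonplace.htm"))
```

A robot that finds a group with its own name uses that group and ignores the wildcard one, which is why the blocked page stays blocked for googlebot even though the second group welcomes everybody.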

For a better idea, visit the BBC robots.txt file. Once you have created your robots.txt file, you can validate it with a robots.txt checker to ensure that you have made no errors in creating or uploading your REP file.
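
If you are curious about the sort of mistakes a validator looks for, the Python sketch below runs a few basic checks. It is not the validator mentioned above and it is far from complete: the function name lint_robots_txt and the list of accepted field names are our own assumptions, chosen for illustration.

```python
# Field names this sketch accepts (an assumption; real validators
# may recognise more or fewer).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return a list of human-readable problems found in the text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        # Ignore comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            if not value.strip():
                problems.append(f"line {number}: User-agent has no value")
            seen_agent = True
        elif field in {"disallow", "allow"} and not seen_agent:
            problems.append(f"line {number}: rule appears before any User-agent line")
    return problems


print(lint_robots_txt("User-agent: *\nDisallow:\n"))
print(lint_robots_txt("User-agent: *\nDisalow: /private/\n"))
```

The second call shows the most common slip, a misspelt directive, which robots would silently ignore if nobody caught it.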

If you have any questions regarding this article on REP, please do not hesitate to contact us. We are always happy to offer any help or advice that we can.

Web Design and SEO Articles

Most of the articles we write nowadays are featured in the Big Man's Blog, as it is optimised so that a properly optimised article is featured on the front pages of Google within minutes. The article featured below was originally written a few years back, but the information in the article is still accurate in today's marketplace.

If you have any questions regarding this, or any other article featured on the site, don't hesitate to email your question to bigman@kenkai.com.