What Should I Put in My Robots.txt File?
You can make some great optimizations to your robots.txt file if you know how.
Let’s go over some advice on this sensitive subject. What is the file? How can you benefit from keeping it up to date?
The importance of this file should never be underestimated when it comes to good SEO practice. It essentially allows you to speak to the different search engines, telling their crawlers which sections of the website they may visit and, by extension, influencing what ends up indexed. These specific directions to search bots can determine a large part of your SEO success if well managed.
Do I really need it?
A common and fair question. The thing to consider here is what happens in the file’s absence: not having one won’t actually stop your site from being indexed by search bots, but you will lose out on a lot of potential SEO control.
I don’t have one. How can I make it?
It’s simple enough. A robots.txt file lives in the root folder of a site, so connect to your site via your cPanel file manager or an FTP client and create a plain text file named robots.txt there.
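A freshly created file can be as simple as the sketch below, which allows every crawler to visit everything. User-agent and Disallow are the two lines every robots.txt is built from:

```
# Applies to all crawlers
User-agent: *
# An empty Disallow blocks nothing
Disallow:
```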
So what should I put in it?
Let’s get down to the nitty gritty and run through some ideas and suggestions as to the content of your file.
We’ll start with robots exclusion protocol (REP) tags. When an REP tag is applied to a URL, it steers crawlers away from particular indexer tasks. Each search engine views and interprets REP tags slightly differently.
Google will remove even a URL-only listing from its SERPs if a resource carries a noindex tag. Bing is different: it will often still list such references on its SERPs.
The point here is that REP tags can be placed in the meta elements of a page’s HTML as well as in the HTTP headers of any object on a website. The general view is that robots tags sent in headers will overrule any conflicting directives found in meta elements.
If you put an indexer directive on a specific element as a microformat, it will overrule the page-level settings for that element.
An example of this is when a page’s robots tag says “follow”: the rel-nofollow directive on an individual link will win for that link.
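A minimal sketch of that override, with a page-level “follow” and one link opting out:

```
<head>
  <!-- Page-level directive: follow links on this page -->
  <meta name="robots" content="index, follow">
</head>
<body>
  <!-- This single link opts out and will not pass link equity -->
  <a href="/untrusted-page" rel="nofollow">Example link</a>
</body>
```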
While your robots.txt itself can’t carry indexer directives, it is possible to set these for a whole group of URLs by sending them as X-Robots-Tag HTTP headers from the server.
This will require some programming skill from you, as well as a sound understanding of web servers and the HTTP protocol.
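As a sketch, assuming an Apache server where you can edit an .htaccess file (other servers use different syntax), the following sends a noindex header for every PDF on the site:

```
# Hypothetical Apache .htaccess snippet: send a noindex
# header for every PDF, keeping them all out of the index
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```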
You’ll find a similarity between Bing and Google in that both honor two pattern-matching characters (not full regular expressions) that you can use to identify pages or folders on your site that you want excluded from ranking. These characters are the dollar sign and the asterisk ($ and *).
The dollar sign matches the end of a URL, and the asterisk acts as a wildcard that matches any sequence of characters.
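For example, this sketch blocks every URL ending in .pdf and anything under a /print/ path, whatever comes before it (the paths are hypothetical):

```
User-agent: *
# $ anchors the match to the end of the URL
Disallow: /*.pdf$
# * matches any sequence of characters
Disallow: /*/print/
```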
It’s important to keep in mind that the robots.txt file on your site will be viewable by the public. This means that anyone analyzing your site will be able to see which areas the owner has blocked the search engines from viewing.
This means that you’ll have to apply other techniques if your website holds private information that you don’t want the public to find. Use proper file protection methods, such as password protection, to prevent visitors from accessing that information.
Some rules to keep in mind
Meta robots tags with the parameters “noindex, follow” are the best method to restrict or direct the indexing of individual pages by search bots and crawlers.
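That combination keeps a page out of the index while still letting crawlers follow its links:

```
<!-- Keep this page out of the index, but follow its links -->
<meta name="robots" content="noindex, follow">
```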
You’ll also want to keep in mind that if a crawler is malicious in nature it very likely won’t refer to your robots file in the first place. This means that you can’t reliably make use of the robots.txt file as any kind of security measure. This is a common pitfall as it isn’t quite shouted from the rooftops in some cases.
Also remember that each Disallow: line can hold only one path – no extras.
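In other words, list one path per line rather than cramming several onto one:

```
# Correct: one path per Disallow line
Disallow: /tmp/
Disallow: /private/

# Wrong: several paths on one line will be misread
# Disallow: /tmp/ /private/
```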
It’s also good to remember that Google and Bing both accept the dollar and asterisk expression characters. This is a key factor in making best use of pattern exclusion.
Make sure to proofread. Paths in the robots.txt file are case sensitive! Don’t get caught out and find yourself scratching your head for an hour because your finger slipped on the shift key.
Scary things can happen if you don’t maintain your robots file well.
There have been cases of well-kept sites with great backlinks and organic content, but no SEO luck for seemingly unfathomable reasons. Many suffered simply because a single disallowed forward slash was included, killing their SEO by telling crawler bots not to visit any of their pages at all!
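That catastrophic one-character mistake looks like this – note how little separates it from a harmless empty Disallow:

```
# This blocks the ENTIRE site from all crawlers
User-agent: *
Disallow: /

# This blocks nothing at all
User-agent: *
Disallow:
```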
Yes and no
A brief summary of what is good and what isn’t when it comes to robots.txt.
- Look at the directories on your site. You’re likely to have some areas that you want to block using the txt file.
- Stop indexing areas of your site with legitimate duplicate content such as a printable page version or a repeated recipe or manual.
- Be sure that the search engines aren’t being blocked from indexing your main site.
- Check around for specific files on your site that might be best blocked. These can include phone numbers or email addresses.
- Don’t scatter stray text through the file. Comments are fine when prefixed with #, but anything else can cause all kinds of problems, as the format is very sensitive.
- Don’t put the full details of your private files in the txt file. As we’ve established, it’s public and could defeat the point of masking some areas of your site!
- Don’t rely on an Allow: line working everywhere. It isn’t part of the original standard, although major engines such as Google and Bing do honor it.
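Pulling the do’s together, a typical well-formed file might look like this sketch (the paths are hypothetical examples, not recommendations for your site):

```
User-agent: *
# Block internal areas you never want crawled
Disallow: /admin/
# Block legitimate duplicate content such as printable versions
Disallow: /print/
```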
There’s a lot of use for robots.txt files. It’s almost always to your benefit to have and maintain one. However, you do need to keep in mind how sensitive it is in nature.
With some proper diligence and a sound understanding of its limitations, it can help you guide and direct search bots and crawlers to the specific content you want ranked. Remember, though, that it can’t enhance privacy: the paths you block are publicly listed in the file itself, so keep genuinely private content behind proper protection.
It isn’t the be-all and end-all of SEO, to be honest. But with careful use it will keep you safe and on track towards that page-one prominence.