I particularly enjoy exploring interesting search results in search of inspiration for new blog posts. Today we take a look at error pages indexed in Google and discuss why having such pages indexed in search engines isn’t great. Finally, we wrap up with a look at tools that can help you detect and remedy the problem.
Photo by: Clair Neal
What’s wrong with this result?
Let’s start by running a few queries in Google. We’re looking for indexed pages whose titles announce that a page cannot be found, often with a 404 message. Queries such as allintitle:this page cannot be found and allintitle:this page cannot be found 404 reveal exactly what we’re hunting for.
Why would an error page display in Google’s search results?
Don’t be fooled by the error messages and the warnings of 404s. The content of these pages may say they’re errors, but search engines are treating them like everyday, ordinary web pages. The culprit is a 200 status code in the server header response that the host web server returns for the requested URI. The 200 response (meaning “OK” and “The request has succeeded.”) fools a search engine into including the page in its index, rather than excluding it as you’d expect if the correct response (a 404) had been issued.
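To make the difference concrete, here is a minimal Python sketch. It assumes a hypothetical local demo server (the DemoHandler class and the /soft and /hard paths are invented for illustration, not taken from any real site) that sends the same “cannot be found” page twice: once with a 200 status and once with a proper 404. The fetch_status helper is also an assumption of this sketch; it reads only the status code, which is the signal a crawler acts on.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# The same error-looking page is sent for every request in this demo.
ERROR_BODY = b"<html><title>This page cannot be found</title></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /soft sends the error page with a 200,
    every other path sends it with a real 404."""

    def do_GET(self):
        status = 200 if self.path == "/soft" else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(ERROR_BODY)))
        self.end_headers()
        self.wfile.write(ERROR_BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url):
    """Return the raw HTTP status code for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=5)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=5)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def run_demo():
    """Start the demo server on a free local port and fetch both pages."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch_status(base + "/soft"), fetch_status(base + "/hard")
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() returns (200, 404): both responses carry an identical error page, yet only the second tells a search engine that the page does not exist.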
The problem, and how to diagnose its cause
Let’s take a look at an example and use some tools to see what’s happening. Kew Gardens make a great case study, as their custom error page doesn’t behave in the way you’d hope, although the web server hosting the site can (and does) produce a 404 when pushed.
First, if you don’t already have it, go and get Live HTTP Headers for Firefox. Live HTTP Headers is one of my “must have” SEO tools for Firefox: it lets you see what’s contained in the server header response, which is where we need to check the status code of the requested URI. You could also give SEO Book’s Server Header Checker, Rex Swain’s HTTP Viewer, or HttpFox a try.
Back to our example. Using one of the tools above, check this error page out on the Kew website. Look at the server header and you’ll find a 200 status code. Their custom error page is perfectly fine visually, but a secret lurks under the bonnet: the developer hasn’t set up the 404 response correctly.
What’s particularly interesting in this case is that a randomly generated URL produces the correct 404 status code.
Indexed pages like these are certainly not ideal for your SEO efforts, and in extreme cases, on large sites with many pages of expired content, there can be more low-value, blank error pages in the index than pages of actual content. If you’re interested in good housekeeping and in getting your fresh new content into search engine indexes quickly and often, you should care about issues like this one, which hog crawl bandwidth.
A 404 error Christmas check list…
Here are a few pointers to check for this problem on your own site:
If you have a custom 404 error page, check the server header response and verify that the status code produced is a 404.
Generate a random URI for a page you know does not exist and check that it returns a 404 status.
Does your CMS redirect when it does not recognise a URI? Personally, I believe redirecting to a 404 error internally is not a great signal – try to avoid that if you can.
Use Google Webmaster Tools or crawl your site with Xenu’s Link Sleuth to get a good feel for the state of any detectable errors on your site. Tom wrote a great primer on Xenu at SEOmoz a few weeks back.
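The first three checks above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the helper names (random_missing_url, fetch_status, diagnose) are made up for this example, and the wording of the verdicts is mine. The fetch deliberately does not follow redirects, so a CMS that redirects unknown URIs shows up as a 3xx rather than being hidden.

```python
import http.client
import uuid
from urllib.parse import urljoin, urlsplit


def random_missing_url(base_url):
    """Build a URL on the same host that almost certainly does not exist."""
    return urljoin(base_url, "/" + uuid.uuid4().hex)


def fetch_status(url):
    """Return the raw status code for url; redirects are NOT followed."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=5)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=5)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def diagnose(status):
    """Turn the status returned for a non-existent page into a verdict."""
    if status == 404:
        return "ok: missing pages return a 404"
    if 300 <= status < 400:
        return "warning: missing pages are redirected before any 404"
    if status == 200:
        return "problem: error page is served with a 200 and can be indexed"
    return f"unexpected status {status}: investigate"


# Example usage (needs network access; replace the placeholder host with your own):
#   url = random_missing_url("http://www.example.com/")
#   print(url, "->", diagnose(fetch_status(url)))
```

Run the same fetch against the URL of your custom error page as well; as the Kew example shows, a random URI and the custom error page can return different status codes on the same site.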
You could also try installing the IIS SEO Toolkit, which checks for errors and many, many other issues that might affect your site’s SEO. That’s it for our Christmas SEO post – back to the beer, food and television!