There’s been quite a large reaction to Google’s announcement that you don’t have to rewrite your URLs to appear as simple, “static-looking” and keyword-rich as once advised. I’m not going to change a thing about the SEO on my sites, but what if you already have a site that uses dynamic query parameters, and its index at Google is a bit of a mess? Let’s use this website as an example.
I still firmly believe the best way to handle a dynamic site is to rewrite and avoid any form of query parameter in the visible URL. But what if you don’t have the time, resources or inclination to rework a core component of your site? Here’s another viable solution: wildcard out all the unwanted query strings in robots.txt, submit your “canonical” URLs via a sitemap.xml, and make sure your internal linking structure correctly mirrors the canonical URLs you referenced in the sitemap file.
Let’s do a quick duplicate content check on the site:
The core website has approximately 50 pages of content; Google, however, has indexed more than 300:
http://www.google.co.uk/search?hl=en&safe=off&rlz=1B3GGGL_enGB257GB258&q=site%3Ahttp%3A%2F%2Fwww.sortoutstress.co.uk%2F&btnG=Search&meta=

The main cause of the problem is a little query parameter called itemid – for perfect accuracy, let’s use title case, as that’s how the query parameter has been indexed: Itemid.
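To make that concrete, the duplicates tend to come in pairs like the following – the same content served with and without the Itemid parameter tacked on (these example paths are purely illustrative, not the site’s actual URLs):

http://www.sortoutstress.co.uk/index.php?option=com_content&task=view&id=12
http://www.sortoutstress.co.uk/index.php?option=com_content&task=view&id=12&Itemid=27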
How do we get rid of all of those duplicated pages? First, let’s update the robots.txt to include a wildcard that blocks all URLs containing the offending string (and please don’t forget to sort out your internal linking while you’re doing all of this!).
Add the following line to your robots.txt:
Disallow: /*&Itemid
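For context, the complete file might end up looking something like this – note that the User-agent: * line is my assumption; wildcards are a non-standard extension supported by the major engines, so you may prefer to target Googlebot specifically:

# block any URL containing the Itemid parameter
User-agent: *
Disallow: /*&Itemid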
Make sure you test that this will work in Google’s Webmaster Tools: run your updated robots.txt through the robots.txt analysis tool and check that a URL containing &Itemid comes back as blocked, while your canonical URLs do not.
*Robots.txt is case sensitive! Remember this when you’re testing your file.
Finally, create a sitemap.xml file listing your canonical URLs and submit it via your Webmaster Tools account, and/or reference it in your robots.txt file like this:
Sitemap: http://www.sortoutstress.co.uk/sitemap.xml
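The sitemap itself is just a list of your canonical URLs in the standard sitemaps.org format. A minimal sketch might look like this (the second page URL is a hypothetical example – use the canonical versions of your own pages, and remember to escape any & as &amp; in the XML):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.sortoutstress.co.uk/</loc>
  </url>
  <url>
    <loc>http://www.sortoutstress.co.uk/index.php?option=com_content&amp;task=view&amp;id=12</loc>
  </url>
</urlset>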
Some SEOs don’t agree with the use of wildcards, or at least they see it as slightly risky or far from best practice. I totally agree: they’re not ideal, and you have to be very careful and give a huge amount of attention to testing. Problem is, we can’t always have our cake and eat it. Sometimes a fully rewritten site is out of reach in this year’s budget, or it’s beyond the scope of the site build. Whatever the reason, if you can’t canonicalise the old-fashioned way, why not give this technique some serious consideration?