Diễn Đàn SEO Panda - SEO Panda Forum

 How to Use Wildcards in Robots.txt

Post subject: How to Use Wildcards in Robots.txt (Fri Apr 20, 2012 10:02 pm)

There’s been quite a large reaction to Google’s announcement that you don’t have to rewrite your URLs to look as “static” and keyword-rich as was once advised. I’m not going to change a thing about the SEO on my sites, but what if you already have a site that uses dynamic query parameters, and its index at Google is a bit of a mess? Let’s use this website as an example.

I still truly believe the best way to handle a dynamic site is to rewrite and avoid any form of query parameter in the visible URL. But what if you don’t have the time, resources or inclination to rework a core component of your site? Here’s another viable solution: wildcard out all the unwanted query strings, submit your “canonical” URLs via a sitemap.xml, and make sure your internal linking structure correctly mirrors the canonical URLs you referenced in the sitemap file.

Let’s do a quick duplicate content check on the site.

The core website has approximately 50 pages of content; however, Google’s got more than 300.


The main cause of the problem is a little query parameter called itemid – for perfect accuracy, let’s use title case as that’s how the query parameter has been indexed: Itemid
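
To get a feel for how much of the index is duplication, you could collapse a list of indexed URLs onto their Itemid-free form and count the variants. This is only a rough sketch: the `example.com` URLs, the `option` and `id` parameters, and the function names are made up for illustration, and re-encoding the query string may change percent-encoding on unusual URLs.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url, param="Itemid"):
    """Return url with every occurrence of the given query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param  # exact, case-sensitive comparison
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls, param="Itemid"):
    """Map each canonical URL to the indexed variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_param(url, param)].append(url)
    return dict(groups)


# Hypothetical sample of indexed URLs, for illustration only.
indexed = [
    "http://www.example.com/index.php?option=com_content&id=5",
    "http://www.example.com/index.php?option=com_content&id=5&Itemid=12",
    "http://www.example.com/index.php?option=com_content&id=5&Itemid=27",
    "http://www.example.com/index.php?option=com_content&id=9&Itemid=12",
]

for canonical, variants in group_duplicates(indexed).items():
    print(len(variants), canonical)
```

Any canonical URL with more than one variant is a candidate for the clean-up described here.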

How do we get rid of all of those duplicated pages? First, let’s update the robots.txt to include a wildcard that blocks all URLs containing the offending string (please don’t forget to sort out your internal linking while you’re doing all of this!).

Add the following line to your robots.txt:

Disallow: /*&Itemid
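
Bear in mind that a Disallow line only applies inside a group that begins with a User-agent line. If your robots.txt doesn't have one yet, the small sketch below shows one way the finished file could be put together. The helper name `build_robots_txt` and the catch-all `User-agent: *` group are my assumptions, not something taken from the site's real file.

```python
def build_robots_txt(disallow_rules, sitemap_url=None, user_agent="*"):
    """Assemble robots.txt text: one User-agent group plus an optional Sitemap line."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {rule}" for rule in disallow_rules]
    if sitemap_url:
        # The Sitemap line stands on its own and belongs to no particular group.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    ["/*&Itemid"],
    sitemap_url="http://www.sortoutstress.co.uk/sitemap.xml",
)
print(robots_txt)
```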

Make sure you test that this works in Google’s Webmaster Tools.

*Robots.txt path matching is case sensitive! Remember this when you’re testing your file.
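
If you want a quick local sanity check before you reach for Webmaster Tools, the wildcard behaviour can be approximated in a few lines. This is a simplified sketch of Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end, everything else is literal and case sensitive), not Google's actual parser. The test paths are hypothetical.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, pattern):
    """True when the pattern matches from the start of the path and query string."""
    return rule_to_regex(pattern).match(path) is not None


RULE = "/*&Itemid"

# Hypothetical paths, for illustration only.
checks = [
    "/index.php?option=com_content&id=5&Itemid=12",  # title case, after '&'
    "/index.php?option=com_content&id=5&itemid=12",  # lower case
    "/index.php?Itemid=12",                          # first parameter, after '?'
    "/index.php?option=com_content&id=5",            # no Itemid at all
]

for path in checks:
    print(is_blocked(path, RULE), path)
```

Only the first path is blocked. The lower-case variant slips through because of case sensitivity, and so does a URL where Itemid is the first parameter, since the rule looks for a literal `&` in front of it. If your site produces URLs like that, a broader pattern such as `/*Itemid=` might suit you better, but test it just as carefully.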

Finally, create a sitemap.xml file and submit it via your Webmaster Tools account and/or reference it in your robots.txt file like this:

Sitemap: http://www.sortoutstress.co.uk/sitemap.xml
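
If you don't have a tool that generates the sitemap for you, a bare-bones one can be built with the Python standard library. Treat this as a minimal sketch: the two URLs are placeholders for your own canonical URLs, `build_sitemap` is a name I made up, and only the required `<loc>` element is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder canonical URLs (no Itemid), for illustration only.
canonical_urls = [
    "http://www.example.com/index.php?option=com_content&id=5",
    "http://www.example.com/index.php?option=com_content&id=9",
]

xml_text = build_sitemap(canonical_urls)
print(xml_text)
```

Note that ElementTree escapes the `&` in each URL as `&amp;`, which the sitemap format requires.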

Some SEOs don’t agree with the use of wildcards, or at least they see it as slightly risky or far from best practice. I totally agree, they’re not ideal. You have to be very careful and give a huge amount of attention to testing. Problem is, we can’t always have our cake and eat it. Sometimes getting a fully rewritten site is out of reach in this year’s budget, or it’s beyond the scope of the site build. Whatever the reason, if you can’t canonicalise the old-fashioned way, why not give this technique some serious consideration?