Diễn Đàn SEO Panda - SEO Panda Forum

SEO Panda Forum - a free SEO forum where SEOers can openly discuss SEO and share their knowledge with the world
 
 How to Use Wildcards in Robots.txt

khiemsound



Posts : 1016
Points : 11645
Join date : 2012-03-27

Post subject: How to Use Wildcards in Robots.txt   Fri Apr 20, 2012 10:02 pm

There’s been quite a large reaction to Google’s announcement that you don’t have to rewrite your URLs to appear as simple, “static-looking”, and keyword-rich as once advised. I’m not going to change a thing about the SEO on my sites, but what if you already have a site that uses dynamic query parameters, and its index at Google is a bit of a mess? Let’s use this website as an example.

I still truly believe the best way to handle a dynamic site is to rewrite the URLs and avoid any form of query parameter in the visible URL. But what if you don’t have the time, resources, or inclination to rework a core component of your site? Here’s another viable solution: wildcard out all the unwanted query strings, submit your “canonical” URLs via a sitemap.xml, and make sure your internal linking structure correctly mirrors the canonical URLs you referenced in the sitemap file.

Let’s do a quick duplicate content check on the site:

The core website has approximately 50 pages of content; however, Google has indexed more than 300:

http://www.google.co.uk/search?hl=en&safe=off&rlz=1B3GGGL_enGB257GB258&q=site%3Ahttp%3A%2F%2Fwww.sortoutstress.co.uk%2F&btnG=Search&meta=

The main cause of the problem is a little query parameter called itemid – for perfect accuracy, let’s use title case as that’s how the query parameter has been indexed: Itemid
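
To make the problem concrete, the duplicates typically look something like this (the exact paths and parameter values here are illustrative, not pulled from the live site):

Canonical:  http://www.sortoutstress.co.uk/stress-advice
Duplicate:  http://www.sortoutstress.co.uk/index.php?option=com_content&id=12&Itemid=27

Both URLs serve the same content, so every extra Itemid variation just dilutes the index.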

How do we get rid of all of those duplicated pages? First, let’s update the robots.txt to include a wildcard rule that blocks all URLs containing the offending string (and please don’t forget to sort out your internal linking while you’re doing all of this!).

Add the following line to your robots.txt:

Disallow: /*&Itemid
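
For context, a Disallow rule only takes effect inside a user-agent group, so the finished file will look something like this (the catch-all User-agent line is my assumption; narrow it down if you only want to target specific crawlers):

User-agent: *
Disallow: /*&Itemid

The * matches any run of characters, so every URL containing &Itemid anywhere in its query string gets blocked, while the clean, rewritten URLs stay crawlable.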

Make sure you test that this rule behaves as expected in Google’s webmaster tools before you rely on it.

Note: robots.txt matching is case-sensitive! Remember this when you’re testing your file.
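
If you want a rough offline sanity check before (not instead of) running the rule through Google’s tester, here is a minimal Python sketch of how this style of wildcard matching behaves; the function is my own approximation of Googlebot’s matching rules, and the example URL paths are hypothetical:

import re

def blocked_by_rule(pattern, path):
    """Rough approximation of Google-style robots.txt matching:
    '*' matches any run of characters, a trailing '$' anchors the
    end of the URL, and rules match from the start of the path
    (query string included). Offline sanity check only -- always
    confirm against Google's own robots.txt tester."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# The wildcard rule from this post
rule = "/*&Itemid"

# Hypothetical URL paths in the style of the duplicates described above
print(blocked_by_rule(rule, "/index.php?option=com_content&id=12&Itemid=27"))  # True  -> blocked
print(blocked_by_rule(rule, "/stress-advice"))                                  # False -> still crawlable

Note the matching is deliberately case-sensitive, just like the real thing, so a rule written as &itemid would not catch the indexed &Itemid URLs.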

Finally, create a sitemap.xml file and submit it via your webmaster tools account, and/or reference it in your robots.txt file like this:

Sitemap: http://www.sortoutstress.co.uk/sitemap.xml
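
If you don’t already have a sitemap, a bare-bones sitemap.xml listing only the canonical URLs looks something like this (the page URLs below are hypothetical placeholders, not the site’s real pages):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.sortoutstress.co.uk/</loc>
  </url>
  <url>
    <loc>http://www.sortoutstress.co.uk/stress-advice</loc>
  </url>
</urlset>

Only list the query-string-free versions here; the whole point is that the sitemap and your internal links agree on a single canonical URL for each page.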

Some SEOs don’t agree with the use of wildcards, or at least they see it as slightly risky or far from best practice. I totally agree, they’re not ideal. You have to be very careful and give a huge amount of attention to testing. The problem is, we can’t always have our cake and eat it. Sometimes a fully rewritten site is out of reach in this year’s budget, or it’s beyond the scope of the site build. Whatever the reason, if you can’t canonicalise the old-fashioned way, why not give this technique some serious consideration?