Today, I want to share a tip for quickly weeding out errors in sitemap XML files. This post will teach your a few new Excel tricks and (hopefully) save you some time in your own technical SEO audit projects.
Why a nice clean sitemap?
Google have invested a lot of time and effort into improving the sitemaps functionality in Webmaster Tools and the advice I’ve always heard from Google people is advice like; keep your sitemaps as error free as you can, use the correct canonical URL. I’ve always felt that a sitemap file with a very low load time is also advisable if you can speed up the dynamic elements of the file generation.
Bing’s Duane Forrester was on a Whiteboard Friday talking about exactly this topic back in March last year. His comments about sitemap cleanliness were of particular interest:
We have a very tight threshold on how clean your sitemap needs to be. When people are learning about how to build sitemaps, it’s really critical that they understand that this isn’t something that you do once and forget about. This is an ongoing maintenance item, and it has a big impact on how Bing views your website. What we want is end state URLs and we want hyper-clean. We want only a couple of percentage points of error.
Duane Forrester
If you’ve not seen the video, I recommend taking a few moments to review what’s said:
Checking for problems during a site audit
When I’m working on a site, I sometimes need to work out what state the XML sitemap is in. If I know there’s been a recent update to the file(s) it’s not always a good idea to totally rely on the data coming from Webmaster tools. If you’re ever in that situation, here’s how to get a fresher impression of the state of your sitemap.xml file.
Import your XML sitemap file into Excel
Firstly, head to Data > From Other Sources, and select the “From XML Data Import”:
In the “File name” dialogue, type or paste in your sitemap XML file URL:
Click “Yes” when this dialogue appears:
And eventually, your entire XML sitemap appears, all nicely formatted in Excel:
Fetch the HTTP status code for each URL
Back in the old days, the quickest way to check the server header response of a list of URLs was to crawl the list with XENU on 0 crawl depth, and then VLOOKUP against the original list. Thanks to Niels Bosma and his SEO Tools for Excel, this process is much, much quicker.
Just create a new column and insert the following query into the first cell:
=HtmlStatus([@[ns1:loc]])
Check that the canonical in the page header matches the declared URL in the sitemap
Aside from a rarely updated sitemap, the other common mistake i encounter is non-canonical URL submission in the sitemap file. The URL in the sitemap and the URL described in rel=”canonical” don’t match!
=HtmlCanonical([@[ns1:loc]])
Check out the “Match?” column – we’re checking to see if the canonical declared in the web page’s header matches the actual URL in the sitemap. Oh my, it’s a simple formula:
=EXACT([@[ns1:loc]],[@Canonical])
Learn this stuff
Learn this stuff! If you’re not familiar with some of the tips in the post – that’s ok, I more or less have to relearn the more complicated stuff every time I start a new task with Excel. It’s enormously satisfying to make the time to teach yourself new tricks and Excel is brilliantly simple once you have the gist of it. Here are some resources to get you started:
Using Tables in Excel
How to Use VLOOKUP
How to Make a Pivot Table in Excel
How to Find Orphaned Pages in Excel
How to Extract Your Competitor’s Keyword Strategy
HTTP Status Reports with Xenu, OSE and VLOOKUP
What are My Most Linked to Subfolders
Categorise Your Links with Excel
Keyword Research wiith Categories