Thanks to Ian Lurie and Search Engine Land - http://searchengineland.com/8-canonicalization-best-practices-in-plain-english-44475
Ian Lurie on June 17, 2010 at 1:14 pm
Canonicalization sounds like a process for recognizing sainthood, or maybe a training course in aiming large projectile weapons. But it’s actually one of the most important aspects of organic SEO. Good canonicalization means search engines crawl more pages of your site; it means that link authority and PageRank get consolidated, so you have a stronger link profile; and it means fewer broken links from other sites. Bad canonicalization gets you all that stuff, but with the opposite effect.
Canonicalization defined
The Ian-Lurie-mangles-the-meaning-so-computer-geeks-cringe-definition of canonicalization is: “every resource on your web site has a single web address.”
Every resource means every page, every image, every video, etc.
Single web address means there’s only one Uniform Resource Locator (URL) for each page of content, image, video, etc.
A URL looks like this:
http://www.mysite.com/
Or, it could be: http://www.mysite.com/blah/foo.html.
Or, it could be: http://www.mysite.com/blah/foo.php?meh=123.
Or… Oh, you get the idea.
Note that I said ‘page of content’. That means that a single article, product description or list of articles should appear at a single URL. You should never have multiple URLs for, say, one product description, or one article.
Some of the absurdly bloated content management systems and e-commerce suites out there make canonicalization a challenge. But it’s worth it.
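If you want a rough way to spot violations of the one-address-per-resource rule, you can group URLs by a hash of what they serve. The sketch below is only an illustration: the page bodies are made up, the `find_duplicates` helper is hypothetical, and it assumes duplicates are byte-for-byte identical, which real CMS output often isn't.

```python
import hashlib
from collections import defaultdict

def find_duplicates(pages):
    """Group URLs whose body text is byte-for-byte identical."""
    by_hash = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_hash[digest].append(url)
    # Only report content that shows up at more than one address.
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]

# Hypothetical crawl results: URL -> body text (invented for this example).
pages = {
    "http://www.mysite.com/blah/foo.html": "Product description for the Foo.",
    "http://www.mysite.com/blah/foo.php?meh=123": "Product description for the Foo.",
    "http://www.mysite.com/": "Welcome to my site.",
}

duplicates = find_duplicates(pages)
for group in duplicates:
    print("Same content at:", group)
```

Any group it prints is one piece of content living at more than one address.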
Consequences of bad canonicalization
Here’s an example of ‘bad’ canonicalization: Let’s say I’ve opened a games store: Ian’s Nerdvana (I owe Dave Barry for the term ‘nerdvana’). My store’s home page lives at:
http://www.iansnerdvana.com/
But it also lives at
http://iansnerdvana.com/
and
http://www.iansnerdvana.com/index.html
So what? People will find the home page at all three versions. They won’t know the difference, right? Well, yeah. But search engines will. Googlebot sees the three URLs above as three different pages on the web. That has two consequences that hurt SEO.
First, you lose link authority. If blogger 1 comes to ‘www.iansnerdvana.com’ and links to that page, blogger 2 lands on ‘iansnerdvana.com’ and links to that URL, and blogger 3 lands on ‘www.iansnerdvana.com/index.html’ and links to that page, Googlebot sees three links to three different pages, and applies 1 ‘vote’ to each one. These three links could have sent three authoritative signals to Googlebot for my site’s home page. Instead, they’re split into three weaker individual votes for three different pages. It’s as if Ross Perot or Ralph Nader were sitting in front of my site, siphoning off votes. It’s link love mayhem.
If I weren’t such a loser, I would’ve set up my site so that my home page ‘lived’ at one unique URL – ‘www.iansnerdvana.com’. Then all 3 bloggers would have linked to that page, and Googlebot would instead apply all three votes to a single page. If I care about link authority – and who doesn’t, I ask you? – then that’s a far better outcome.
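The vote-splitting arithmetic can be sketched in a few lines of Python. This is a toy tally, not a model of how Googlebot actually scores links, and the `canonicalize` rules (force the www host, drop a trailing index.html) are assumptions chosen to match the Nerdvana example.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Map the home page variants from the example onto one URL.

    Assumed rules, for illustration only: force the www host and
    strip a trailing index.html.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

# The three links from bloggers 1, 2 and 3.
links = [
    "http://www.iansnerdvana.com/",
    "http://iansnerdvana.com/",
    "http://www.iansnerdvana.com/index.html",
]

split_votes = Counter(links)  # one vote each for three "pages"
consolidated_votes = Counter(canonicalize(u) for u in links)  # all votes on one page

print(split_votes)
print(consolidated_votes)
```

Same three links, but in the second tally they all land on one URL instead of three.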
Secondly, search engines won’t crawl your site as deeply as they might. Search engines allocate resources for each crawl. No one knows exactly how, but it’s safe to say Googlebot won’t just wander around your site until it’s found every page. At some point, it gives up and leaves. If multiple pages on my site have multiple URLs, then visiting search bots waste time tracking down all of those different versions. That’s time they could spend crawling other unique pages, instead. So fewer unique pages of my site end up in the search index, and I have fewer chances to rank.
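To make the crawl argument concrete, here is a deliberately crude simulation. It assumes a bot that fetches a fixed number of URLs and then leaves; the budget of 6, the 4-page site and the `unique_pages_crawled` helper are all invented for illustration, since no one outside the search engines knows the real allocation rules.

```python
def unique_pages_crawled(url_to_page, budget):
    """Fetch URLs in order until the budget runs out; count distinct pages seen."""
    seen = set()
    for url in list(url_to_page)[:budget]:
        seen.add(url_to_page[url])
    return len(seen)

# Hypothetical site with 4 real pages.
# Messy version: every page is reachable at 3 different URLs.
messy = {f"/page{p}?v={v}": p for p in range(4) for v in range(3)}
# Clean version: one URL per page.
clean = {f"/page{p}": p for p in range(4)}

budget = 6  # assumed number of fetches before the bot gives up

messy_count = unique_pages_crawled(messy, budget)
clean_count = unique_pages_crawled(clean, budget)

print("messy site:", messy_count, "unique pages crawled")
print("clean site:", clean_count, "unique pages crawled")
```

With the same budget, the messy site gets only half of its pages seen, while the clean site gets all four.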
Don’t feel bad, though. Even SEO agencies screw it up.
Best practices
You can avoid the heartbreak of bad canonicalization, or at least minimize it, by doing a few simple things:
What about rel=canonical?
The canonical tag is a neat little gadget that’s supposed to let you tell search engines the correct URL for any page. So, by adding a rel=canonical link tag to any page, I could tell visiting search bots to index just that version, and to direct all link authority to that one URL. It sounds ideal.
It’s not. First, Yahoo! and Bing don’t yet have confirmed support for it. Second, you can’t rely on tags of this nature, as search engines may change their minds later. Google’s done it. So don’t stake your SEO strategy on it. Third, why not do it right the first time? In addition to SEO benefits, a canonically clean site should run faster, present fewer maintenance headaches and place less load on server and bandwidth resources.
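For reference, the tag in question is a link element with rel="canonical" in the page head. The sketch below shows one way a bot might read it, using Python's standard html.parser. The sample page and the `CanonicalFinder` class are hypothetical, and this says nothing about how any particular search engine really handles the tag.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel can hold several space-separated values, so split it.
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")

# Hypothetical home page markup for the example store.
page = """<html><head>
<link rel="canonical" href="http://www.iansnerdvana.com/">
</head><body>Welcome to Ian's Nerdvana</body></html>"""

finder = CanonicalFinder()
finder.feed(page)
canonical_url = finder.canonical
print(canonical_url)
```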
Let’s get canonical!
So, get out there and start cleaning up your site. Canonicalization fixes are generally simple, have a broad impact and let you fix multiple SEO problems at once. You’ll get more link authority, deeper site crawls and better rankings. What’s not to love?