URL Canonicalization: How It Affects SEO and How to Fix It

URL canonicalization is the process whereby a resource that has multiple URLs or URIs pointing to it is standardized on a single URL. The process does not involve removing or changing the other URLs; it just designates one of them as the “normal” or “standard” URL. The process is also known as normalization or standardization.

For example, your home page might have several URLs that all point to the same web content. Technically, they are all distinct URLs that could point to different content, but in this case they happen to serve the same page.

http://www.example.com
http://example.com
http://www.example.com/
http://home.example.com/
http://www.example.com/index.jsp

You could pick any one of these URLs as the best, standard or preferred URL that represents the content.
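As a rough illustration, the idea of collapsing these variants onto one preferred URL can be sketched in Python. The choice of www.example.com as the preferred host, the list of alias hosts, and the index file names are all assumptions made for this example; a real site would have its own rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the preferred host, the alias hosts
# and the index file names are hypothetical and specific to this example.
PREFERRED_HOST = "www.example.com"
ALIAS_HOSTS = {"example.com", "home.example.com", "www.example.com"}
INDEX_FILES = {"index.jsp", "index.html"}


def canonicalize(url: str) -> str:
    """Map a known variant of a page URL to one preferred form."""
    parts = urlsplit(url)
    # hostname drops any port or user info, which this sketch ignores.
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        host = PREFERRED_HOST
    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"
    # Treat a trailing index file as the directory itself.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.example.com",
    "http://example.com",
    "http://www.example.com/",
    "http://home.example.com/",
    "http://www.example.com/index.jsp",
]
canonical = {canonicalize(u) for u in variants}
print(canonical)  # {'http://www.example.com/'}
```

All five variants collapse to a single URL, which is the one you would then advertise to crawlers.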

It is quite likely that a search engine will pick the one that it deems to be best. In most cases this works quite well, especially on simple sites. But sometimes the search engine will choose the “wrong” URL, or not the one you wanted. When that happens, you will have to explicitly state the URL that you want to be used.

There are several reasons why you should probably specify the canonical URL explicitly, regardless of how search engines interpret URLs.

  • There are many different crawlers, bots and search engines. Some have better algorithms than others and are better at choosing the correct or best URL.
  • You could end up with several versions of the same content indexed under different URLs, which can be mistaken for duplicate content. You could get penalized for duplicate content as well.
  • The ranking, link juice or whatever else a search engine uses to rank results can get split between several URLs, even though all of them point to the same content.
  • The “wrong” URL could be displayed in the search engine result pages. Also, different search engines may choose different URLs for the same content.
  • It is possible that your friendly URL format is not displayed, and a long generic URL with parameters is displayed instead.
  • Different sources of traffic using different URLs can make analytics difficult. The same page might appear under different URLs in the traffic reports, leading to misleading statistics. Consolidation of URL metrics is important.

There are several different ways you can fix this issue. One method is to create a simple server-side redirect from the set of URLs to the desired URL. Another is to include a link tag in the head section of the web page that explicitly references the correct URL. We will take a look at these and other methods below.

HTML Tag Method

Every web page has a head section. This is the section between the opening and closing head tags at the start of the page source. You will usually find the title and meta description tags in this section. This is where you add the canonical tag, which is actually a link tag with two attributes: rel and href.

rel: The rel attribute can take several different values. You have probably already seen values such as stylesheet, author and icon. You will use the value canonical for this tag.
href: This attribute references the URL that you prefer the page to be indexed under. It does not need to be absolute; you can use relative URLs, and many crawlers can resolve them against the base URL of the page. However, absolute URLs are strongly recommended in order to avoid errors and misinterpretations.

An example of the canonical link tag would look like this:

<link rel="canonical" href="http://www.lostsaloon.com/marketing/" />

The above tag tells crawlers to treat the URL specified in the href as the URL of this web page, no matter what URL was used to access the page. Be aware that the tag is more of a hint than a directive, which means that crawlers can ignore it or override it.
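To check what a page actually declares, a crawler-like script can read the tag back out of the HTML. The following is a minimal sketch using only the Python standard library; the class and function names are made up for this example, and it deliberately ignores any base tag on the page, so it is an approximation rather than a description of how any particular search engine behaves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html: str, page_url: str):
    """Return the declared canonical URL as an absolute URL, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve a relative href against the URL the page was fetched from.
    return urljoin(page_url, finder.canonical)


page = (
    "<html><head><title>Marketing</title>"
    '<link rel="canonical" href="http://www.lostsaloon.com/marketing/" />'
    "</head><body></body></html>"
)
print(find_canonical(page, "http://lostsaloon.com/marketing/?ref=feed"))
# http://www.lostsaloon.com/marketing/
```

The same function also shows why absolute URLs are safer: a relative href has to be resolved against whichever URL the page was fetched from, and that is exactly the part that varies.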

The advantage of this method is that you don’t have to track down every single URL that might point to the page. It can handle URLs that you might not know of, especially if you use some content management systems such as WordPress or Drupal or some shopping or e-commerce frameworks. It also eliminates the need to come up with rules that will correctly redirect all URL variations.

Another advantage is that there are no “hard” redirects where the browser needs to request another URL and reload the content. This avoids extra requests to the server when they are not needed.

It is a relatively new tag, so it is quite possible that not all search engines and crawlers support it fully. However, the major search engines such as Google and Yahoo do support it.

HTTP Redirect Method

This is the same as the standard HTTP redirect that you might have used in other scenarios. URL redirection is the method where one URL is redirected to a different URL, forcing the client to load the new URL instead of the originally requested one.

You can use a server-side redirect with the HTTP status code 301. The 301 code denotes that the redirection is permanent. This kind of redirection is a stronger signal to crawlers because it forces clients to load and read the content from the new URL. Moreover, a 301 redirect works across domains, which not every crawler may support with the other methods.

Another advantage is that it redirects all users and clients, including browsers, direct traffic and crawlers. This lets you enforce the correct URL for everyone, so that social media shares, bookmarks and the like also end up using it.

The drawback is that you will need to figure out all the URLs that point to each page and create rules that redirect them correctly. It is easy to miss some URLs or create wrong redirects if you are not careful when writing the rules.
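How such rules look depends on your web server, but the logic can be sketched as a small Python WSGI application. Everything site-specific here is an assumption for illustration: the preferred host, the hosts to redirect, the index file names and the hard-coded http scheme.

```python
# Hypothetical rule set for illustration; a real site defines its own.
PREFERRED_HOST = "www.example.com"
REDIRECT_HOSTS = {"example.com", "home.example.com"}
INDEX_FILES = ("index.jsp", "index.html")


def redirect_target(host, path, query=""):
    """Return the preferred URL to redirect to, or None if already canonical."""
    host = host.lower()
    path = path or "/"
    new_host = PREFERRED_HOST if host in REDIRECT_HOSTS else host
    new_path = path
    for name in INDEX_FILES:
        if new_path.endswith("/" + name):
            # Strip the index file name but keep the trailing slash.
            new_path = new_path[: -len(name)]
            break
    if new_host == host and new_path == path:
        return None
    # Assumption: the site is served over plain http, as in the examples.
    url = f"http://{new_host}{new_path}"
    return f"{url}?{query}" if query else url


def app(environ, start_response):
    """WSGI application that answers non-preferred URLs with a 301."""
    target = redirect_target(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", ""),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content\n"]
```

Because the rules live in one plain function, they can be tested against a list of known URL variants before deployment, which helps catch missed URLs and redirect loops. To try the application locally, it could be served with wsgiref.simple_server.make_server from the standard library.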

HTTP Header Method

This is a variation of the HTML tag method that was described earlier. You can configure the HTTP response headers for a URL to specify the correct or preferred URL for the resource. You will have to send an HTTP header that looks something like this:

Link: </marketing/>; rel="canonical"

Just as with the HTML tag method, crawlers that support the header will then try to honor the preferred URL.
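To verify that the header is being sent as intended, its value can be parsed with a few lines of Python. This is a simplified sketch: the function name is made up for this example, and the regular expression handles only simple header values, not quoted parameters that contain commas or semicolons.

```python
import re
from urllib.parse import urljoin

# Matches one link value: <target> followed by zero or more ;param parts.
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def canonical_from_link_header(header_value: str, page_url: str):
    """Return the rel="canonical" target as an absolute URL, or None."""
    for match in LINK_RE.finditer(header_value):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                # A relative target is resolved against the requested URL.
                return urljoin(page_url, target)
    return None


print(canonical_from_link_header(
    '</marketing/>; rel="canonical"',
    "http://www.example.com/marketing/index.jsp",
))
# http://www.example.com/marketing/
```

The header is mainly useful for resources that have no HTML head section at all, such as PDF files or images.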

Use Sitemaps with Preferred URLs

Create a sitemap file for your web pages. In this sitemap file, make sure that you reference the web pages using the preferred URLs only. This is just another indication to crawlers and search engine bots of which URL they should use.

This method is probably not the most effective when compared to the other methods. Not all search engines follow sitemaps, and it is also possible that crawlers will land on your pages by following links from other domains and pages.
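As a minimal sketch, a sitemap that lists only preferred URLs can be generated with the Python standard library. The function name and the example URLs are assumptions for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls):
    """Return sitemap XML that lists each preferred URL exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between runs.
    for url in sorted(set(preferred_urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical preferred URLs; the duplicate is dropped.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/marketing/",
    "http://www.example.com/",
])
print(sitemap_xml)
```

Feeding the function only URLs that have already been reduced to their preferred form keeps variants such as http://example.com/index.jsp out of the file.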

 

You can follow one or all of the above methods to make sure that your pages are indexed correctly by search engines. Getting your pages indexed correctly and under the URLs you prefer is a big part of SEO, because it ensures that your pages are displayed correctly in the search engine result pages (SERPs). Sometimes a combination of the above methods works best, depending on the structure of your website.