In a recent Google Search Central video, Gary Illyes of Google's Search Relations team shed light on how Google chooses canonical web pages and why duplicate pages can actually be beneficial for SEO.

In this post, we break down what he said so you can understand how Google chooses canonical web pages and why duplicate pages aren’t always bad.

More specifically, this blog post covers:

  • Illyes' insights,
  • The concept of canonicals,
  • The various reasons behind duplicate pages on websites, and
  • The signals Google uses to select a canonical.

Understanding canonical pages

The term "canonical" can have different meanings depending on your perspective. 

Publishers typically view it as the "original" web page, while SEOs focus on the "strongest" version for ranking purposes. However, Google's approach to canonicals is quite different.

According to Google's official documentation, "canonicalization" refers to the process of selecting a preferred version among duplicate pages. This is often done to avoid duplicate content issues that can dilute ranking power.

5 common reasons why a website may have duplicate pages


There are several reasons why websites might have duplicate pages, as Illyes highlights:

  • Regional Variants: Content targeted at specific regions; for example, one page showcasing services for the USA and another for the UK, both in English but with regional variations.
  • Device Variants: Separate versions of a page optimized for mobile and desktop browsing experiences.
  • Protocol Variants: Both HTTP and HTTPS versions of a website.
  • Site Functions: Pages generated through sorting or filtering functions on a category page.
  • Accidental Variants: Unintended duplicate content, such as a demo version of the site mistakenly accessible to search engines.

We should highlight here that a website should keep duplicate content to a minimum. There are valid exceptions where duplicates are fine, but in most cases a single canonical source is better.

For instance, a website normally shouldn’t serve both HTTP and HTTPS versions of its pages. Instead, redirect the HTTP version to the HTTPS version with a permanent (301) redirect.
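
In practice, this redirect is configured in your web server, CMS, or CDN rather than written by hand. As a minimal sketch of what a permanent HTTP-to-HTTPS redirect should do, the Python function below maps an HTTP URL to its HTTPS equivalent while keeping the host, path, and query string intact. The function name https_redirect_target is hypothetical, and example.com is a placeholder.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url: str) -> Optional[str]:
    """Return the HTTPS URL that an HTTP URL should 301-redirect to,
    or None when the URL is not plain HTTP (no redirect needed)."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme != "http":
        return None
    netloc = parts.netloc
    # Drop an explicit default HTTP port; HTTPS defaults to 443.
    if netloc.endswith(":80"):
        netloc = netloc[: -len(":80")]
    # Only the scheme changes: host, path, and query string are preserved.
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))
```

A web server would send the returned value in the Location header of a 301 response, so every HTTP URL points to exactly one HTTPS URL instead of competing with it as a duplicate.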

Now that we have made that disclaimer, let’s move on.

Signals for choosing canonicals

Illyes emphasizes that Google uses a set of signals to determine the canonical page from a group of duplicates.

"Google determines if the page is a duplicate of another already known page and which version should be kept in the index, the canonical version," explains Illyes. "But in this context, the canonical version is the page from a group of duplicate pages that best represents the group according to the signals we've collected about each version."

Here, Illyes introduces the concept of "duplicate clustering." 

Google groups duplicate and near-duplicate pages into a cluster, then analyzes various signals to select the most representative page within the cluster as the canonical. These signals, according to Illyes, are pieces of information Google gathers about web pages to understand their relevance and importance.

Some signals are readily controllable by the website owner, such as the rel=canonical link element, which tells Google which version the owner prefers.
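
The rel=canonical hint is a link element in a page's <head>, for example <link rel="canonical" href="https://example.com/shirt">. The sketch below, using only Python's standard html.parser module, shows how an audit script might read that hint from a page. The class and function names are hypothetical, and treating multiple conflicting canonical URLs as "no usable hint" is an assumption made for illustration, not a description of how Google resolves them.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may contain several space-separated, case-insensitive tokens.
        rel_tokens = attr_map.get("rel", "").lower().split()
        href = attr_map.get("href", "").strip()
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if absent or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    unique = set(finder.canonicals)
    # Assumption for illustration: conflicting declarations count as no usable hint.
    return unique.pop() if len(unique) == 1 else None
```

Running a check like this across a site is one way to spot pages that declare no canonical or several contradictory ones.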

Other signals, like the overall importance of a page within the web ecosystem, are beyond the publisher's control.
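
To make "duplicate clustering" more concrete, here is a toy Python sketch: it groups pages whose normalized text is identical, then picks the highest-scoring page in each cluster as its representative. This is not Google's algorithm. The signal names (https, rel_canonical_votes, inbound_links) and their weights are hypothetical stand-ins for controllable and uncontrollable signals, and exact hashing only catches exact duplicates, whereas real systems also detect near-duplicates.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    content: str              # main ("centerpiece") text of the page
    https: bool               # hypothetical signal: served over HTTPS
    rel_canonical_votes: int  # hypothetical signal: pages whose rel=canonical points here
    inbound_links: int        # hypothetical stand-in for importance on the wider web


def content_fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages: List[Page]) -> List[List[Page]]:
    """Group pages with identical normalized content (exact duplicates only)."""
    clusters = defaultdict(list)
    for page in pages:
        clusters[content_fingerprint(page.content)].append(page)
    return list(clusters.values())


def score(page: Page) -> float:
    # Illustrative weights only; Google's real signals and weights are not public.
    return (
        1.0 * page.rel_canonical_votes
        + (0.5 if page.https else 0.0)
        + 0.01 * page.inbound_links
    )


def choose_canonical(cluster: List[Page]) -> Page:
    """Pick the highest-scoring page as the cluster's representative."""
    return max(cluster, key=score)


if __name__ == "__main__":
    pages = [
        Page("http://example.com/shirt", "Blue  Shirt", False, 0, 3),
        Page("https://example.com/shirt", "blue shirt", True, 2, 10),
        Page("https://example.com/hat", "Red hat", True, 0, 1),
    ]
    for cluster in cluster_duplicates(pages):
        print(choose_canonical(cluster).url)
```

Changing the weights changes which URL wins, which mirrors Illyes' point that the canonical is whichever version best represents the group according to the signals collected about each version.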

Alternate versions of web pages

Interestingly, Illyes highlights the concept of "alternate versions" within a duplicate cluster. Google might choose an alternate version for search results when it's a better fit for a specific user query. This is particularly relevant for local business websites or eCommerce stores.

Imagine a clothing store's eCommerce website with a product page for a specific shirt. The page might have variations based on size or color. These variations, while technically duplicates, could be chosen as alternate versions if a user searches for a specific color combination.

This is crucial for local businesses and eCommerce stores because redirecting or noindexing all variations to avoid keyword cannibalization might be counterproductive. Certain variations could be the most relevant results for long-tail search queries containing specific details like size or color.

The nuance of centerpiece content and future implications

In the discussion, Illyes refers to the main content of a web page as the "centerpiece." This terminology aligns with the concept of "Centerpiece Annotation" introduced by another Google representative, Martin Splitt.

While the exact details of Centerpiece Annotation remain unclear, Illyes' reference suggests it might be related to how Google identifies and prioritizes the core content on a web page.

This concept matters for content creators because it emphasizes the importance of crafting high-quality, informative content that serves as the heart of a web page. Focusing on in-depth content that directly addresses user queries can improve a web page's ranking potential, even amid variations and alternate versions.

Key takeaways from this discussion on canonicals

  • Google's concept of canonical pages differs from publisher and SEO perspectives. Publishers see it as the "original" page, while SEOs focus on the "strongest" version for ranking. Google prioritizes the version within a duplicate cluster that best represents the group based on various signals.
  • Duplicate pages can arise for various reasons relevant to local businesses and eCommerce websites. These include regional variations for different locations, mobile and desktop versions, and variations due to site functionalities like sorting/filtering.
  • Google uses a set of signals to choose the canonical page within a cluster of duplicates. Some signals are controllable by the website owner, like the rel=canonical link attribute. Others, like the overall importance of a page, are beyond the publisher's control.
  • Alternate versions of a page can still rank and be beneficial for SEO. This is particularly relevant for local businesses and eCommerce stores with variations catering to specific user queries. For example, variations based on size or color on a product page might be chosen by Google for relevant searches.

Moving forward: Best practices for higher SERP rankings

Website owners and publishers can use the insights from Illyes' discussion to refine their SEO strategies. Here are some practices to consider:

  • Prioritize high-quality content: Focus on creating valuable, informative content that caters to the specific needs and interests of your local audience. This core content should serve as the centerpiece of your web pages.
  • Embrace structured data: Use structured data markup to give Google additional context about your website and business, including location information, contact details, and offerings. This can enhance your search visibility.
  • Optimize for mobile: Ensure your website is mobile-friendly, delivering a seamless browsing experience on smartphones and tablets. This is especially crucial for local SEO as many users search for businesses on the go.
  • Monitor and refine: Regularly monitor your website's performance in local search results. Analyze user behavior and search queries to identify areas for improvement.
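
As an illustration of the structured data point, the sketch below builds a schema.org LocalBusiness description as JSON-LD, the format Google recommends for structured data, and wraps it in the script tag you would place in a page. The function name and all business details are hypothetical placeholders; check Google's structured data documentation for the properties required for the rich result you are targeting.

```python
import json


def local_business_jsonld(name: str, url: str, telephone: str, street: str,
                          locality: str, region: str, postal_code: str,
                          country: str) -> str:
    """Build a schema.org LocalBusiness JSON-LD block wrapped in a script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "addressRegion": region,
            "postalCode": postal_code,
            "addressCountry": country,
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the surrounding <script> tag early;
    # "\/" is a valid JSON escape, so parsers still read the same string.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(local_business_jsonld(
        "Example Shirt Co.", "https://example.com", "+1-555-0100",
        "1 Example Street", "Springfield", "IL", "62701", "US",
    ))
```

Generating the markup from the same data that feeds your contact page helps keep the address and phone number consistent everywhere they appear.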

Conclusion

If you have any questions or comments, let us know. 

And don’t forget to check out our SEO Toolbox to evaluate how your website is performing and whether it has duplicate pages worth fixing.

As a reminder, you can:

  • Merge and redirect pages to remove duplicates.
  • Delete one version.
  • Use rel=canonical to point to the pages you want Google to prioritize.

But as we have just learned, Google doesn’t always follow rel=canonical (it is a hint, not a directive), and not all duplicate pages are harmful.