What Duplicate Content Is and How It Ruins Your Rankings

Duplicate content is identical or nearly identical content that appears at more than one URL. Such pages can waste your crawl budget and, as a result, cause a significant drop in rankings. You might think this doesn’t apply to your site, but don’t jump to conclusions.

In this article, I’ll show you where duplicates hide and how to identify and fix them, so that the spider can spend its time on your more important pages.

Where Duplicate Content is Hidden

There are many reasons why duplicates appear. Here are the most common causes:

Small differences like “www” and non-“www” versions

Your site may be accessible at both the www and non-www versions, or over both HTTP and HTTPS. URLs may also resolve both with and without a slash at the end. As a result, you have two identical websites, with every page duplicated.
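
To see how quickly these variants pile up, here is a minimal Python sketch that collapses them into one form and groups the raw URLs that turn out to be the same page. The preferred form (HTTPS, no "www.", slash at the end) and the example.com URLs are my assumptions for illustration; your site may prefer a different form.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse scheme, www, and trailing-slash variants into one form.

    Assumed policy (hypothetical): HTTPS, no "www.", slash at the end.
    Note: this naive rule would also add a slash to file-like paths
    such as /page.html, so adapt it to your own URL scheme.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def group_variants(urls):
    """Map each normalized URL to the list of raw URLs that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


# Hypothetical list of URLs, e.g. taken from a crawl or a server log.
urls = [
    "http://example.com/blog",
    "https://www.example.com/blog/",
    "https://example.com/blog",
    "https://example.com/about/",
]
groups = group_variants(urls)
for canonical, variants in groups.items():
    print(canonical, "<-", len(variants), "variant(s)")
```

Any group with more than one variant is a page that search engines can reach under several addresses.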

Filters and sorting

If you use filters on the site, the results are generated on a separate page with a dynamic URL. Each combination of filters and sorting parameters therefore creates another automatically generated page, and these pages are a common source of duplicates.
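
A rough Python sketch shows how fast the number of such pages grows. The filter names and values below are made up for illustration; the point is only that the counts multiply.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical filter and sorting options for one category page.
# None means "parameter not set".
options = {
    "color": [None, "green", "red", "blue"],
    "size": [None, "s", "m", "l"],
    "sort": [None, "min_price", "max_price"],
}


def filter_urls(base: str, options: dict) -> list:
    """Build every URL that the given filter/sort combinations can produce."""
    urls = []
    for combo in product(*options.values()):
        params = {k: v for k, v in zip(options, combo) if v is not None}
        urls.append(base + ("?" + urlencode(params) if params else ""))
    return urls


urls = filter_urls("https://example.com/dresses/", options)
print(len(urls))  # 4 * 4 * 3 = 48 URLs for a single category
```

Three small filters already turn one category page into 48 crawlable URLs, most of which show nearly the same products.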

Pagination

Pagination also creates a duplicate issue, because the titles and descriptions of all the paginated pages are the same. I explain how to handle it correctly at the end of the article.

How Duplicates Affect Rankings

Wasted Crawl Budget

The crawl budget is the number of URLs Googlebot can crawl over a certain period, and it is limited. You can either put up with that limit or try to get more out of it. One way is to delete or hide all the duplicate pages on your site, so the spider indexes important pages instead of useless duplicates.

Reduced Visibility

Since search engines want to provide the most relevant information, they try not to show the same page more than once in the results. The engine will probably choose just one of your duplicates, so the visibility of each of the others can drop.

Backlinks Division

If the same article is available at two different URLs, backlinks and shares are split between them, because some readers link to the first URL and others to the second. As a result, both pages rank lower than a single page would.

How to Identify Duplicate Content

Now that we know how duplicates affect rankings, let’s find all the duplicates on your site so you can fix them or hide them from search engines.

You can do this manually or with tools, depending on the size of your site and the number of such pages. Small issues can be fixed by hand in a few minutes, but if you’re not sure, don’t waste your time and use dedicated tools.

Manually

If your website is quite small, you can find the duplicates as follows.

Use site:yourwebsite.com to get the list of all your site’s pages indexed by Google.

After that, you can check the results manually. Again, this approach makes no sense for huge platforms. It fits best when your site is already optimized and you just want to look over your main pages to see whether any duplicate issues have appeared recently.

Also, you can check certain pages for duplicates using the following operator: site:mysite.com intitle:the title you’re checking.

Be sure to click “repeat the search with the omitted results included” at the bottom of the SERP. Without it, Google will show you only unique pages.

Google Search Console

Go to the Search Appearance section and click on HTML Improvements. There you can find Duplicate meta descriptions and Duplicate title tags.

Unfortunately, these are the only types of duplicates the console can show. This method can tell you whether duplicates exist on your site, but it isn’t suitable for a deep investigation.
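
If you already have the HTML of your pages, the same kind of check (identical titles and meta descriptions) can be sketched in a few lines of Python with the standard library. This is only an illustration of the idea, not how Search Console works internally; the pages dictionary is hypothetical, and in practice the HTML would come from your own crawl.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """Return {(title, description): [urls]} for groups with more than one URL."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        groups[(parser.title.strip(), parser.description)].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Hypothetical pages for illustration.
pages = {
    "/dresses/": '<head><title>Dresses</title><meta name="description" content="All dresses"></head>',
    "/dresses/?sort=price": '<head><title>Dresses</title><meta name="description" content="All dresses"></head>',
    "/about/": '<head><title>About us</title><meta name="description" content="Who we are"></head>',
}
duplicates = find_duplicates(pages)
for (title, description), urls in duplicates.items():
    print(title, "|", description, "->", urls)
```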

Serpstat Site Audit

Serpstat is an all-in-one SEO platform with 5 modules:

  • Keyword Research
  • Competitive Analysis
  • Backlink Analysis
  • Rank Tracking
  • Site Audit

Create a project and set the audit parameters you need. You’ll see a list of errors grouped both by error type and by priority level. Go to the Meta tags section of the Audit module to see the list of pages that have identical title or description tags.

You can also open the detailed report to see which pages produce duplicates.

Serpstat is one of the best options because it’s a cloud-based platform: you can access the audit results from anywhere and you don’t have to run anything on your computer. It also shows all of the SEO errors on your website, not just duplicate issues.

How to Fix the Duplicate Content Problem

‘Fixing’ the duplicate content problem can mean one of three things:

  • removing unnecessary pages
  • hiding such pages from the search engine
  • pointing to the main pages

Here are the most common methods to do it:

  • Set a 301 redirect

This applies to small differences such as www and non-www versions, HTTP and HTTPS, URLs with and without “/” at the end, and so on.

You can show search engines which page is the main one by setting 301 redirects from the duplicate page to the original. These pages then won’t be considered duplicate content, because the robot will always be redirected to the main page. This simple trick can help you massively, even if you are in local markets like Mental Health Marketing.
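
As a rough illustration of the redirect logic, here is a small Python WSGI sketch. The canonical form it enforces (HTTPS, no "www.", slash at the end) and the example.com host are my assumptions, and the simulate helper exists only so the example can run without a real server.

```python
def canonical_url(host: str, path: str, query: str = "") -> str:
    """Build the preferred URL (assumed form: https, no "www.", slash at the end)."""
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if not path.endswith("/"):
        path += "/"
    return f"https://{host}{path}" + ("?" + query if query else "")


def app(environ, start_response):
    """WSGI app: answer 301 for non-canonical URLs, otherwise serve the page."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "example.com")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    requested = f"{scheme}://{host}{path}" + ("?" + query if query else "")
    target = canonical_url(host, path, query)
    if requested != target:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]


def simulate(scheme: str, host: str, path: str):
    """Call the app without a server and return (status, headers)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "wsgi.url_scheme": scheme,
        "HTTP_HOST": host,
        "PATH_INFO": path,
        "QUERY_STRING": "",
    }
    app(environ, start_response)
    return captured["status"], captured["headers"]


print(simulate("http", "www.example.com", "/blog"))
print(simulate("https", "example.com", "/blog/"))
```

On a real site these redirects are usually configured in the web server or CMS rather than in application code, but the rule is the same: every variant answers with a 301 that points to one preferred URL.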

An alternative is to choose the preferred domain, with or without www, in Google Webmaster Tools. Remember, though, that everything you set in Google Webmaster Tools works only for Google.

  • Use the rel="canonical" tag

This is useful when you deal with sorting and filtering pages. You can’t just remove them, but the robot treats all of them as duplicates. And since an online clothing store, for example, usually has hundreds of different kinds of dresses, imagine how much crawl budget is wasted on these pages.

To avoid this problem, use the rel="canonical" tag. When the crawler visits these pages, it understands that the category page is the preferred one and that there is no use in indexing the other hundred pages.

Here is what it looks like. Add

<link rel="canonical" href="https://blog.example.com/dresses/green-dresses/" />

to the <head> of the page

https://blog.example.com/dresses/green-dresses/?sort_min_price
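
If your templates are generated by code, the canonical tag can be built automatically. The Python sketch below assumes that every query parameter on the page is a filter or sorting option that doesn’t change the main content; if some parameters do select different content, they would need to be kept.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a canonical <link> tag that points at the URL without its query string.

    Assumption: all query parameters are filters or sorting options.
    """
    parts = urlsplit(url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


print(canonical_link("https://blog.example.com/dresses/green-dresses/?sort_min_price"))
```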

  • Use meta robots

This fits pages you don’t need the robot to index (for example, the basket page, printer-friendly pages, etc.). It allows search engines to crawl a particular page but not index it.

Here is what it looks like:

<meta name="robots" content="noindex, follow">

You can also use the SeoHide tool to keep the robot from indexing such pages.
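
To double-check that the meta robots tag really ended up on the pages you wanted to hide, you can parse their HTML. Here is a minimal Python sketch using only the standard library; the two sample pages are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_indexable(html: str) -> bool:
    """True unless the page carries a noindex directive.

    "none" is treated as noindex too, since it stands for "noindex, nofollow".
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


# Hypothetical pages for illustration.
basket = '<head><meta name="robots" content="noindex, follow"></head>'
category = "<head><title>Dresses</title></head>"
print(is_indexable(basket))
print(is_indexable(category))
```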

  • Set rel="prev" and rel="next" tags for pagination

Use the rel="prev" and rel="next" tags to help Google understand that these pages are not duplicates but parts of a paginated series. rel="prev" points to the previous page, and rel="next" to the next one.

Here is what it should look like:

In the <head> of http://site.ru/category/

<link rel="next" href="http://site.ru/category/2/">

In the <head> of http://site.ru/category/2/

<link rel="prev" href="http://site.ru/category/">

<link rel="next" href="http://site.ru/category/3/">
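
For a long series it is easier to generate these tags than to write them by hand. The Python sketch below assumes the URL scheme from the example above (page 1 at the base URL, page n at the base URL plus "n/"); the function name and parameters are mine, so adjust them to your own templates.

```python
def pagination_links(base: str, page: int, last_page: int) -> list:
    """Return the rel="prev"/rel="next" <link> tags for one page of a series.

    Assumed URL scheme: page 1 lives at the base URL, page n at base + "n/".
    """
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_url(n: int) -> str:
        return base if n == 1 else f"{base}{n}/"

    links = []
    if page > 1:  # the first page has no previous page
        links.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:  # the last page has no next page
        links.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return links


for tag in pagination_links("http://site.ru/category/", page=2, last_page=3):
    print(tag)
```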

Final Thoughts

Duplicate content can be a major factor in your SEO rankings, so this issue is definitely worth your attention. You may not see a sharp decline, but duplicates can be a good explanation for why you’re stuck in the same position or slowly dropping.

In this article, I covered the basics of the duplicate content issue. There are many more causes, consequences, and ways to fix it. But I hope I managed to show you how important this problem is, so you can check and improve your site using my recommendations.