How to Fix Duplicate Content and Improve Your SEO

Duplicate content is a common issue in SEO and content marketing. Although it is rarely malicious in intent, it can still get you into trouble with Google. And even if you escape an actual penalty, there are still many ways duplicate content can hurt your rankings. But what exactly counts as duplicate content? How does it affect your SEO? And most importantly, how can you fix duplicate content and improve your SEO? The bad news is that defining and finding duplicate content may not be as simple as you think. The good news is that there are ways to fix the issue.

What counts as duplicate content in SEO?

There are two different definitions of duplicate content – one pretty narrow and one somewhat broader. According to the narrow definition, duplicate content is when substantial chunks of text across various pages (on one or multiple websites) are identical or at least very similar. More broadly, duplicate content is content that doesn’t add much value to your website either because similar content already exists or because the content in question is irrelevant. Both of these things reflect poorly on user experience and are therefore frowned upon by Google. Remember: while content is an excellent way to drive traffic to your site, just having a lot of content is not enough. The content you produce has to have value.

Before you can fix duplicate content and improve your SEO, you need to know what duplicate content is and how to identify it

Types of duplicate content

Duplicate content comes from a variety of sources and can take various forms. Some of the most common types of duplicate content are:

syndicated content: republishing an article that also appears on other sites, or repeating someone else’s words, comments, or statements, can be flagged as duplicate content even when you cite your sources
print-friendly pages: if you create a separate printer-friendly version of a page at its own URL, you’ll be creating duplicate content
old and archived content: if you’ve updated a post but haven’t deleted the original version, odds are the two will be very similar, and therefore duplicates
URL issues: when the same page can be reached at several addresses (dynamic URLs with parameters or session IDs, versions with and without www., or both HTTP and HTTPS), search engines can treat each address as a separate, duplicate page
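To see why these URL variants count as duplicates, it helps to normalize them the way a site owner might when auditing a list of URLs. The Python sketch below is illustrative only: the function name normalize_url and the IGNORED_PARAMS set are hypothetical, and it assumes your preferred version is HTTPS without www. and that the order of query parameters never changes what a page shows. Search engines do their own canonicalization, which this does not reproduce.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change what a page shows
# (session IDs, tracking tags). Adjust it for your own site.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Collapse common variants of one page's URL into a single form.

    Assumes the preferred version is HTTPS without "www." and that the
    order of query parameters does not matter.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # lowercased; any port is dropped
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop session/tracking parameters, then sort the rest so order is irrelevant.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Force HTTPS and discard any #fragment.
    return urlunsplit(("https", host, path, urlencode(params), ""))


if __name__ == "__main__":
    variants = [
        "http://www.example.com/shoes?color=red&sessionid=abc123",
        "https://example.com/shoes?color=red",
        "https://WWW.example.com/shoes?sessionid=xyz&color=red",
    ]
    # Three addresses a crawler could treat as separate pages collapse into one.
    print({normalize_url(u) for u in variants})
    # {'https://example.com/shoes?color=red'}
```

If two URLs on your site normalize to the same string but both are live and indexable, that pair is a candidate for one of the fixes discussed later in this article.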

Why do you need to fix duplicate content in order to improve your SEO?

You’re highly unlikely to incur a real penalty from Google just for having some duplicate content. That said, you shouldn’t copy somebody else’s work in an attempt to create SEO-friendly content faster. Even when it isn’t strictly illegal, plagiarism is not a good look. What is more, duplicated text will hurt your SEO rankings. This happens even when the content you’re duplicating is your own (so there’s no stealing) and even when the duplication is accidental. Why?

Duplicate content confuses Google bots. When Google indexes your pages, it’s not a human being behind the process – it’s a bot. A bot can’t always tell which of two similar pages is the original or the better one. It may index both, or it may pick the version you didn’t want to show. Either way, your pages end up competing with each other for the same searches, and links and other ranking signals get split between them, so both can suffer in rankings. And it’s even worse if the duplicate content is on a different website. Then things like domain authority, credibility, and the time when the content was published come into play. Are you sure you can beat everyone else with similar content on these metrics?
Duplicate content hurts user experience. Have you ever clicked a link on someone’s webpage, excited to discover more, only to land on a page repeating what you just read? Nobody likes that, and your users won’t either. They’ll probably leave, and they may never come back. Since Google values user experience, this can harm your rankings too.

Even without a direct penalty, duplicate content will negatively impact your position in Google's search results

Fix duplicate content and improve your SEO by locating existing issues

If you have any duplicate content on your website (and you probably do), you’ll want to fix that as soon as possible. Duplicate content doesn’t help anyone. Getting rid of it, on the other hand, can significantly improve SEO.

Tools you can use to identify duplicate content

Before you can fix duplicate content issues, you first need to find them. Since a lot of duplication is accidental, you might not even know about it. Luckily, you can uncover duplicate content with Google search operators and various software tools:

You can manually go through your own website to find glaring examples of duplicates or perform a more comprehensive SEO audit to take a closer look at potential SEO issues that include duplicate content.
If you type site:yourwebsite.com (with your own domain) into Google, you’ll see a list of the pages Google has indexed for your domain. Look for duplicates there.
You can use Google Search Console (formerly Google Webmaster Tools) to see how Google crawls and indexes your website and to spot errors, including pages it has flagged as duplicates.
Third-party software, like Copyscape, OnCrawl, Siteliner, and others, can help you uncover duplicate content.
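The tools above don’t publish exactly how they score similarity, but a common general technique for spotting near-duplicate text is to break each page into overlapping word sequences (“shingles”) and compare the resulting sets with Jaccard similarity. The Python sketch below assumes you have already extracted each page’s main text into a dictionary keyed by URL; the function names, the 5-word shingle size, and the 0.5 threshold are illustrative choices, not values any particular tool is known to use.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(pages: dict[str, str], threshold: float = 0.5, k: int = 5):
    """Yield (url_a, url_b, score) for page pairs at or above the threshold."""
    sigs = {url: shingles(text, k) for url, text in pages.items()}
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            yield a, b, round(score, 2)


if __name__ == "__main__":
    pages = {
        "/post-v1": "Duplicate content is a common issue in SEO and content "
                    "marketing and it can hurt rankings",
        "/post-v2": "Duplicate content is a common issue in SEO and content "
                    "marketing and it can hurt your rankings",
        "/about": "Melinda Peterson is a content manager who specializes in "
                  "multimedia content strategy",
    }
    print(list(find_near_duplicates(pages)))
    # [('/post-v1', '/post-v2', 0.79)]
```

Comparing every pair of pages grows quadratically with the number of pages, which is fine for a small blog; larger crawls usually approximate the same similarity with techniques such as MinHash.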

You can use Google's own tools to identify duplicate content

What to do with your duplicate content?

Once you’ve discovered duplicate content on your website, it’s time to decide what to do with it. There are several solutions you can try, depending on your overall strategy:

Delete one of the pages. If the problem is an entire page or post, keep the one with better on-page optimization and delete the other. Simple!
Edit some of the content. If the issue is a specific section of the content, consider rewriting it. Of course, this only makes sense if the content is really necessary; otherwise, delete it.
Set up a 301 redirect. This is an excellent solution for content duplicated through URL issues. Pick the best page containing the content in question and permanently redirect all other versions to it. Visitors who follow an old URL land on the chosen page, and Google learns which page to index. The added benefit (compared to just removing content) is that the target page inherits the ranking signals, such as backlinks, that the redirected pages had built up.
Use the rel=canonical tag. Another good solution for duplication coming from multiple URLs, the canonical tag tells Google which of your URLs (and thus pages) is the original and which are just copies. For search engines, it works much like a 301 redirect, but visitors can still open every version of the page, and Google treats the tag as a strong hint rather than a strict rule.
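Use the noindex tag. Adding a noindex robots meta tag to a page tells Google not to index it. That way, even though the page still exists, it won’t compete with other similar pages in search results. Note that nofollow is a different directive: it tells search engines not to follow the links on a page and does not keep the page itself out of the index.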
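Choose a preferred domain. If the same content is reachable under several versions of your domain (HTTP and HTTPS, or with and without the www. subdomain, for example), pick one version and point the others to it with 301 redirects or canonical tags. That way, content that exists on all versions will only be indexed from your preferred domain. (Google Search Console used to offer a dedicated preferred-domain setting, but it has since been retired.)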
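In practice, redirects and tags are usually configured in your web server or CMS, but the logic behind the 301 redirect, canonical tag, and noindex fixes fits in a small sketch. The Python WSGI app below is hypothetical: it assumes https://example.com (no www.) is the preferred domain, that /old-post has been replaced by /new-post, and that printer-friendly copies live under /print/. It also assumes the app sees the real host and scheme; behind a proxy that terminates HTTPS, the scheme check would need adjusting to avoid a redirect loop.

```python
from html import escape
from urllib.parse import urlunsplit

# Hypothetical settings for illustration only.
PREFERRED_HOST = "example.com"       # preferred domain: HTTPS, no "www."
MOVED = {"/old-post": "/new-post"}   # outdated URL -> page that replaced it


def app(environ, start_response):
    """Minimal WSGI app: 301 redirects, rel=canonical, and noindex."""
    host = environ.get("HTTP_HOST", PREFERRED_HOST).lower()
    scheme = environ.get("wsgi.url_scheme", "http")
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")

    # 301 redirect: send www/HTTP variants and outdated URLs to one address.
    target = MOVED.get(path, path)
    if host != PREFERRED_HOST or scheme != "https" or target != path:
        location = urlunsplit(("https", PREFERRED_HOST, target, query, ""))
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    if path.startswith("/print/"):
        # noindex: keep printer-friendly copies out of the index.
        head = '<meta name="robots" content="noindex">'
    else:
        # rel=canonical: URLs with extra parameters (e.g. ?sessionid=...)
        # still load, but the tag points search engines to the clean URL.
        canonical = escape(f"https://{PREFERRED_HOST}{path}")
        head = f'<link rel="canonical" href="{canonical}">'

    body = f"<!doctype html><html><head>{head}</head><body>...</body></html>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]


if __name__ == "__main__":
    def show(status, headers):
        print(status, dict(headers).get("Location", ""))

    app({"HTTP_HOST": "www.example.com", "wsgi.url_scheme": "http",
         "PATH_INFO": "/shoes"}, show)
    # 301 Moved Permanently https://example.com/shoes
```

Calling app directly with a dictionary of request variables, as above, lets you check each rule without running a server.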

Fix duplicate content and improve your SEO in the long run with a good strategy

Ultimately, the best, fastest, and easiest way to fix duplicate content and improve your SEO in the process is not to create such content in the first place. That way, you can avoid the whole song and dance of finding duplicates and resolving them. But it’s going to be really hard to keep in mind everything you’ve ever created and published, especially if you’re running a large website where you post a lot. So you can’t just improvise as you go along, hoping that you won’t accidentally duplicate your content. Instead, what you need is a good strategy. Decide on your URL structure early on and only create URLs in line with it. Set out a plan for the content you’ll publish. Finally, do regular audits to discover issues early on and update your strategies accordingly. In the long run, this is the best way to avoid duplicate content.

About the Author: Melinda Peterson is a content manager for Digital Dot New York who specializes in multimedia content strategizing. She started her SEO career blogging on her own private website but quickly realized she had a talent for turning content into a business.