Google's duplicate content webmaster guide defines duplicate content (for search engine optimization purposes) as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar."
Google's guide goes on to list the following as examples of duplicate content:
- Discussion forums that can generate both regular and stripped-down pages aimed at mobile devices
- Store items displayed or linked through multiple distinct URLs
- Printer-only versions of web pages
Search engines do need to penalize some instances of duplicate content that are designed to spam their search index, such as:
- Scraper sites that copy content in bulk
- Simplistic article spinning techniques that generate "new" content by selectively replacing words in existing content
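To illustrate why simple word replacement still leaves text "appreciably similar", here is a minimal sketch that compares two texts by the overlap of their word shingles (Jaccard similarity). This is a generic textbook technique, not a description of how any search engine actually detects duplicates; the function names, the shingle size `k`, and the sample sentences are all assumptions made for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts: the "spun" version swaps a single phrase.
original = "Search engines work hard to make sure scraper sites cannot benefit from copied content"
spun = "Search engines work hard to ensure scraper sites cannot benefit from copied content"
unrelated = "Our store ships hiking boots to customers in Australia every week"

print(jaccard_similarity(original, spun))
print(jaccard_similarity(original, unrelated))
```

The spun sentence still shares most of its shingles with the original, while the unrelated sentence shares none, which is the intuition behind treating spun articles as duplicates.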
When search engines find duplicate content, they can:
- Penalize an entire site that contains duplicate content (when it is spam)
- Choose one page as the canonical source of the content and lower the priority of, or not index, the other pages with the duplication (common)
- Take no punitive action and index multiple copies of the content (rare)
Avoiding internal duplication
When asked about duplicate content, Google's Matt Cutts said it should only hurt you if it looks like spam. Even so, many webmasters use the following techniques to avoid unnecessary duplication of content:
- Make sure that content is accessible under only one canonical URL
- If your site must return the same content at several URLs (for example, for a "print view" page), specify the canonical URL manually with a link element in the head of the document
- In cases where your site returns similar content based on parameters encoded in the URL (for example, sorting a product catalog), exclude those URL parameters in Webmaster Tools
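As a rough sketch of the canonical URL technique, the following Python snippet strips presentation-only query parameters from a URL and builds the corresponding `<link rel="canonical">` element. The parameter names in `PRESENTATION_PARAMS`, the helper names, and the example.com URL are assumptions for illustration; which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change presentation only, not the content itself.
PRESENTATION_PARAMS = {"sort", "order", "view", "print"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_element(url: str) -> str:
    """Build the link element to place in the document's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_element("https://example.com/catalog?category=boots&sort=price&print=1"))
```

Every sorted or print-view variant of the catalog page would then emit the same link element, pointing search engines at a single URL.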
Content syndication
Publishing content on your site that has already been published elsewhere is called content syndication. Creating duplicate content through syndication can be fine, as long as:
- You have permission to do so
- You tell your users what the content is and where it comes from
- You link to the original source (a direct, deep link from the page with the copy to the original content, not just a link to the home page of the site where the original can be found)
- Your users find it useful
- You have something to add to that content, such as commentary or critique, so that users would rather find it on your site than elsewhere
- You also have enough original content on your site (at least 50% original, but ideally 80% original)
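The 50% and 80% guidelines above can be checked with simple arithmetic. The sketch below is one possible way to do it, and it rests on an assumption the text does not make: that "original" is measured by word count per page. The `Page` class, its fields, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    word_count: int
    syndicated: bool  # True if the page republishes content from elsewhere


def original_share(pages: list[Page]) -> float:
    """Fraction of the site's total words that sit on non-syndicated pages."""
    total = sum(page.word_count for page in pages)
    if total == 0:
        return 0.0
    original = sum(page.word_count for page in pages if not page.syndicated)
    return original / total


def assess(share: float, minimum: float = 0.5, ideal: float = 0.8) -> str:
    """Compare the share against the 50% minimum and 80% ideal guidelines."""
    if share >= ideal:
        return "ideal"
    if share >= minimum:
        return "acceptable"
    return "too much republished content"


# Hypothetical site: 1,500 original words and 500 syndicated words.
site = [
    Page("/guide", 600, syndicated=False),
    Page("/review", 900, syndicated=False),
    Page("/news/partner-story", 500, syndicated=True),
]
share = original_share(site)
print(f"{share:.0%} original: {assess(share)}")
```

Counting pages instead of words, or weighting by traffic, would be equally defensible; the point is only to make the ratio explicit.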
Although Google does not penalize every instance of duplicate content, even duplicate content that is not penalized may not help you get visitors:
- You are competing with all the other copies that are out there
- Google is likely to prefer the original source of the content or the most reliable copy
Google will penalize duplicate content that you post on your website from other sources if:
- It appears to be scraped or stolen (especially without attribution)
- Users do not react well to it (especially if they click back to Google after visiting your site)
- There are so many copies available that there is no reason to send users to your copy
- Your copy is not the original, the most reliable, or the most useful, and it adds no commentary or critique
- Your site does not have enough original content to balance all the republished content
- You duplicate pages so often within your own site that Googlebot has trouble crawling the entire site
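For the last point, a site owner can look for exact internal duplicates before a crawler does. The following is a minimal sketch that groups URLs by a hash of their normalized body text; the function names and the sample pages are assumptions, and it only catches pages whose text is identical after whitespace and case normalization, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(body: str) -> str:
    """Hash the page text after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized body text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[content_fingerprint(body)].append(url)
    # Only fingerprints shared by more than one URL indicate duplication.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result mapping each URL to its extracted body text.
crawled = {
    "/boots": "Leather hiking boots, waterproof.",
    "/boots?sort=price": "Leather hiking  boots, WATERPROOF.",
    "/sandals": "Lightweight summer sandals.",
}
print(find_duplicate_groups(crawled))
```

Each group returned is a candidate for consolidation under a single canonical URL.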
Internationalization and Geo Targeting
Content localization is one area in which duplicate content can be beneficial for SEO. It is perfectly fine to publish the same content on sites aimed at different countries that speak the same language. For example, you can have a site for the United States, a site for the United Kingdom, and a site for Australia, all with the same content.
With a separate site for each country, it is generally possible to rank better for users in that country. In addition, you can tailor each site to its users with small differences in spelling, prices in the local currency, or product shipping options. For more information on setting up geo-targeted websites, see How should I structure my URLs for SEO and localization?
Dealing with content scrapers
Other sites that steal your content and republish it without permission can cause duplicate content problems for your site. Search engines work hard to make it difficult for scraper sites to benefit from duplicated content. If a scraper site is causing you problems, you may be able to get it removed from Google's index by submitting a DMCA request to Google.