Duplicate Content Issue: What Does Google Say?

Last week I posted something about a duplicate content penalty, and got various responses. Some people don’t understand it or don’t believe it’s a problem; others are still adamant that it is.

Well, I suppose we should let Google have the official word. Read their guidelines (if you can understand them all) at the Official Google Webmaster Central Blog. Also, if you’re more technically inclined, there’s a thread with a lot of questions and answers about various issues.

Here’s the part that seemed most relevant to me:

What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

Why does Google care about duplicate content?
Our users typically want to see a diverse cross-section of unique content when they do searches. In contrast, they’re understandably annoyed when they see substantially the same content within a set of search results. Also, webmasters become sad when we show a complex URL (example.com/contentredir?value=shorty-george&lang=en) instead of the pretty URL they prefer (example.com/en/shorty-george.htm).

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index. (Here’s another understandable explanation of what they mean by filtering.)

How can Webmasters proactively address duplicate content issues?

  • Be consistent: Endeavor to keep your internal linking consistent; don’t link to /page/ and /page and /page/index.htm.
  • Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we’ll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer.
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
  • Don’t worry be happy: Don’t fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google. If you do spot a case that’s particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.
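
An aside from me, not from Google: for the more technically inclined, the “be consistent” advice can be checked with a few lines of Python. This is only a rough sketch under my own assumptions. The function name canonical_link, the list of index file names, and the choice of the trailing-slash form as the preferred one are all hypothetical; pick whichever form your own site already uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the file names your server serves as a directory index.
INDEX_NAMES = ("index.htm", "index.html")


def canonical_link(url: str) -> str:
    """Collapse /page, /page/ and /page/index.htm into one form: /page/."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # Drop a trailing index file name but keep the directory slash.
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Assumption: a last segment without a dot is a directory, not a file.
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"

    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    for link in ("/page", "/page/", "/page/index.htm"):
        print(link, "->", canonical_link(link))
```

Running every internal link through something like this before publishing would show which pages are linked under more than one address. It does nothing about query-string variants, which would need their own rules.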

The bottom line seems to be that Google doesn’t apply a penalty so much as filter the duplicates and choose which version of an article or site to show. Of course, you want them to send searchers to your site, not someone else’s.

To me, what all this means is that my strategy of including each article on my site once, instead of on both the newsletter archive page and a separate article page, is probably wise. Also, I will continue to change my article by around 20 percent from what I submit to ezine directories, and only submit to ezine directories after I get my article indexed by search engines.
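
If you want a rough number for that “around 20 percent”, here is a minimal Python sketch that compares two versions of an article word by word using the standard library’s difflib. To be clear, this is my own illustration: the function name percent_changed is hypothetical, and nothing in Google’s post says they measure similarity this way or use any particular threshold.

```python
from difflib import SequenceMatcher


def percent_changed(original: str, revised: str) -> float:
    """Rough share of the text that differs, as a percentage from 0 to 100."""
    # Compare word sequences, ignoring case and extra whitespace.
    original_words = original.lower().split()
    revised_words = revised.lower().split()

    # ratio() is 1.0 for identical sequences and 0.0 for nothing in common.
    similarity = SequenceMatcher(None, original_words, revised_words).ratio()
    return round((1.0 - similarity) * 100, 1)


if __name__ == "__main__":
    site_version = "one two three four five"
    directory_version = "one two three four six"
    print(percent_changed(site_version, directory_version))
```

Treat the result as a sanity check on how much you actually rewrote, not as a guarantee of how any search engine will treat the two versions.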

5 Responses to “Duplicate Content Issue: What Does Google Say?”

  1. Thanks for giving us the straight scoop on this perplexing issue. I hope this calms the fear of people who are losing sleep over ‘duplicate content’.

  2. Lynne Lee says:

    It seems that the advice we’ve had so far is good advice.
    So I’ll continue to publish on my own web site first and make sure that the article is at least 25% different just to be sure.

    I do have articles in directories that aren’t on my website. I have plans to include some of them on my own site but will make sure that they are expanded and different enough not to be considered duplicate.

    I guess as long as we don’t start using reprint-rights material that other people use en masse, all is well.

    Thanks

    Lynne
    http://www.christianlifecoaching.co.uk

  3. Hi Diane

    I found your article very interesting.

    Like many others I take the issue of duplicate content very seriously.

    I was fortunate enough to come across a super free (at the moment) program, “DupeFree Pro”, that will compare two articles side by side and give a percentage of duplicate content. It will also check the web for duplication. No, I am not pushing an affiliate link, I just think the software is really great!!

    Once again great article!

    Regards

    Christopher Phillips

    http://www.cothivalebooks.com

  4. diane says:

    Thanks for sharing this, Christopher! I’ll check it out.

    Diane

  5. Welcome and well done! Sunday I was of the opinion that Duplicate Content Issue: What Does Google Say? was a really good concept. But after reading more about Rare Book Store, I’m not so sure… What do you think?
