Duplicate Content

// February 3rd, 2011 // Webmaster tips

Matt Cutts recently announced the launch of a Google algorithm change that aims to “drive down spam levels” by devaluing sites that republish other people’s content and have “low levels of original content”.

Content republication is a big issue for content developers, particularly bloggers. There is nothing more frustrating than taking the time to research and write a post, only to find it appearing on someone else’s site, often with no credit (and even with credit I personally still feel ripped off!).

ProBlogger’s Darren Rowse also wrote about the change in his article “Do You Republish Other People’s Content? You’ll Want to Read This”. Commenters on the post were united in welcoming the change. However, like most of what Google does, it seems to raise more questions than answers.

For example, how does Google determine that something has been republished? When a site republishes an article, only the main body of the page is duplicated; the header, footer, sidebar, comments, etc. will all still be different. Google would need a threshold above which a page is deemed to be a copy. Could this incorrectly penalise a site that is simply quoting content?
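Google hasn’t said how it measures this, but to show what a “threshold” might look like, here is one common near-duplicate technique: break each text into overlapping runs of words (“shingles”) and measure how much the two sets overlap (Jaccard similarity). This is a minimal Python sketch, not Google’s method. It assumes the main body has already been separated from the header, footer and sidebar, and the function names, the 5-word shingle size and the 0.5 threshold are all hypothetical choices for illustration.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(original: str, candidate: str,
                 k: int = 5, threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag the candidate if its body overlaps the
    original's body at or above the threshold. Assumes both inputs are
    main-body text only (no header, footer, sidebar or comments)."""
    return jaccard(shingles(original, k), shingles(candidate, k)) >= threshold
```

With a rule like this, a page that quotes a paragraph or two of a long post shares only a small fraction of its shingles and stays well under the threshold, while a wholesale copy scores close to 1.0. Where exactly to draw the line is the open question.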

Another important question is how Google knows who the original publisher was. Just because Googlebot finds the content on a site first doesn’t necessarily mean that site is the original author: your site might only get indexed once a week, whereas the other site could get indexed daily.

Ideally, I would like to understand this better, if only to make sure Google gives you the credit for your content. Logically, I think “low levels of original content” is the key: if a site is known to republish a lot of content, Google may assume all its content is copied.

I’ll let you know if I find out more.
