You might be pretty mad if you found out that your articles on the Web were being copied and inserted into someone else’s site without permission or credit.
To add injury to insult, the “duplicate” site can show up in search-engine listings and make your own site’s listings disappear.
This is an example of “Googlewashing” — a term that combines Google and brainwashing — and it’s becoming a serious problem for a growing number of content-based businesses.
Whatever You Publish Is Mine
A recent case of an original site being pushed down in Google’s listings by duplicates got some attention at Google because of who was affected: one of the company’s own search quality engineers.
The original post
The aforementioned Google employee, Matt Cutts, maintains his own site, MattCutts.com/blog, on which he publishes articles and comments about the search giant’s ranking methods.
An offhand remark of his about a new dish in the Google cafeteria called “bacon polenta” happened to lead off one such article a few weeks ago.
Excerpts of the post
A few days later, the Threadwatch marketing blog noticed a strange thing. You could search for the entire first two sentences of Cutts’ article in Google (which should produce only one hit), but his article wasn’t even No. 1.
Instead, other bloggers, who’d innocently pasted an excerpt of Cutts’ article into their sites and then legitimately linked to the full piece, were ranking higher on the phrase. Threadwatch posted a screen shot demonstrating the effect.
To add a note of gravity to all this hilarity, a French group known as Dark SEO Team (SEO stands for search engine optimization) developed a deliberate hack into Google’s search-engine algorithm.
The team’s own copycat page was soon also ranking higher than Cutts’ original. And the group made a chilling claim:
“Anyone can use Google’s duplicate content filters to ruin a competitor’s website, and steal his ranking and traffic,” according to the team’s Open Letter to Matt Cutts.
In addition, Dark SEO Team has made waves in the world of search engine marketing by demonstrating how an ordinary site can fake “Page Rank 10.”
PR10 is Google’s highest score for Web pages. Such a high Page Rank gives a much-coveted boost to a site’s content in Google’s results.
The problem of fake Page Rank has become so great that a site called SEOLogs.com has even posted a Fake Page Rank Detector.
You enter a site’s Web address, and the detector tells you whether that site has earned its Google Page Rank or has succeeded in faking it.
What’s going on here? The implications are worrisome for all kinds of legitimate e-businesses, not just those that specialize in helping Web sites get better search-engine rankings.
It’s Not Nice To Fool Mother Google
Matt Cutts’ original page wasn’t listed as high as other pages that merely copied a piece of his content because of a problem that all search engines are facing, not just Google.
Since search engines have become a universal way to find things on the Web, many shady promoters post thousands of Web pages hoping that one or more will show up near the top of the listings.
These sites use a variety of tricks to “look good” to the search engines’ bots. A human being who happened to find one of these sites, however, would immediately see that the content was little more than links to other sites, usually links that pay commissions to the site owner.
Search engine professionals call these “spammy sites.” Some of them have used the techniques of search engine optimization so well that they rank higher than well-written sites.
When human searchers are led to sites like this, they blame Google and other search engines for recommending a page that wasn’t worth visiting.
Since many of these spammy sites use the same content over and over, Google and the other indexes have added software routines that try to eliminate duplicate content.
If the same words and phrases are found on several sites, some of the sites will be pushed lower down in the ranking or not appear at all.
The problem is that Google can’t easily tell which of several duplicate sites is the genuine, original source of the content.
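To see why this is hard, here is a minimal Python sketch of one common way to spot duplicate text: break each page into overlapping runs of words (often called "shingles") and measure how much two pages overlap. This is an illustration only. Google has not published how its duplicate filter works, and the function names, the four-word shingle size, the 0.5 threshold, and the example site names below are all assumptions made for this sketch.

```python
import re


def shingles(text, k=4):
    """Return the set of overlapping k-word runs found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Overlap between two shingle sets: 0.0 (nothing shared) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(pages, threshold=0.5, k=4):
    """pages maps a site name to its text (hypothetical input format).

    Returns (site_a, site_b, score) for every pair whose overlap
    meets the threshold.
    """
    sets = {site: shingles(text, k) for site, text in pages.items()}
    sites = sorted(sets)
    pairs = []
    for i, first in enumerate(sites):
        for second in sites[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Made-up example text and site names, for illustration only.
post = "Bacon polenta showed up in the cafeteria today and it was great"
pages = {
    "original.example": post,
    "copycat.example": post,
    "other.example": "Search engines rank pages using many different signals every day",
}
duplicates = find_duplicates(pages)
```

Notice that the score is symmetric: the original compared with the copy gets exactly the same number as the copy compared with the original. Nothing in the text itself says which page came first, so a filter built along these lines has to lean on other signals to decide which duplicate to keep.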
Imitation Is The Sincerest Form Of Invisibility
Google has by now corrected whatever it was that was preventing its employee’s blog from coming up as the No. 1 listing on the phrase that he used.
Today, all you have to do is enter the words bacon polenta at Google and Cutts’ original work appears at the top of the list.
Googlewashing, and the issue of duplicate content confusing search engines, is much larger than this one example, though.
The effects can hurt everyone from blogs that distribute material via RSS (Really Simple Syndication) to corporations that publish works that others may or may not be authorized to reproduce.
Next week, I’ll describe how you can prevent your Web content from being duplicated — and how you can keep normal, authorized syndication from making your site invisible to search engines.