marahmarie: Sheep go to heaven, goats go to hell (Default)

Mentalfloss.com made news recently for claiming they had to take down their RSS feeds in order to prevent content-scrapers from getting their website blacklisted in Google - after their entire website was removed from Google's index. Google's Matt Cutts vehemently denied that was why they were blacklisted, claiming Mental Floss was removed from the index only after it got hacked. Matt Cutts even republished the email he claims Google sent to Mental Floss' webmaster(s) here. In it Google supposedly explains MF's sudden and complete removal from the index:

"The site was hacked. RSS has nothing to do with it." Matt Cutts on Jul 21st, 2008 @ 1:30pm

We emailed this site on July 7th to let them know exactly why we were removing the site; looks like it got hacked and was showing nasty content. It has *nothing to do* with full-text RSS feeds.

Here's some of the email that we sent on July 7th to this site owner:

Dear site owner or webmaster of mentalfloss.com

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

The following is some example hidden text we found at eg: http://www.mentalfloss.com/blogs/archives/2192:

economics times india

The application fee is collected by the JUPAS economics times india on behalf of the 9 participating institutions and is not refundable or transferable to another year. free 2004 income tax forms

Request for use of Accumulated Surplus must be signed by the Hon Fin Sec/Treasurer and countersigned by the President of the Union/Club and submitted to OSA for approval. According to the agreement, Castrol will use Deutsche Bank's complete end-to-end payment and collection solution, as well as db-eBills - the Bank's innovative electronic invoice presentment and payment (EIPP) solution. The Internet's largest source of legitimate, copyrighted 100% digital sheet music since 1997, we now have over 10,000 songs for instant download! For extremely poor families, free 2004 income tax forms provides emergency assistance, while the conditionalities promote longer-term investments in human capital. Australia order viagra online clinic uk in Australia order viagra without a prescription in Australia order generic viagra and other prescription drugs online in Australia viagra order by phone in Australia viagra order on line in Australia order cheap viagra in Australia levitra cialis viagra comparison online order in Australia buy online order viagra in Australia order generic viagra in Australia order viagra overnight in Australia order by phone generic viagra in Australia viagra no prescr chase mastercard rewards program

A device which forms a digitised image of a human fmger print for the purpose of biometric authentication. T subject to search without a warrant while on prison property, according to the lawsuit. It is rare to find an amateur player using this move in a poker game, so if your opponents see you using this move they can be fairly sure you know how to play good poker, and may think twice about bluffing you out of future pots. Download one of listed teens for chase mastercard rewards program taylor torrents or choose from category bit torrent downloads listed here to download your favorite torrent at torrentz. ACI Worldwide Eastern Europe Development is the fast-growing Romanian branch of ACI Worldwide.

bad credit personal finance loans

[...]

In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results. Currently pages from mentalfloss.com are scheduled to be removed for at least 30 days.

We would prefer to have your pages in Google's index. If you wish to be reconsidered, please correct or remove all pages (may not be limited to the examples provided) that are outside our quality guidelines. One potential remedy is to contact your web host technical support for assistance. For more information about security for webmasters, see Security Checklist for Webmasters.

When you are ready, please visit Google for Webmasters [MM's note: I had to hide the Google links Matt included because they were breaking my layout] to learn more and submit your site for reconsideration.

Sincerely,

Google Search Quality Team

The people who run Mental Floss never received that email. The first they knew of it was when they saw it on TechDirt. Even then, and though they were apparently unaware that their website had been hacked, Will Pearson, the President of Mental Floss, later commented on TechDirt:

I was just informed of this post/conversation and wanted to chime in. I'm the president of mental_floss and simply wanted to clear up some confusion. We did not claim that Google instructed us to tweak our RSS feed and we are not blaming Google for any of this. For some reason I did not see the note from Google posted above and so we did not realize why we'd been pulled from their search.

Once we realized we were no longer in Google's natural search, we immediately began taking steps to try and figure out what was going on. After asking a few others with experience in this area, it was suggested to us that we make sure no one was lifting our content from our RSS feed and publishing it in full on their site. We discovered another site that was and decided to tweak our RSS feed just in case that was the cause.

We are continuing to look into this and will resolve the problem Matt has pointed out.

It's very important to us that we are included in Google's index again so we'll work quickly to get this fixed. It's unfortunate because we run a clean operation so I hate that this has happened.

But again, this is not Google's fault. They've simply recognized a problem and we'll work to fix it.

Matt, if you'd be willing to discuss, I would love to have a conversation with you. Thanks for your attention to these matters.

Thanks,

Will

So what was Mental Floss originally claiming? That they got removed from Google's index for allowing their content to be scraped through full-text RSS feeds, a bit of nonsense that they picked up from their supposedly SEO-knowledgeable friends. How did Google reply? By claiming that wasn't the reason for the removal; hacking was. Also, Will is really careful to "not blame" Google, since he, like 99% of the population, fears Google's wrath much too much to dispense with the usual butt-kissing.

But what of this hacking? You can't tell if Mental Floss was hacked by checking Google, since Google removed the cache for that page. You can't tell by checking the page source: everything looks fine. You can't tell by checking Yahoo!'s cache or the Wayback Machine at archive.org. Though the Wayback Machine did index Mental Floss' content for October 2006, which is the year and month in question, the page that supposedly got "hacked" is missing. You can't tell by checking MSN's cache, either: the page source is as clean as a whistle, having been retrieved just yesterday.
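For anyone who wants to repeat the page-source check on a saved copy of a page, here is a rough Python sketch of one way to look for hidden text. It is only an illustration under my own assumptions: the names `HiddenTextFinder` and `find_hidden_text` are made up for this example, it only catches text hidden with inline `display: none` or `visibility: hidden` styles, and it says nothing about how Google actually detects hidden text. It also assumes reasonably well-formed HTML, so unclosed tags can throw it off.

```python
from html.parser import HTMLParser
import re

# Inline-style patterns commonly used to hide text (an assumed, incomplete list).
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)

# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside an element hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one bool per open element: hidden (itself or via an ancestor)?
        self.hidden_text = []  # text fragments found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self.stack and self.stack[-1])
        self.stack.append(inherited or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return the text fragments hidden by inline styles in an HTML string."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


if __name__ == "__main__":
    sample = ('<p>Visible</p>'
              '<div style="display: none">order <b>viagra</b> online</div>'
              '<p>More</p>')
    print(find_hidden_text(sample))
```

An empty result on a page only means nothing was hidden this particular way. Text can also be hidden through external stylesheets, off-screen positioning, or matching text and background colors, none of which this sketch looks at.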

So what's the deal? Most likely comment spam. The page Google claims got "hacked" is almost two years old and received only one comment. Old web pages are usually good targets for comment spam since they are often quite neglected, especially on websites with tremendous amounts of content. Mental Floss certainly fits that bill. Would Google ever admit they tossed a site merely for having a single spam comment? Probably not. Would Google actually toss a site for having a single spam comment? It's possible. What the robots crawl might "look" like a hacked page when in fact it is not.

Am I suggesting that Google kicked an entire popular website out of the index for having a single spammy comment that was identical to the Viagra-laden gobbledygook above that Matt Cutts claims was "hidden text in a web page"? Why yes, I am. Short of a better explanation from the people who run Mental Floss, I don't buy that a single two-year-old web page on a huge website with perhaps thousands of pages of content was "hacked" to suddenly include "hidden text". Google never explains how that could have been done or who did it. Having seen plenty of hacked pages on the Web, I know for a fact that when they're hacked it's usually to include not just hidden text but tons of links to spammy websites that are just waiting to profit from the hacking.