Why Third-Party Click Fraud Estimates Don't Add Up

I want to thank everyone who has written to me with questions since I started blogging about the work we do at Google to protect advertisers against click fraud. I'll be catching up on some of those questions in the next week, but today I want to address some of the more recent items in the media on click fraud rates.

There was a press release yesterday from Click Forensics stating that their quarterly measure of click fraud for Q4 was 14.2%. They also stated that this was the year's "highest level" (up from 14.1% in Q2) and that the click fraud rate for search engine content networks was 19.2%. This morning there was a competing press release from Incremental Advantage and several other click fraud firms, stating that "Click Fraud Cost Internet Advertisers $666 Million in 2006."

On a basic level, these numbers are much higher than what we see at Google, and are not at all representative of the actual statistics of our network. Most savvy advertisers and industry pundits are already aware of this (see "Why We Can't Trust Click Fraud Numbers" in yesterday's WebProNews), and generally haven't paid much attention to these estimates for a while.

However, these stats are still out there and there are some things everyone should keep in mind when reviewing them. Specifically:

  1. Many third parties have not even counted clicks properly
    We analyzed reports from Click Forensics and other click fraud consultants back in August 2006 to see why their numbers were so inflated (see "How Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports" on the Google AdWords Blog). We found serious flaws in how they count clicks, which is a more fundamental issue than how they count click fraud. They were making basic counting mistakes that inflated the number of clicks by an average of 40%. The source of the problem is that page views, generated by users browsing through an advertiser's site, were incorrectly counted as clicks.
  2. Inflated click counts result in even more inflated "click fraud" estimates
    This over-counting inflates click fraud estimates even more dramatically than it inflates click counts. In fact, it consistently classifies an advertiser's best users (the ones spending time browsing their site) as fraudulent. As a result, conclusions based on this data are flawed, and the small differences in the overall percentages they report are not meaningful. Instead of protecting their businesses against click fraud, advertisers can actually harm them by acting on recommendations from these reports.
  3. Even if they fixed those problems, they're not actually measuring click fraud
    Even if they were counting clicks correctly, they would still be measuring only activity (attempted click fraud) and not advertiser impact (actual click fraud). That is, even if they corrected the basic engineering and accounting errors described above, their click fraud estimates would still include clicks that we filter and do not charge to advertisers. They admit this.
  4. Industry metrics (in any area of our business) are not necessarily the same as Google's metrics
    The advertisers in their sample are part of many different networks and not all of these networks have invested as heavily as Google in click fraud protection.
  5. ROI on the content network is the same as it is on search
    We know there is a more direct incentive for fraud on the content network and we do much more to protect advertisers, ban bad publishers, and improve ROI through SmartPricing discounts. As a result, average ROI on our content network is nearly the same as on Google.com. Yes, you read that right. ROI is the same on average - and not by accident, but because we automatically provide discounts to advertisers to make it so.
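
To make points 1 through 3 concrete, here is a minimal sketch of the arithmetic in Python. Only the 40% over-count comes from the analysis cited above. Every other number is made up for illustration, and the names (AuditScenario, auditor_rate, and so on) are hypothetical rather than anything used by Google or by an auditing firm. The sketch also assumes the auditor flags every genuinely invalid click plus half of the miscounted page views.

```python
from dataclasses import dataclass


@dataclass
class AuditScenario:
    """Hypothetical click data for one advertiser (illustrative numbers only)."""
    real_clicks: int      # ad clicks that actually occurred
    invalid_clicks: int   # real clicks that were fraud attempts
    filtered_clicks: int  # invalid clicks the network filtered and did not bill
    phantom_clicks: int   # page views the auditor miscounted as ad clicks
    phantom_flagged: int  # miscounted page views the auditor labeled fraudulent


def auditor_rate(s: AuditScenario) -> float:
    """Rate a third party reports when page views are counted as clicks."""
    counted = s.real_clicks + s.phantom_clicks
    flagged = s.invalid_clicks + s.phantom_flagged
    return flagged / counted


def attempted_rate(s: AuditScenario) -> float:
    """Attempted fraud as a share of correctly counted clicks."""
    return s.invalid_clicks / s.real_clicks


def billed_rate(s: AuditScenario) -> float:
    """Fraud that reaches the advertiser's bill, after filtering."""
    billed = s.real_clicks - s.filtered_clicks
    unfiltered = s.invalid_clicks - s.filtered_clicks
    return unfiltered / billed


# 1,000 real clicks inflated by 40% gives 400 phantom clicks (assumed: half are flagged).
scenario = AuditScenario(
    real_clicks=1000,
    invalid_clicks=100,
    filtered_clicks=90,
    phantom_clicks=400,
    phantom_flagged=200,
)

print(f"Auditor's reported rate: {auditor_rate(scenario):.1%}")   # 21.4%
print(f"Attempted fraud rate:    {attempted_rate(scenario):.1%}") # 10.0%
print(f"Billed fraud rate:       {billed_rate(scenario):.1%}")    # 1.1%
```

Under these assumed inputs, the same advertiser yields three very different "click fraud rates" depending on whether clicks are counted correctly and whether filtered clicks are excluded. The specific percentages mean nothing on their own; the point is that the three quantities are not interchangeable.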

The key point here is not that their numbers are "too high". The point is that their data collection methods are inherently flawed and any resemblance their numbers could have to reality would be coincidental. Even so, given that they are not measuring click fraud (see point #3), they apparently don't intend their numbers to reflect reality.

Click fraud protection is something we take very seriously at Google, and it requires a high level of scientific rigor to do well. It's frustrating to see basic mistakes being made by firms selling "additional protection" to AdWords advertisers, in essence charging them money for advice that can actually hurt their businesses. I've spoken with many firms and a number of academics interested in this area. The ones who are investing in serious R&D efforts recognize the limitations of their data and analysis, and they have not been publicizing unsupportable and flawed numbers such as the above. We're very supportive of those efforts (and of scientific research in this area in general), and we'll continue to work closely with them.

For more information about Google's actual metrics, you can see my previous posts here and here.

Update: I've posted a second part to this post, with more technical details on points #1 and #2.