
Google, Click Fraud, and Invalid Clicks

By Shuman Ghosemajumder | Tuesday, December 12, 2006

Yesterday, Andy Beal posted a detailed story on Google and click fraud, in which I was quoted as saying that Google's click fraud rate is less than 2%. Did I really say that? Not quite.

First, some background. Andy and I met during the Search Engine Strategies conference in Chicago last week, and we spent an hour talking about our systems, methods, and policies for fighting click fraud. As everyone who has ever spoken to me about this knows by now, this is an issue we take very seriously, and have dedicated extensive resources to managing effectively. Unfortunately, there is a great deal of misinformation on this topic (mainly from third parties with an incentive to exaggerate the issue), so we have been exploring ways to become more transparent ourselves. Our top priority is to protect advertisers, so that means not disclosing any proprietary methods which would allow click fraud perpetrators to reverse-engineer our systems. However, there is still a great deal of information we can share. I and others on our team have spent literally hundreds of hours on communications and sharing such information outside Google. The goal is to improve the level of understanding of this issue to arm everyone against the FUD out there.

Andy's story provides a great summary of some of the key facts at Google:

  • Invalid clicks and click fraud are separate but related concepts (invalid clicks simply being the clicks for which we do not charge advertisers)
  • We have a four-stage process which detects the vast majority of invalid clicks before they affect advertisers
  • The total percentage of clicks we mark as invalid in our system is consistently in the single digits

We had a limited amount of time to cover a lot of ground, and of course, some miscommunication can result when discussing an issue of this complexity. Unfortunately, the most significant fact that seems to have been misrepresented is the one in the headline. Specifically, I never said that our click fraud rate is less than 2%.

Instead, what I said is that the quantity of invalid clicks which we detect as a result of reactive investigations is a "negligible proportion" of the total number of invalid clicks. Andy asked me if that percentage is less than 2%. I told him that I was not able to provide a bound, but yes, "negligible" certainly means less than 2% of invalid clicks.

However, more significantly, this is quite different from saying that our "click fraud rate" is less than 2%. When we mark clicks as invalid because of suspected malicious activity, the vast majority of the time we do so proactively, and none of those cases are included in the reactive figure in question. We proactively discard a single-digit percentage of our revenue, primarily by filtering traffic before it impacts an advertiser's budget and, less significantly, through off-line banning of AdSense publishers, which leads to refunds to advertisers. The difference between proactive and reactive detection is the difference between the "attempted click fraud" caught by us and the click fraud which actually affects an advertiser in a way that requires their action to correct (by asking for an investigation). Obviously it is the second category which advertisers actually care about, and I think that is the spirit in which Andy wrote his headline.
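To make the distinction concrete, here is a minimal sketch of the arithmetic. The click counts are invented purely for illustration and are not Google data; the function name `click_breakdown` and its field names are likewise hypothetical. The point is only that "reactive detections as a share of invalid clicks" and "invalid clicks as a share of all clicks" use different denominators, so a bound on one says little about the other.

```python
def click_breakdown(total_clicks, invalid_clicks, reactive_invalid_clicks):
    """Split invalid clicks into proactive and reactive detections and
    express each quantity against the appropriate denominator.

    All inputs are hypothetical counts supplied by the caller.
    """
    if total_clicks <= 0 or invalid_clicks <= 0:
        raise ValueError("total_clicks and invalid_clicks must be positive")
    if not 0 <= reactive_invalid_clicks <= invalid_clicks <= total_clicks:
        raise ValueError("expected 0 <= reactive <= invalid <= total")

    proactive = invalid_clicks - reactive_invalid_clicks
    return {
        "proactive_invalid_clicks": proactive,
        # The figure discussed in the interview: reactive / invalid.
        "reactive_pct_of_invalid": 100 * reactive_invalid_clicks / invalid_clicks,
        # The overall share of clicks not charged for: invalid / total.
        "invalid_pct_of_all_clicks": 100 * invalid_clicks / total_clicks,
        # Clicks that needed an advertiser-initiated investigation: reactive / total.
        "reactive_pct_of_all_clicks": 100 * reactive_invalid_clicks / total_clicks,
    }


if __name__ == "__main__":
    # Made-up example: 1,000,000 clicks, 80,000 marked invalid,
    # of which 800 were found through reactive investigations.
    for name, value in click_breakdown(1_000_000, 80_000, 800).items():
        print(f"{name}: {value}")
```

With these made-up inputs, reactive detections are 1% of invalid clicks, while invalid clicks as a whole are 8% of all clicks and reactively detected clicks are 0.08% of all clicks. None of the three is a "click fraud rate", since intent is not known for any of them.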

So what is our overall "click fraud rate"? As noted in the diagram in the story, it is virtually impossible to know the intent of every click. However, we can do a very effective job using statistical techniques to detect potentially malicious behavior, and the total number of invalid clicks we detect, whether for suspected malicious or non-malicious intent, is in the single-digit percentages. So third-party estimates which say that click fraud is 15% or higher appear to be substantial exaggerations.

I gave Andy this feedback, and he was able to make a few updates and corrections, but unfortunately was not able to change the headline. With the aforementioned caveats in mind, I would invite everyone to read Andy's article, as it does provide a great overview of the basic structure of our systems and philosophies about fighting click fraud.

   

Comments

So what is our overall "click fraud rate"?

You never answered the question you yourself posed :)

I think it's great that Google is willing to start disclosing (in public) a bit more information on click fraud.

I certainly do understand the need for protecting some of the algorithms; however, I'm sure that Google also realizes that doing so tends to create a certain amount of distrust from its users.

G-Man

Geoffrey Faivre-Malloy
December 13, 2006, 7:20AM


Shuman -

Thanks for the quick clarification. It seems to me a lot of the confusion stems from lack of a common vocabulary and I'm hoping the terms you use here will help with this.

I think some of the high "fraud" estimates include what many would call "worthless clicks", which I think Google has trouble counting. An example of this would be when a legitimate but "made for AdSense" site with targeted content manages to get into the AdSense program but then receives traffic from questionable sources like spyware toolbars. I think Google does a *better job* keeping these sites out than Findwhat and Enhance, but if you include this "worthless click" traffic I think the 20-30% estimates are realistic based on my limited buying experiences and comments from larger ad buyers.

Joseph Hunkins
December 13, 2006, 9:01AM


Geoffrey — you're right, I did not provide a specific click fraud rate. We don't have such a metric to disclose, because there's no exact way to determine "intent" and we certainly do cast the net wide in terms of throwing out many clicks which we know have nothing to do with fraud. The total percentage of all of those clicks that we don't charge advertisers for in this fashion is in the single digits.

Joseph, using spyware to deliver AdSense ads or drive traffic to your site is strictly prohibited by our program policies, and we regularly ban AdSense publishers for such violations (on a daily basis, in fact). That being said, the quality of the content on a site is a separate issue from click fraud. A site with low visual quality can actually have very valuable traffic, just as a high visual quality site may be engaging in click fraud. The key thing is that these are independent factors. In terms of addressing the relative quality of traffic from different sites, in addition to determining whether or not we believe truly malicious traffic is coming from a site, when traffic isn't marked as invalid, our SmartPricing system still provides sliding discounts to advertisers. The "lower quality" a site's legitimate traffic appears to be to us, the steeper the discount to the advertisers. By providing such discounts, we normalize ROI across different parts of our network. The end result is that the amount advertisers pay for sales or leads (the metric of ROI) is roughly the same on AdSense as it is on Google.com. SmartPricing reduces both Google's and our AdSense partners' revenues in order to provide better ROI for our advertisers and promote the overall health of our online advertising ecosystem. This isn't something that is very well known. Thanks for bringing it up!

Shuman, December 13, 2006, 11:03AM
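The idea of a sliding discount that normalizes ROI, as described in the comment above, can be illustrated with a toy model. This is only a sketch under stated assumptions: the actual SmartPricing formula is not described in this thread, and the function names, the use of conversion rate as the quality signal, and the cap at the full price are all hypothetical choices made for the example.

```python
def smart_priced_cpc(base_cpc, site_conv_rate, baseline_conv_rate):
    """Hypothetical discount: scale the click price by how well a site's
    traffic converts relative to a baseline (e.g. search traffic).

    The price is never raised above base_cpc in this toy model.
    """
    if baseline_conv_rate <= 0:
        raise ValueError("baseline_conv_rate must be positive")
    if site_conv_rate < 0:
        raise ValueError("site_conv_rate must be non-negative")
    factor = min(1.0, site_conv_rate / baseline_conv_rate)
    return base_cpc * factor


def cost_per_conversion(cpc, conv_rate):
    """Average amount paid per sale or lead at a given click price."""
    if conv_rate <= 0:
        raise ValueError("conv_rate must be positive")
    return cpc / conv_rate


if __name__ == "__main__":
    # Invented numbers: a 1.00 bid, baseline traffic converting at 4%,
    # and a partner site whose traffic converts at 1%.
    discounted = smart_priced_cpc(1.00, 0.01, 0.04)
    print("discounted CPC:", discounted)
    print("cost per conversion on partner site:", cost_per_conversion(discounted, 0.01))
    print("cost per conversion on baseline:", cost_per_conversion(1.00, 0.04))
```

In this model a site whose traffic converts a quarter as often is charged about a quarter of the price per click, so the advertiser's cost per sale or lead comes out roughly equal across the two sources. That is the sense of "normalizing ROI" in the comment.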


Great follow-up post. As an advertiser, we are relieved that Google is trying to protect our interests and our wallet from invalid clicks and click fraud. Keep up the great work!

Scott Springer
December 13, 2006, 11:23AM


Shuman,

I'm inclined to err towards your reading of the situation. Obviously, for corporate and legal reasons you can't be explicit about the exact level of click fraud.

However, the critical issue here is one of definition. We are talking "click fraud". We are not talking mis-clicks and mal-clicks. Fraud implies deception for some direct or indirect pay-off.

From my experience, correctly defined click fraud is closer to 5%. Although not negligible, my belief is that there are many other campaign factors that can have a much greater impact on return.

David Burdon
December 13, 2006, 1:28PM


Excellent clarification on that Shuman - thank you!

Joseph Hunkins
December 13, 2006, 6:17PM


Thanks for the clarification, Shuman. However, I'd like to point out that since the intent of every click cannot be known, the 15% estimates given by some third-party companies may be correct (or even too low), even though the methods they have used to determine their estimates may be incorrect in some ways.

There are many ways to generate fraudulent clicks, both using automation and humans. The ease by which this can be done is due to the openness and economics of the Internet architecture. In general, I have always felt that rather than go to extensive (and expensive!) means of fighting click fraud, it is better for advertisers to pay on a fixed fee basis. At least this should be a choice among many other options (CPA, CPM, CPC, etc.).

CPCcurmudgeon
December 13, 2006, 10:07PM


Thanks, Shuman, for providing us with more information about the program. I have read a couple of articles about this recently, but some things are still not very clear, and I guess that I missed a lot.

An example: schools, companies, etc. mostly have one IP address shared by many users, sometimes a couple of hundred. So let's say somebody at such an institution has an AdSense account and checks the AdSense status. I guess this would log the IP somewhere at Google.
Now other people from the same institution visit this person's site, and let's say they click on some advertisements. Their IP will also be logged somewhere at Google. Since the IP of the AdSense account owner and the visitors is the same, this would alert Google about click fraud or invalid clicks, because it would look like the AdSense account owner clicked on the ads, and this would result in closing the AdSense account.

The same might happen with dynamic IPs. Person A connects to the internet, checks the AdSense account, and disconnects. Now person B, using the same ISP, connects to the internet and gets the IP of person A. Person B visits the website of person A and clicks on some ads, which would alert Google to potential click fraud as well. I guess it's not only the IP that is part of the decision you make, but the cookies as well.

Is it possible that AdSense accounts got closed because of things like this? How do you know exactly that fraud was going on? I know that it's a company secret how things really work. Is it possible that Google decides to close an account as a preventive action, before it's really clear that fraud is at work?

Igor Klajo
December 14, 2006, 2:13AM


Really helpful, thanks. Google does shoot itself in the foot with advertisers over this. We do a lot of click analysis for many reasons. It is noticeable that there are clear patterns, which should have no relationship with intentional click fraud, for which Google makes no statement either way. I have an article in draft about some of this, but can give an example that is pretty unambiguous and about which I can't find a definitive Google statement.

Spiders. There are many self-identified spiders, from well-known, reverse-IP trackable sources, that crawl AdSense publisher sites and Google Search Network partner sites and click on the adverts. The signatures are as identifiable as the AdsBot or GoogleBot. Are these invalid clicks?

They look like a simple robot fraudulent click, but with clear signature and non-fraudulent intent. I've got client accounts where the sum total of invalid clicks is less than the sum total of double clicks and self-identified well known high volume spiders (think "Charlotte"). Admittedly the double click volume is estimated by us using a reverse inference on Google's techniques for measuring double clicks - but if my tiny operation can identify double clicks (gclid?), you guys can do so.

There should be no reason why Google can't tell advertisers that self-identified robots are invalid clicks and list the robots that are excluded. Is there? I may have overlooked a reason that allows someone to gain advantage, but if the AdSense publisher makes no money on the bot, and the bot owner makes no money and the advertiser doesn't pay for them... The cult of secrecy simply makes Google look as if it has something to hide. It certainly raises *my* level of alert about whether sneakier activity is assiduously tracked, when obvious stuff isn't positively affirmed.

Cheers, JeremyC.

Jeremy Chatfield
December 14, 2006, 4:01AM


Thanks to everyone for the comments. Keep 'em coming! Rather than answer everyone's questions one at a time in comments, I'm going to address them in following posts. Let me know what you'd like to see me write about more.

Shuman, December 17, 2006, 5:29PM


90% of e-mail is detected as spam.

Why should click fraud on partner sites (AdWords) be any less, when the intentions are the same?

Dumbledore
December 26, 2006, 7:46PM


Nice site actually. Gone to my favourites. Thanks for creation.

jack
January 08, 2007, 8:46PM


Shuman - I'd like to see you write about parked domains and why AdWords distributes ads to them on the Search network. I'm seeing some very bad traffic from these parked domain sites for a campaign that has the Content network turned off. The clicks are clearly *not* from active searchers, but passive browsers, clicking on links on a parked domain site. On the back end, your system will likely not detect this as click fraud because it's not automated. The trouble, though, is that this traffic is fraudulent because of the way Google distributes ads on the front end. Advertisers paying for clicks on the Search network should get traffic from people who actively type search keywords into a search box.

Perhaps you can convince Google to halt this practice? Why not be honest about these ads? Create a Domain network option for ad distribution in the AdWords system. Let advertisers opt in or out. At the very least, keep these parked domains on the Content network. The behavior of traffic from these sites is closer to Content network behavior than Search network behavior. Until something is done about this, no Google advertiser can be confident about their ad spend if they have the Search network box checked.

Shuman, what's your opinion on parked domains and how they should be handled in the AdWords system? Thanks.

Richard Ball
January 10, 2007, 8:15AM


Vint Cerf believes that 25% of all PCs could belong to botnets that spread spam, spyware, DDoS attacks, ... and click fraud.

CPCcurmudgeon
January 29, 2007, 7:31PM


I am sorry, Shuman, but I am having a very hard time with what you say. All the evidence we have points to the contrary. Let me give you two examples:
- On one of our advertiser's campaigns, 50% of the content clicks do not have JS enabled. What do you think of this?
- On average, for all visits, we have 64% originating in the US, however if we check only content, we go down to 34% for the US. Surprising, no?

Bernard Gallet
February 07, 2007, 12:51AM


Based on my site I'd say there is a huge difference between Google Search and the Google Content Network.

The Google Content Network returns higher than 95% invalid hits in my opinion, and sucks up my ad budget like no tomorrow. I've turned my content search bid to the lowest setting of 0.01.

Anyone who says that Content Networks have only 15% invalid and/or fraud clicks is full of crap.

I'd be more than happy to share my data to prove it.

J McDonald
June 16, 2007, 9:06AM



Copyright © 2003-2008 Shuman Ghosemajumder. All contents available under a Creative Commons License. Opinions on this web site are the author's own. Generated Sunday, May 11th, 07:01:40 PM EST.