Introduction to the Special Issue: Click Fraud

Jan O. Pedersen
International Journal of Electronic Commerce,
Volume 13, Number 2, Winter 2008-09, pp. 5.


Sponsored Search owes much of its success to the pay-per-click performance advertising model. In traditional advertising, product advertisers target an audience and pay for each impression. In Sponsored Search, advertisers target advertisements at search keywords, but only pay if a user actually engages by clicking on the offered link. The close coupling between payment and an easily measurable performance metric creates an unrivaled performance marketing environment; advertisers can measure the effectiveness of their campaigns with great precision.

The pay-per-click model originated in 1998 in the first Sponsored Search product, from GoTo, as a way to attract risk-averse small advertisers who were eager to drive traffic to their sites but unwilling to accept the risks and opacity of the traditional pay-per-impression advertising model [2]. It later became evident that this type of advertising, when combined with user-intent targeting in the form of search keywords and optimization via retrospective analytics, was an incredible value-generation machine. GoTo was one of the few runaway successes to emerge from the Internet winter at the turn of the century. Its product and business model were later adopted and improved upon by Google in the AdWords product (first launched in 2000), which is currently the foundation of Google’s financial success.

In Sponsored Search, advertisers provide an ad (title, abstract, and landing-page URL) and bid on keywords (search terms) that they believe will yield value-generating traffic for their site. When a consumer issues a search, the publisher (a search engine) generates a set of candidate ads (i.e., ads whose bid keywords match the consumer’s query), ranks them taking the bid into account, and presents them to the consumer as part of the search results page. If the consumer clicks on an ad, that event is logged and the advertiser is later billed for the generated traffic. Through log analysis, the traffic a Sponsored Search campaign drives, aggregated by source keyword, can be retrospectively analyzed along with the attendant costs and generated revenue. Keyword selection and bids can then be adjusted to maximize profit or other campaign objectives.
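The retrospective log analysis described above amounts to grouping billed clicks by keyword and comparing cost with attributed revenue. The sketch below assumes a hypothetical click-log schema (`ClickRecord` with `keyword`, `cost`, and `revenue` fields) and made-up entries; real ad networks expose their own reporting formats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClickRecord:
    """One billed click from a hypothetical click log."""
    keyword: str    # the bid keyword that matched the query
    cost: float     # price the advertiser was billed for this click
    revenue: float  # downstream revenue attributed to the click (0.0 if none)


def summarize_by_keyword(log):
    """Aggregate clicks, cost, revenue, and profit for each keyword."""
    summary = defaultdict(lambda: {"clicks": 0, "cost": 0.0, "revenue": 0.0})
    for rec in log:
        totals = summary[rec.keyword]
        totals["clicks"] += 1
        totals["cost"] += rec.cost
        totals["revenue"] += rec.revenue
    for totals in summary.values():
        totals["profit"] = totals["revenue"] - totals["cost"]
    return dict(summary)


# Made-up log entries, for illustration only.
log = [
    ClickRecord("running shoes", 0.50, 0.00),
    ClickRecord("running shoes", 0.50, 20.00),
    ClickRecord("cheap shoes", 0.30, 0.00),
]
report = summarize_by_keyword(log)
for keyword, totals in sorted(report.items()):
    print(keyword, totals)
```

A keyword whose aggregate profit is negative (here "cheap shoes") is a natural candidate for a lower bid or for removal from the campaign.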

In greater detail, assume that $k$ ads match a given query, and let $a_i$ be the $i$th ad and $b_i$ be the bid for the $i$th ad. The publisher will compute $q(a_i)$, the quality of ad $a_i$ with respect to the query, and rank the ads based on

$$q(a_i) \times b_i \tag{1}$$
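Ranking by Equation (1) is a sort on the product of quality and bid. In this minimal sketch the `Ad` type, its field names, and the numbers are hypothetical, and `quality` stands in for whatever estimate $q(a_i)$ the publisher computes (for instance, a predicted click-through rate).

```python
from typing import NamedTuple


class Ad(NamedTuple):
    name: str
    bid: float      # b_i: the advertiser's bid per click
    quality: float  # q(a_i): publisher's quality estimate for this query


def rank_ads(ads):
    """Order candidate ads by the score q(a_i) * b_i, highest first."""
    return sorted(ads, key=lambda ad: ad.quality * ad.bid, reverse=True)


candidates = [
    Ad("A", bid=2.00, quality=0.01),
    Ad("B", bid=1.00, quality=0.04),
    Ad("C", bid=1.50, quality=0.02),
]
ranking = rank_ads(candidates)
print([ad.name for ad in ranking])  # ['B', 'C', 'A']
```

Ad B outranks ad A despite bidding half as much, because its quality estimate is four times higher. Ties are left in input order by Python's stable sort; how a real publisher breaks ties is not specified here.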

The higher the placement (or rank) of an ad, the more clicks it will yield, because users scan result lists from top to bottom on the presumption that higher-placed results are more relevant. Hence increasing a bid tends to raise an ad’s rank, yielding more clicks, but at a higher cost. Let $\pi_j$ denote the index of the ad in the $j$th rank (with $j = 1$ being the highest rank). The placement effect is typically modeled by presuming that the probability a user will click on the ad in rank $j$ is the product of two independent components: $\alpha_j$, a rank-dependent constant, and $p(a_{\pi_j})$, an ad-dependent but rank-independent intrinsic clickability. If $q(a_{\pi_j})$ is thought of as an estimate of $p(a_{\pi_j})$, then Equation (1) will rank ads based on their expected revenue per impression. If the total value of a ranking is expressed as

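$$V = \sum_{j=1}^{k} \alpha_j \, q(a_{\pi_j}) \, b_{\pi_j} \tag{2}$$

then ranking via Equation (1) will also maximize total value, since the position effects $\alpha_j$ decrease with rank and the largest scores are therefore paired with the largest weights.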
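The claim that ranking by Equation (1) maximizes total value can be checked by brute force on a small example. This sketch takes the total value of a ranking to be $\sum_j \alpha_j \, q(a_{\pi_j}) \, b_{\pi_j}$; the candidate ads and the decreasing position effects $\alpha_j$ are illustrative assumptions, not measured values.

```python
from itertools import permutations


def total_value(order, alphas):
    """Total value of a ranking: sum over positions j of alpha_j * q * b.

    `order` lists (quality, bid) pairs from the top position down; zip() stops
    at the shorter sequence, so ads beyond the last position contribute nothing.
    """
    return sum(alpha * q * b for alpha, (q, b) in zip(alphas, order))


# Hypothetical candidate ads as (q(a_i), b_i) pairs.
ads = [(0.01, 2.00), (0.04, 1.00), (0.02, 1.50)]
# Illustrative position effects alpha_j, decreasing with rank.
alphas = [1.0, 0.6, 0.3]

# Ranking by Equation (1): sort by q * b, highest first.
by_score = sorted(ads, key=lambda ad: ad[0] * ad[1], reverse=True)
# Exhaustive search over every possible ordering, for comparison.
best = max(permutations(ads), key=lambda order: total_value(order, alphas))

print(total_value(by_score, alphas), total_value(best, alphas))
```

The exhaustive search is only a check; it grows factorially with the number of ads, whereas the sort by score needs a single pass and reaches the same value whenever the $\alpha_j$ are non-increasing.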

Clicks are priced through a second-bid pricing mechanism [1, 3] inspired by the Vickrey-Clarke-Groves auction, which specifies a pricing formula under which bidding one’s true value is the optimal strategy.¹ The price of a click in the $j$th position is

$$\text{price}_j = \frac{q(a_{\pi_{j+1}}) \, b_{\pi_{j+1}}}{q(a_{\pi_j})} \tag{3}$$

Given ranking formula (1), this can be thought of as the minimum bid required for advertiser $\pi_j$ to maintain position $j$. Hence advertisers never pay more than their bids, and prices are determined by the quality of the ad and the competition for the keyword.
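The second-bid pricing rule can be sketched in a few lines. The sketch takes the price in position $j$ to be the quality-weighted score of the ad below it divided by the ad's own quality, $q(a_{\pi_{j+1}}) \, b_{\pi_{j+1}} / q(a_{\pi_j})$, which is exactly the smallest bid that keeps the ad in position $j$. The `reserve` price charged to the last ad is a hypothetical parameter, since the text does not say what the lowest-ranked ad pays; all numbers are illustrative.

```python
def gsp_prices(ranked, reserve=0.0):
    """Per-click prices under a quality-weighted second-price rule.

    `ranked` lists (q(a_{pi_j}), b_{pi_j}) pairs already sorted by q * b,
    highest first. The ad in position j pays the smallest bid that would
    still keep its score q * b at or above the score of the ad below it:
        price_j = q(a_{pi_{j+1}}) * b_{pi_{j+1}} / q(a_{pi_j})
    The lowest-ranked ad has no competitor below it; it pays `reserve`,
    a hypothetical flat minimum price not taken from the text.
    """
    prices = []
    for j, (q_j, _bid_j) in enumerate(ranked):
        if j + 1 < len(ranked):
            q_next, b_next = ranked[j + 1]
            prices.append(q_next * b_next / q_j)
        else:
            prices.append(reserve)
    return prices


# Hypothetical ads, already in ranking order by q * b (scores 0.04, 0.03, 0.02).
ranked = [(0.04, 1.00), (0.02, 1.50), (0.01, 2.00)]
prices = gsp_prices(ranked)
for (q, bid), price in zip(ranked, prices):
    print(f"bid={bid:.2f} price={price:.2f}")
```

Because the list is sorted by score, each numerator is at most $q(a_{\pi_j}) \, b_{\pi_j}$, so every computed price is at most the corresponding bid, matching the observation that advertisers never pay more than they bid.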

An advertiser can maximize the payoff for a given keyword by incrementally adjusting the bid. For a given ad and a given bid $b$, the ranking and pricing mechanism outlined above will generate some click rate $c(b)$ at some price $p(b)$. The value of the generated traffic depends on the likelihood that a click ultimately converts to some advertiser-specific revenue event in the future (e.g., a purchase transaction). The conversion likelihood $\beta$ and the value of the downstream revenue event $v$ together determine the value of this traffic to the advertiser:

$$\text{profit} = I \, c(b) \, \bigl(\beta v - p(b)\bigr) \tag{4}$$

where $I$ is the impression rate for the given keyword. As the bid $b$ increases, $c(b)$ increases, as does the price $p(b)$. If $b = 0$, then $c(b) = 0$ and profit is zero. Further, as long as $p(b) < \beta v$, the expected value of each click exceeds its price, so profit is positive whenever the ad receives clicks.

The advertiser profit-maximizing strategy described above assumes that the advertiser can assess the downstream value of generated traffic and that the variables $v$ and $\beta$ and the functions $c(b)$ and $p(b)$ are relatively stationary over time. Normally, value is assessed retrospectively as an average over some time period, so these quantities are estimated in expectation, smoothing out random and periodic variations, whereas systematic time trends are detected only with a lag. For example, should the downstream value of generated traffic suddenly drop, bids will be suboptimal and may actually yield a loss until they can be readjusted. The lag between value assessment and billing (which is online and ongoing) is the lever exploited by various fraudulent techniques, collectively referred to as Sponsored Search click fraud.
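The incremental bid adjustment behind Equation (4), and the loss incurred while a drop in traffic value goes undetected, can be illustrated with a small simulation. The response curves $c(b) = 0.05(1 - e^{-2b})$ and $p(b) = 0.8b$, the hill-climbing rule in `adjust_bid`, and every number below are invented for illustration; real curves come out of the auction and are only observed retrospectively.

```python
import math


# Hypothetical response curves. In practice c(b) and p(b) are produced by the
# auction and estimated retrospectively from logs; these shapes are invented.
def click_rate(b):
    """c(b): clicks per impression at bid b (rises with b, saturates)."""
    return 0.05 * (1.0 - math.exp(-2.0 * b))


def price(b):
    """p(b): price per click at bid b (rises with b, never above b)."""
    return 0.8 * b


def profit(b, impressions, beta, value):
    """Equation (4): profit = I * c(b) * (beta * v - p(b))."""
    return impressions * click_rate(b) * (beta * value - price(b))


def adjust_bid(b, impressions, beta, value, step=0.05, max_iter=1000):
    """Hill-climb: move the bid one step up or down while that raises profit."""
    for _ in range(max_iter):
        current = profit(b, impressions, beta, value)
        up = profit(b + step, impressions, beta, value)
        down = profit(max(b - step, 0.0), impressions, beta, value)
        if up > current and up >= down:
            b += step
        elif down > current:
            b = max(b - step, 0.0)
        else:
            break  # neither neighboring bid does better
    return b


impressions, v = 10_000, 40.0
bid = adjust_bid(0.20, impressions, beta=0.05, value=v)  # beta * v = 2.00
print(f"tuned bid {bid:.2f}, profit {profit(bid, impressions, 0.05, v):.1f}")

# Conversion likelihood drops (beta * v = 0.40), but the bid is not yet revised.
print(f"stale bid profit {profit(bid, impressions, 0.01, v):.1f}")

# Once the drop is detected, re-tuning lowers the bid and restores a profit.
new_bid = adjust_bid(bid, impressions, beta=0.01, value=v)
print(f"re-tuned bid {new_bid:.2f}, profit {profit(new_bid, impressions, 0.01, v):.1f}")
```

With these assumed curves the stale bid leaves the price per click above $\beta v$, so the campaign runs at a loss until the drop is noticed and the bid is lowered; that window is what the lag between value assessment and billing opens up.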

This Special Issue of IJEC explores some of the issues in Sponsored Search click fraud, ranging from a taxonomy of its different forms to models of its impact on advertiser behavior and adjustments to the auction mechanism to mitigate its effects. Note that click fraud that appears to the advertiser as a random fluctuation or as a gradual change in value (or, equivalently, a change in traffic quality) is more of a concern to the auctioneer than to the advertiser, since the advertiser can adjust to slowly changing values by modifying bids. The more pernicious form is a sudden, targeted change in value designed to disrupt an advertising campaign. Sponsored Search networks have developed click fraud detection techniques to protect advertisers from this sort of attack; these techniques err on the side of safety by discarding all suspicious clicks as fraudulent. So, although click fraud is an unfortunate phenomenon that requires constant attention, it is not as great a threat to this incredible value-creating marketplace as some suspect and fear.

NOTE

1. The second-bid pricing mechanism is not equivalent to the Vickrey-Clarke-Groves mechanism, and it is sometimes possible for bidders to benefit by bidding values other than their true value. However, the mechanism is more robust toward strategic behavior than a pay-as-you-bid mechanism. See [1] for a discussion.

REFERENCES

1. Edelman, B.; Ostrovsky, M.; and Schwarz, M. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97, 1 (March 2007), 242–259.

2. Fain, D.C., and Pedersen, J.O. Sponsored search: A brief history. ASIS Bulletin, Special Section (December 2005/January 2006), www.asis.org/Bulletin/Dec-05/pedersen.html.

3. Varian, H. Position auctions. International Journal of Industrial Organization, 25, 6 (2007), 1163–1178.

JAN O. PEDERSEN (pederse@yahoo-inc.com) is chief scientist for search and marketplace at Yahoo! Dr. Pedersen began his career at Xerox PARC, where he led a research group investigating information-access technologies. In 1996 he joined Verity, the enterprise search software vendor, as manager of the Advanced Technology Group. In 1998, Dr. Pedersen joined Infoseek as director for search and spidering. In 2002, he joined AltaVista as chief scientist. AltaVista was purchased by Overture, which was in turn acquired by Yahoo! Jan Pedersen holds a Ph.D. in statistics from Stanford University and a B.A. in statistics from Princeton University. He is credited with more than 10 issued patents and has authored more than 20 refereed publications on information-access topics.