adpredictor - large scale bayesian click-through rate prediction in microsoft's bing search engine

Click Prediction with adPredictor

at Microsoft Advertising

Joaquin Quionero Candela Thore Graepel

Ralf Herbrich

Thomas Borchert

Microsoft Research & Microsoft adCenter

Microsoft + Yahoo! = 1/3 US search market adPredictor predicts probability of click on ads for Microsoft Bing and Yahoo! search engines

flowers

$1.00

$2.00

$0.10

* 10%

* 4%

* 50%

=$0.10

=$0.08

=$0.05

$0.80

$1.25

$0.05

Efficient use of ad space

Increased user satisfaction by better targeting

Increased revenue by showing ads with high click-thru rate

Importance of accurate probability estimates

Over-simplified ranking function: this is not what is used in practice

Impression Level Predictions

Sparse binary input features (many 10s of them)

Some high cardinality (~100M), some low (

Sparse Linear Probit Regression

1341201

1570165

2213187

9215433

Ad ID

Exact Match

Broad Match

Match Type

Position

ML-1

SB-1

SB-2

+ pClick

Uncertainty: A Bayesian Treatment

1341201

1570165

2213187

9215433

Ad ID

Exact Match

Broad Match

Match Type

Position

ML-1

SB-1

SB-2

p(pClick) +

A Linear Probit Model

Notation = 1 if click is the vector of all weights = 1 if non-click is a sparse binary input vector

Generalised linear model with weights vector :

|, :=

Inverse link function is the probit function:

; 0,1

controls the steepness: it corresponds to the standard deviation of additive zero mean noise.

9

0

click no click

|, :=

Observation Noise (Assume Known Noiseless Weights)

10

Think of as indicator variables that select weights: we will soon remove from the notation Example = = [1; 0; 0; 0; 1; 0; ; 0; 1]

Uncertainty About the Weights A Bayesian Treatment

Factorizing Gaussian prior over the weights:

= ; , 2

=1

Given (|,) the posterior is given by:

|, = |,

|, d

Problem: This posterior cannot be represented compactly nor calculated in closed form

11

Desiderata and Approximations

We want

The posterior to remain a factorized Gaussian

Incremental online learning rather than batch

This is how it is done

Approximate inference with latent variables

Single pass approximate (online) schedule

12

Sum of posterior weights 0

Predicting Average Probability of Click

Now that our posterior over the weights is a factorizing Gaussian

100%

=

=1

2 + 2

=1

Principled Exploration

0

1

2

3

4

5

6

7

8

9

10

0% 20% 40% 60% 80% 100%

p(p

Clic

k)

pClick

average: 25% (3 clicks out of 12 impressions)

average: 30% (30 clicks out of 100 impressions)

Approximate Inference with Latent Variables

Prior: = ; , 2

Sum of active weights:

, = =1

Noisy version thereof: , = ; , 2

The sign of determines click: , = sign

15

1

1

Approximating () and ()

-5 0 5 -5 0 5

-5 0 5 -5 0 5 -5 0 5

-5 0 5

* =

* =

() () ()

() () ()

Updating the Posterior

No

Clic

k

Clic

k

w1 w2 +

s

y Prediction Training/Update

Posterior Updates for the Click Event

The importance of joint updates

0.01%

0.10%

1.00%

10.00%

100.00%

0.01% 0.10% 1.00% 10.00% 100.00%

Act

ual

CTR

Predicted CTR

adPredictor

0.01%

0.10%

1.00%

10.00%

100.00%

0.01% 0.10% 1.00% 10.00% 100.00%

Act

ual

CTR

Predicted CTR

Naive Bayes

Calibration by Isotonic Regression

0.01%

0.10%

1.00%

10.00%

100.00%

0.01% 0.10% 1.00% 10.00% 100.00%

Act

ual

CTR

Predicted CTR

Calibrated adPredictor

0.01%

0.10%

1.00%

10.00%

100.00%

0.01% 0.10% 1.00% 10.00% 100.00%

Act

ual

CTR

Predicted CTR

Calibrated Naive Bayes

Calibration Cant Improve the ROC

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%

0.00% 20.00% 40.00% 60.00% 80.00% 100.00%

Tru

e P

osi

tive

s

False Positives

Nave Bayes

adPredictor

adPredictor Wrap Up

Automatic learning rate

Calibrated: 2% prediction means 2% clicks

Use of very many features, even if correlated

Modelling the uncertainty explicitly

Natural exploration mode

Discussion (For Later)

Sample selection bias and exploration Dynamics: forgetting with time Pruning uninformative weights Approximate parallel inference Hierarchical priors Input features the secret sauce Some of this is detailed in the ICML 2010 paper: Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsofts Bing Search Engine

We are hiring! Please contact me if you are interested.

Thank you!

[email protected]

We are hiring! Please contact me if you are interested.

adpredictor - large scale bayesian click-through rate prediction in microsoft's bing search engine

Documents

high click

sum of posterior weights

weights vector

sum of active weights

known noiseless weights

microsoft bing

probit function

sparse binary input