adPredictor - Large Scale Bayesian Click-Through Rate Prediction in Microsoft's Bing Search Engine

Click Prediction with adPredictor at Microsoft Advertising. Joaquin Quiñonero Candela, Thore Graepel, Ralf Herbrich, Thomas Borchert. Microsoft Research & Microsoft adCenter

Post on 26-Nov-2015

TRANSCRIPT

  • Click Prediction with adPredictor

    at Microsoft Advertising

    Joaquin Quiñonero Candela Thore Graepel

    Ralf Herbrich

    Thomas Borchert

    Microsoft Research & Microsoft adCenter

  • Microsoft + Yahoo! = 1/3 US search market

    adPredictor predicts probability of click on ads for Microsoft Bing and Yahoo! search engines

  • flowers

  • $1.00 * 10% = $0.10    $0.80

    $2.00 * 4% = $0.08    $1.25

    $0.10 * 50% = $0.05    $0.05

    Efficient use of ad space

    Increased user satisfaction by better targeting

    Increased revenue by showing ads with high click-thru rate

    Importance of accurate probability estimates

    Over-simplified ranking function: this is not what is used in practice
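The numbers on this slide are consistent with ranking ads by bid * pClick and charging each ad the smallest price per click that would keep its rank (a generalized second-price rule). The Python sketch below works through that reading. The second-price interpretation, the ad names, the `rank_and_price` helper, and the $0.05 reserve price for the last slot are all assumptions made for illustration, and, as the slide itself says, this is not the ranking function used in practice.

```python
def rank_and_price(ads, reserve=0.05):
    """Rank ads by bid * p_click and price each click second-price style.

    ads: list of (name, bid, p_click) tuples.
    reserve: hypothetical price charged to the last-ranked ad.
    Returns a list of (name, score, price_per_click) in rank order.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for pos, (name, bid, p_click) in enumerate(ranked):
        score = bid * p_click  # expected value of showing the ad once
        if pos + 1 < len(ranked):
            _, next_bid, next_p = ranked[pos + 1]
            # Smallest bid that still beats the next ad's score.
            price = next_bid * next_p / p_click
        else:
            price = reserve
        results.append((name, score, price))
    return results


# Bids and click probabilities taken from the slide; names are made up.
ads = [("ad_a", 1.00, 0.10), ("ad_b", 2.00, 0.04), ("ad_c", 0.10, 0.50)]
results = rank_and_price(ads)
for name, score, price in results:
    print(f"{name}: score=${score:.2f}, price per click=${price:.2f}")
```

Under these assumptions the computed prices agree with the slide's $0.80 and $1.25. An error in pClick changes both the ranking and the price, which is why accurate probability estimates matter.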

  • Impression Level Predictions

    Sparse binary input features (many 10s of them)

    Some high cardinality (~100M), some low (

  • Sparse Linear Probit Regression

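    [Figure: weights for the values of each feature. Ad ID: 1341201, 1570165, 2213187, 9215433. Match Type: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. The selected weights are summed (+) to give pClick.]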

  • Uncertainty: A Bayesian Treatment

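    [Figure: the same features as on the previous slide (Ad ID: 1341201, 1570165, 2213187, 9215433; Match Type: Exact Match, Broad Match; Position: ML-1, SB-1, SB-2), now with a distribution over each weight. The selected weights are summed (+) to give p(pClick).]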

  • A Linear Probit Model

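    Notation:

    $y = 1$ if click, $y = -1$ if non-click

    $\mathbf{w}$ is the vector of all weights

    $\mathbf{x}$ is a sparse binary input vector

    Generalised linear model with weights vector $\mathbf{w}$:

    $p(y \mid \mathbf{x}, \mathbf{w}) := \Phi\left(\frac{y \cdot \mathbf{w}^T \mathbf{x}}{\beta}\right)$

    Inverse link function is the probit function:

    $\Phi(t) := \int_{-\infty}^{t} \mathcal{N}(s; 0, 1)\, ds$

    $\beta$ controls the steepness: it corresponds to the standard deviation of additive zero mean noise.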
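To make the role of the steepness parameter concrete, here is a minimal Python sketch of the probit likelihood described on this slide. The function names `probit` and `likelihood` and the example weights are hypothetical; only the standard library is used.

```python
import math


def probit(t):
    """Standard normal CDF, Phi(t), written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def likelihood(y, x, w, beta):
    """p(y | x, w) = Phi(y * w^T x / beta) for y in {-1, +1} and binary x."""
    s = sum(w_i for w_i, x_i in zip(w, x) if x_i)  # w^T x for binary x
    return probit(y * s / beta)


w = [0.5, -1.0, 0.25, 2.0]  # hypothetical weights
x = [1, 0, 1, 0]            # selects weights 0 and 2, so w^T x = 0.75

for beta in (0.1, 1.0, 10.0):
    print(f"beta={beta}: p(click)={likelihood(+1, x, w, beta):.4f}")
```

A small beta makes the model nearly deterministic in the sign of the weight sum, while a large beta flattens every prediction toward 50%.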

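  • Observation Noise (Assume Known Noiseless Weights)

    $p(y \mid \mathbf{x}, \mathbf{w}) := \Phi\left(\frac{y \cdot \mathbf{w}^T \mathbf{x}}{\beta}\right)$

    [Figure: axis centred at 0, with regions labelled "click" and "no click"]

    Think of $\mathbf{x}$ as indicator variables that select weights: we will soon remove $\mathbf{x}$ from the notation

    Example: $\mathbf{x} = [1; 0; 0; 0; 1; 0; \ldots; 0; 1]$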

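  • Uncertainty About the Weights: A Bayesian Treatment

    Factorizing Gaussian prior over the weights:

    $p(\mathbf{w}) = \prod_{i=1}^{N} \mathcal{N}(w_i; \mu_i, \sigma_i^2)$

    Given the likelihood $p(y \mid \mathbf{x}, \mathbf{w})$, the posterior is given by:

    $p(\mathbf{w} \mid \mathbf{x}, y) = \frac{p(y \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})}{\int p(y \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})\, d\mathbf{w}}$

    Problem: This posterior can neither be represented compactly nor calculated in closed form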

  • Desiderata and Approximations

    We want

    The posterior to remain a factorized Gaussian

    Incremental online learning rather than batch

    This is how it is done

    Approximate inference with latent variables

    Single pass approximate (online) schedule


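  • Predicting Average Probability of Click

    Now that our posterior over the weights is a factorizing Gaussian:

    $p(y = 1 \mid \mathbf{x}) = \Phi\left(\frac{\sum_{i=1}^{N} x_i \mu_i}{\sqrt{\beta^2 + \sum_{i=1}^{N} x_i \sigma_i^2}}\right)$

    [Figure: predicted probability of click (up to 100%) plotted against the sum of posterior weights, centred at 0]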
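The predictive probability on this slide can be sketched in a few lines of Python: sum the posterior means of the active weights, add their variances to the noise variance, and pass the ratio through the probit function. The helper name `predict_click`, the dictionary layout, and the example numbers are assumptions made for illustration.

```python
import math


def probit(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def predict_click(mu, sigma2, active, beta):
    """Average click probability under a factorized Gaussian posterior.

    mu, sigma2: dicts mapping a weight key to its posterior mean / variance.
    active: keys of the weights selected by the binary input vector.
    """
    total_mean = sum(mu[k] for k in active)
    total_var = beta ** 2 + sum(sigma2[k] for k in active)
    return probit(total_mean / math.sqrt(total_var))


# Hypothetical posterior for three active weights.
mu = {"ad": 0.6, "match": 0.3, "position": 0.1}
confident = {"ad": 0.01, "match": 0.01, "position": 0.01}
uncertain = {"ad": 2.0, "match": 2.0, "position": 2.0}
active = ["ad", "match", "position"]

p_confident = predict_click(mu, confident, active, beta=1.0)
p_uncertain = predict_click(mu, uncertain, active, beta=1.0)
print(f"confident weights: {p_confident:.3f}")
print(f"uncertain weights: {p_uncertain:.3f}")
```

With identical means, larger posterior variances pull the prediction toward 50%, so weight uncertainty is reflected directly in the predicted probability.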

  • Principled Exploration

    [Figure: two densities p(pClick) plotted against pClick (0% to 100%)]

    average: 25% (3 clicks out of 12 impressions)

    average: 30% (30 clicks out of 100 impressions)
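The two averages above are close, but they rest on very different amounts of evidence. As a rough illustration of why that matters for exploration, the sketch below uses a Beta posterior with a uniform prior as a stand-in for the uncertainty over pClick. This is an assumption chosen for simplicity; adPredictor itself keeps Gaussian beliefs over weights. The helper name `beta_posterior` and the "mean plus two standard deviations" score are hypothetical.

```python
import math


def beta_posterior(clicks, impressions):
    """Mean and standard deviation of Beta(1 + clicks, 1 + non-clicks)."""
    a = 1 + clicks
    b = 1 + impressions - clicks
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


few_mean, few_sd = beta_posterior(3, 12)       # 25% observed
many_mean, many_sd = beta_posterior(30, 100)   # 30% observed

# An optimistic score: how good could each ad plausibly be?
few_optimistic = few_mean + 2 * few_sd
many_optimistic = many_mean + 2 * many_sd

print(f"3/12:   mean={few_mean:.3f}, sd={few_sd:.3f}, optimistic={few_optimistic:.3f}")
print(f"30/100: mean={many_mean:.3f}, sd={many_sd:.3f}, optimistic={many_optimistic:.3f}")
```

The ad with 12 impressions has the lower average but the wider distribution, so it could plausibly turn out to be the better ad. Knowing the uncertainty lets a system decide when showing it is worth the risk.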

  • Approximate Inference with Latent Variables

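    Prior: $p(w_i) = \mathcal{N}(w_i; \mu_i, \sigma_i^2)$

    Sum of active weights: $p(s \mid \mathbf{x}, \mathbf{w}) = \delta\left(s = \sum_{i=1}^{N} w_i x_i\right)$

    Noisy version thereof: $p(t \mid s) = \mathcal{N}(t; s, \beta^2)$

    The sign of $t$ determines click: $p(y \mid t) = \delta(y = \operatorname{sign}(t))$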
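The latent-variable chain on this slide (Gaussian weights, their sum, a noisy copy of the sum, and its sign) can be simulated directly. The Python sketch below samples from that chain and compares the empirical click rate with the closed-form probit prediction. The names `sample_click` and `probit`, the seed, and the example means and standard deviations are assumptions for illustration.

```python
import math
import random


def probit(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def sample_click(mu, sigma, active, beta, rng):
    """Draw one click outcome y in {-1, +1} from the generative chain."""
    weights = [rng.gauss(mu[k], sigma[k]) for k in active]  # w_i ~ N(mu_i, sigma_i^2)
    s = sum(weights)                                        # sum of active weights
    t = rng.gauss(s, beta)                                  # noisy version of s
    return 1 if t > 0 else -1                               # sign of t decides click


# Hypothetical beliefs for three active weights.
mu = {"ad": 0.3, "match": -0.1, "position": 0.2}
sigma = {"ad": 0.5, "match": 0.5, "position": 0.5}
active = ["ad", "match", "position"]
beta = 1.0

rng = random.Random(0)
n = 20000
clicks = sum(sample_click(mu, sigma, active, beta, rng) == 1 for _ in range(n))
empirical = clicks / n

total_mean = sum(mu[k] for k in active)
total_var = beta ** 2 + sum(sigma[k] ** 2 for k in active)
analytic = probit(total_mean / math.sqrt(total_var))

print(f"empirical click rate: {empirical:.3f}")
print(f"closed-form prediction: {analytic:.3f}")
```

The two numbers agree up to sampling noise, which is the sense in which the latent variables s and t are only a device for inference: integrating them out recovers the probit model.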

  • Approximating () and ()

    [Figure: two rows of three density plots, each on an axis from -5 to 5; in each row the product (*) of the first two functions gives (=) the third]

  • Updating the Posterior

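    [Figure: factor graph in which the weights w1 and w2 are summed (+) into s, which leads to y (Click / No Click); panels labelled Prediction and Training/Update]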

  • Posterior Updates for the Click Event
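The equations for this slide did not survive in the transcript. The Python sketch below follows the closed-form update given in the ICML 2010 paper cited on the Discussion slide: each active weight's mean moves in the direction of the label y, and its variance shrinks, by amounts set by the correction functions v and w. The function names, the dictionary layout, and the example values are assumptions for illustration, not the production implementation.

```python
import math


def pdf(t):
    """Standard normal density N(t; 0, 1)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def predict(mu, sigma2, active, beta):
    """Click probability from the current factorized Gaussian beliefs."""
    total_mean = sum(mu[k] for k in active)
    total_var = beta ** 2 + sum(sigma2[k] for k in active)
    return cdf(total_mean / math.sqrt(total_var))


def update(mu, sigma2, active, y, beta):
    """Update the beliefs in place after observing y (+1 click, -1 no click)."""
    total_mean = sum(mu[k] for k in active)
    total_var = beta ** 2 + sum(sigma2[k] for k in active)
    total_sd = math.sqrt(total_var)
    t = y * total_mean / total_sd
    v = pdf(t) / cdf(t)   # mean correction (unstable for very negative t)
    w = v * (v + t)       # variance correction, between 0 and 1
    for k in active:
        mu[k] += y * (sigma2[k] / total_sd) * v
        sigma2[k] *= 1.0 - (sigma2[k] / total_var) * w


# Hypothetical starting beliefs: zero mean, unit variance.
mu = {"ad": 0.0, "match": 0.0, "position": 0.0}
sigma2 = {"ad": 1.0, "match": 1.0, "position": 1.0}
active = ["ad", "match", "position"]

before = predict(mu, sigma2, active, beta=1.0)
update(mu, sigma2, active, y=+1, beta=1.0)
after = predict(mu, sigma2, active, beta=1.0)
print(f"p(click) before: {before:.3f}, after one click: {after:.3f}")
```

Because the step size for each weight is proportional to its current variance, well-established weights move little and new ones move a lot, which is the "automatic learning rate" mentioned in the wrap-up.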

  • The importance of joint updates

    [Figure: two plots of Actual CTR against Predicted CTR, both axes logarithmic from 0.01% to 100.00%. Panels: adPredictor, Naive Bayes.]

  • Calibration by Isotonic Regression

    [Figure: two plots of Actual CTR against Predicted CTR, both axes logarithmic from 0.01% to 100.00%. Panels: Calibrated adPredictor, Calibrated Naive Bayes.]
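Isotonic regression fits a non-decreasing map from predicted score to observed click rate. A common way to compute it is the pool-adjacent-violators algorithm, sketched below in plain Python. The slides do not say which implementation was used, so the function names `isotonic_fit` and `calibrate` and the toy data are assumptions for illustration.

```python
def isotonic_fit(scores, labels):
    """Pool adjacent violators.

    Returns a list of (upper_score, calibrated_value) blocks, sorted by
    score, whose calibrated values are non-decreasing.
    """
    blocks = []  # each block: [sum_of_labels, count, upper_score]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge while the previous block's mean is not below the new one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(blocks, score):
    """Map a raw score to the value of the first block that covers it."""
    for upper, value in blocks:
        if score <= upper:
            return value
    return blocks[-1][1]


# Toy data: raw scores and whether the impression was clicked.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

blocks = isotonic_fit(scores, labels)
for upper, value in blocks:
    print(f"scores up to {upper}: calibrated CTR {value:.3f}")
```

Each block's calibrated value is the observed click rate of the impressions pooled into it, so on the data used for fitting, a calibrated prediction matches the actual CTR of its block.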

  • Calibration Can't Improve the ROC

    [Figure: ROC curves, True Positives against False Positives (both 0.00% to 100.00%), for Naive Bayes and adPredictor]
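The reason is that the ROC curve depends only on the ordering of the scores, and a monotone recalibration does not reorder them. The Python sketch below checks this on toy data with a strictly increasing transform. The `auc` helper, the data, and the square-root transform are assumptions for illustration. Isotonic regression is monotone but not strictly so: it can merge neighbouring scores into ties, but it never swaps their order.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs.

    A (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions and click labels.
scores = [0.02, 0.05, 0.10, 0.20, 0.40, 0.70]
labels = [0, 0, 1, 0, 1, 1]

# A strictly increasing recalibration changes the values, not the order.
recalibrated = [s ** 0.5 for s in scores]

auc_raw = auc(scores, labels)
auc_recalibrated = auc(recalibrated, labels)
print(f"AUC before: {auc_raw:.4f}, after: {auc_recalibrated:.4f}")
```

Calibration therefore fixes the probability values a model reports, while any gain in ranking quality has to come from the model itself.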

  • adPredictor Wrap Up

    Automatic learning rate

    Calibrated: 2% prediction means 2% clicks

    Use of very many features, even if correlated

    Modelling the uncertainty explicitly

    Natural exploration mode

  • Discussion (For Later)

    Sample selection bias and exploration

    Dynamics: forgetting with time

    Pruning uninformative weights

    Approximate parallel inference

    Hierarchical priors

    Input features: the secret sauce

    Some of this is detailed in the ICML 2010 paper: Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine

    We are hiring! Please contact me if you are interested.

  • Thank you!

    [email protected]
