cross device ad targeting at scale

24
Cross Device Ad Targeting at Scale jerry ye Copyright © LSRS 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Upload: trieu-nguyen

Post on 01-Jul-2015

202 views

Category:

Marketing


0 download

TRANSCRIPT

Page 1: Cross Device Ad Targeting at Scale

Cross Device Ad Targeting at Scale

jerry ye

Copyright © LSRS 2013 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Page 2: Cross Device Ad Targeting at Scale

Agenda

•  Cross Device Ad Targeting •  Overview •  Infrastructure

•  Ad CTR, CVR, Revenue prediction •  Data vs. Algorithm •  Machine learning on Hadoop

•  MapReduce •  MPI

•  Case Study: GBDT •  Large scale machine learning at Drawbridge •  Vowpal Wabbit •  Results

Page 3: Cross Device Ad Targeting at Scale

Sequoia Capital, Kleiner Perkins, and Northgate Capital Funded $20.5MM raised Demand Side Partner for Cross Device Ad Targeting Over 475MM devices probabilistically matched Non Personally Identifiable Information Reach users on any device

Confidential and Proprietary. Copyright © Drawbridge, 2013.

Page 4: Cross Device Ad Targeting at Scale

Matching Devices

Page 5: Cross Device Ad Targeting at Scale

After bridging devices, generate user profile:"•  Non-personally identifiable information"•  Demographic"•  Location"•  Interests"•  Retargeting"•  Bridged devices"""

User Profile

Page 6: Cross Device Ad Targeting at Scale

Drawbridge Infrastructure

Page 7: Cross Device Ad Targeting at Scale

Drawbridge is a Demand Side Partner"Supply comes from RTB exchanges as well as direct partnerships"Ability to target desktop segments on mobile and vice versa""Given a request, within milliseconds:"•  select campaign"•  price"•  place bid"

Optimize for"•  Client ROI"•  Revenue/Profit""

Supply and Demand

Page 8: Cross Device Ad Targeting at Scale

In the case of revenue, given a request, predict:""Cost Per Click:"Predicted Revenue = pctr*cpc""Cost Per Acquisition:"Predicted Revenue = pctr*pcvr*cpa""Given revenue, pick bidding strategy"

Prediction

Page 9: Cross Device Ad Targeting at Scale

Data

Months of historic ad performance data"–  Click/Conversion"

Per impression data"–  Temporal"–  Contextual"–  Anonymized User Features"

Training Data"–  1.5 TB"–  1MM dimensions"–  2B samples"–  Unsampled, trained on everything"

Page 10: Cross Device Ad Targeting at Scale

Advantage of More Data

0  

0.2  

0.4  

0.6  

0.8  

1  

Indexed  Click  Through  Rate  

60  Days  

30  Days  

6.6% Increase in CTR from doubling data"

Page 11: Cross Device Ad Targeting at Scale

Data versus Algorithm

More data?"Better algorithm?""Why not both?"Difficult to scale ML"Most use R, Matlab, Weka""Hadoop? "

Page 12: Cross Device Ad Targeting at Scale

•  Hadoop is an open source implementation of MapReduce"•  MapReduce designed for data intensive applications"•  Relatively simple to use"•  Wide range of applications"•  Streaming or MapReduce"•  No means of communication"•  Great for data prep"

Map  0   Map  1   Map  2  

Reduce  0   Reduce  1  

Out  0   Out  1  

Shuffle

Input  

MapReduce

Page 13: Cross Device Ad Targeting at Scale

Scaling Up Machine Learning

Most ML is iterative (gradient descent)"Optimizes for a set of parameters over the same data"Updated parameters are usually aggregated""Naïve ML implementations in MapReduce do not work"Case Study: Gradient Boosted Decision Trees at Yahoo!"

Page 14: Cross Device Ad Targeting at Scale

•  Gradient Boosted Decision Trees was introduced by Jerome Friedman in 1999"

•  An additive regression model over an ensemble of trees, fitted to current residuals, gradients of the loss function, in a forward step-wise manner "

•  Favors many shallow trees (e.g., 6 nodes, 2000 trees)"•  Numerous applications within Yahoo!"•  Blender in Bellkor’s winning Netflix solution"

+   +  +    …  

Gradient Boosted Decision Trees

Page 15: Cross Device Ad Targeting at Scale

•  High memory tasks"•  HDFS lag, experience with GBDT"•  Certain tasks would get performance bump if written with MPI"•  Yahoo! has invested heavily in Hadoop"•  Multiple research and production clusters with thousands of machines

each"•  Expensive to deploy MPI specific clusters"•  Utilize existing Hadoop clusters for MPI jobs"

Motivation

Page 16: Cross Device Ad Targeting at Scale

•  Message Passing Interface (MPI) allows many computers to communicate with each other."

•  Dominant model in high performance computing"•  Scalable, portable"•  Distributed shared memory for high RAM jobs"•  OpenMPI is an open source implementation of MPI"•  Low level and can be complicated to use"•  Modified OpenMPI to run on Hadoop"•  Fault tolerance"

Master   Slave  1   Slave  2  

Out  

Input  

Message Passing Interface

Page 17: Cross Device Ad Targeting at Scale

•  MPI implementation of GBDT magnitudes faster than horizontal MapReduce implementation"

•  MPI implementation 6X faster than vertical MapReduce implementation"•  Communications cost dramatically reduced"•  MPI close to ideal initially, but communications not free"•  MapReduce vs. MPI implementation on left"•  Scalability of MPI implementation on right"

Speedup

Page 18: Cross Device Ad Targeting at Scale

+   +  +    …  

+   +  +    …  

Horizontal: 211 minutes x 2500 trees = 366 days x 100 machines"Vertical: 28 seconds x 2500 trees = 19.4 hours x 20 machines "

5 seconds x 2500 trees = 3.4 hours x 10 machines "

1800% less node hours!"

MapReduce"

MPI  

GBDT Improvement

Page 19: Cross Device Ad Targeting at Scale

•  Pig, Hive, Hbase"•  Clean data, abuse detection"•  Missing values"•  Normalization"•  Binning"•  Formatting"•  Vowpal Wabbit"

Machine Learning @ Drawbridge

Page 20: Cross Device Ad Targeting at Scale

•  John Langford at Microsoft Research (previously Yahoo! Labs)"•  http://hunch.net/~vw"•  State of the art in scalable, fast, efficient machine learning.

Magnitudes faster and more scalable than any MapReduce based learner"

•  VW has been shown to reliably scale to 1000+ machines."•  AllReduce implementation similar to MPI’s"•  Runs on Hadoop"•  @Drawbridge, we predict ctr, cvr, win rate"

Vowpal Wabbit

Page 21: Cross Device Ad Targeting at Scale

•  Predict probability of click/conversion [0,1]"•  Solve for θ"

"

f (x) = 11+ e−z

z = β + θi xii=0

n

Logistic Regression

Page 22: Cross Device Ad Targeting at Scale

•  Θ are weights for our features"•  How important is"

– User frequency cap?"– Time of day?"– Device?"– Etc"

z(x) = β +θ0 (user frequency count) +θ1(Sunday)+θ2 (iPhone)

What is Theta?

Page 23: Cross Device Ad Targeting at Scale

•  Logistic Regression vs. Method of Moments"•  Bucket tests on percentage of traffic"•  Several models released since July of last year"•  10-30% improvement in CTR, CVR each time"•  Lower cost, higher revenue"

A/B Testing

Page 24: Cross Device Ad Targeting at Scale

•  Cross Device Ad Targeting"•  Ad CTR, CVR, Revenue prediction"•  Data AND algorithm"•  Machine learning on Hadoop"

•  MapReduce"•  MPI"•  GBDT much faster in MPI"

•  Large scale machine learning at Drawbridge"•  Vowpal Wabbit"

Conclusions