nicolas kruchten @ datacratic
RTB Optimizer: Behind the scenes with a Predictive API
Nicolas Kruchten
PAPIs.io – November 18, 2014
REAL TIME MACHINE LEARNING DECISIONS AS A SERVICE
About Datacratic
• Software company specializing in high performance systems and machine learning
• 30 employees, founded in 2009, based in Montréal, Québec, Canada with an office in New York
• 3 Predictive APIs in market today
• Building a Machine Learning Database to help others build Predictive APIs and Apps
Real-Time Bidding for online advertising
[Diagram: a Web Browser sends "GET ad" to the Real-Time Exchange, which sends bid requests to several Bidders]
[Diagram: the Bidders return bids to the Real-Time Exchange, which runs an auction and returns an ad to the Web Browser]
• This happens millions of times per second
• Bidders must respond within 100 milliseconds
• RTB Optimizer enables bidders to achieve campaign goals
Campaign goals
• Advertising campaigns are typically outcome-oriented
– Clicks
– Video views
– Conversions: app installs, purchases, sign-ups
• e.g. Ad network has sold someone 1,000 outcomes for $1,000
• e.g. Advertiser has $1,000 to get as many outcomes as possible
• Essentially maximize profit or minimize cost-per-outcome
Datacratic’s RTB Optimizer
• Client bidder relays bid-requests to API, API tells it how to bid
• Handles 100,000 queries per second, for hundreds of campaigns
• API says which campaign should bid and how much
• API also needs outcomes in real-time and campaign goals
A Predictive API that learns
• Datacratic has no proprietary data set
• API can learn from scratch from the bid-request stream
what works for each campaign:
– Contextual features: website, time of day, banner size and placement
– User features: geo-location, browser, language, # of impressions shown
– Customer-provided data: about the user, about the website
• Provides insights into what features are driving performance
• Can re-use learnings from previous campaigns
Second price auctions
• First Price Auctions
– You bid $1, I bid $2: I win, and I pay $2
• RTB uses Second Price Auctions
– You bid $1, I bid $2: I win, and I pay $1
• Optimal bid = E[ value ]
– Say it’s worth $2 to me
– I will never bid more than $2
– If I bid $1.50 and you bid $1.75: I’ve lost an opportunity for $0.25 surplus!
– I should always bid $2
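The argument above can be checked with a few lines of Python. This is a minimal sketch of a sealed-bid second-price auction with two bidders; the function names are illustrative, and ties and reserve prices are ignored.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    `bids` maps bidder name -> bid amount. The highest bidder wins and
    pays the second-highest bid. Ties and reserve prices are ignored.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]
    return winner, price


def surplus(my_value, my_bid, other_bid):
    """Surplus for bidder 'me' facing a single competing bid."""
    winner, price = second_price_auction({"me": my_bid, "you": other_bid})
    return my_value - price if winner == "me" else 0.0


# The slide's example: the impression is worth $2 to me, you bid $1.75.
shaded = surplus(my_value=2.00, my_bid=1.50, other_bid=1.75)    # I lose: surplus 0
truthful = surplus(my_value=2.00, my_bid=2.00, other_bid=1.75)  # I win, pay $1.75
# Overbidding can win at a loss: worth $2, but I pay the $2.50 second price.
overbid = surplus(my_value=2.00, my_bid=3.00, other_bid=2.50)
```

Bidding below value forfeits the $0.25 surplus, and bidding above value risks paying more than the impression is worth, which is why the bid should equal the value.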
What’s it to you?
• If client gets paid $10,000 for 1,000 conversions, then payout = $10 per conversion
E[ value | bid-request ] = $10 * P( conversion | bid-request )
• What was an economics problem is now a prediction problem
• We need to calibrate to predict true probabilities
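As a worked example of the formula above, here is a minimal sketch in Python. The payout comes from the slide; the conversion probability of 0.0005 is a made-up number for illustration, and the function name is hypothetical.

```python
def expected_value_bid(payout_per_conversion, p_conversion):
    """Bid = payout * P(conversion | bid-request)."""
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be a probability")
    return payout_per_conversion * p_conversion


payout = 10_000 / 1_000          # $10 per conversion, as on the slide
p_true = 0.0005                  # hypothetical true conversion probability
bid = expected_value_bid(payout, p_true)

# A classifier that ranks requests correctly but reports probabilities
# twice too high would double every bid, which is why calibration matters.
miscalibrated_bid = expected_value_bid(payout, 2 * p_true)
```

Because the predicted probability is multiplied directly into a dollar amount, a model that only ranks well is not enough: the probabilities themselves have to be right.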
Collecting the data
• To compute P( X | Y ) we need examples of Y’s with an X label
• RTB Optimizer uses mix of strategies to meet campaign goals
• Probe strategy bids randomly to collect data
• Optimized strategy bids with E[ value ]
• Automatic training/retraining when the API has seen enough examples
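A minimal sketch of mixing the two strategies, in Python. It assumes that "bids randomly" means a uniformly random bid price up to some cap; the source does not say how the probe bid is drawn, and `choose_bid`, `probe_rate` and `max_probe_bid` are hypothetical names.

```python
import random


def choose_bid(expected_value, probe_rate, max_probe_bid, rng=random):
    """Return (strategy, bid) for one bid request.

    With probability `probe_rate` the request goes to the probe strategy,
    which bids a random amount to collect unbiased training data.
    Otherwise the optimized strategy bids the predicted E[value].
    """
    if rng.random() < probe_rate:
        return "probe", rng.uniform(0.0, max_probe_bid)
    return "optimized", expected_value


rng = random.Random(0)  # seeded only to make the example repeatable
decisions = [choose_bid(0.005, probe_rate=0.1, max_probe_bid=0.02, rng=rng)
             for _ in range(1000)]
probe_share = sum(1 for strategy, _ in decisions if strategy == "probe") / 1000
```

Recording which strategy produced each bid keeps the probe traffic usable as labeled examples for training.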
Bias control
• Never stop the probe strategy
• Always need control group for evaluation, retraining
• Risk of filter bubbles: future models trained on previous output
• Bid requests are randomly routed to probe, less often over time
• Models automatically back-tested before deployment
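One way to route to the probe "less often over time" without ever stopping it is a decaying rate with a floor. This is a sketch under assumed parameters: the exponential decay, the half-life and the 1% floor are illustrative choices, not values from the talk.

```python
def probe_rate(requests_seen, initial_rate=1.0, floor_rate=0.01,
               half_life=1_000_000):
    """Probability of routing the next bid request to the probe strategy.

    The rate halves every `half_life` requests but never drops below
    `floor_rate`, so a control group always exists for evaluation
    and retraining.
    """
    decayed = initial_rate * 0.5 ** (requests_seen / half_life)
    return max(floor_rate, decayed)


start = probe_rate(0)               # all traffic probes at first
later = probe_rate(1_000_000)       # half of it after one half-life
steady = probe_rate(1_000_000_000)  # the floor, indefinitely
```

The floor is what guards against the filter-bubble risk: some traffic is always bid on independently of the current model's output.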
How to learn in real-time
• Classify using bagged generalized linear models
• Generate non-linear features with statistics tables
• Periodically retrain classifier
• Continuously update stats tables
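A minimal sketch of the statistics-table idea, in Python. It assumes each table keeps running impression and outcome counts per bucket and that the classifier sees the resulting rate as a numeric feature; the class and function names are hypothetical, and the classifier itself is omitted.

```python
from collections import defaultdict


class StatsTable:
    """Running impression and outcome counts per bucket of one feature."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.outcomes = defaultdict(int)

    def record_impression(self, bucket):
        self.impressions[bucket] += 1

    def record_outcome(self, bucket):
        self.outcomes[bucket] += 1

    def rate(self, bucket):
        """Outcomes per impression; 0.0 for buckets never seen."""
        shown = self.impressions.get(bucket, 0)
        return self.outcomes.get(bucket, 0) / shown if shown else 0.0


tables = {"browser": StatsTable(), "website": StatsTable()}


def featurize(bid_request):
    """Replace each categorical value with its current outcome rate."""
    return [tables[name].rate(bid_request[name]) for name in sorted(tables)]


# Toy stream: four Chrome impressions on abc.com, one of which converts.
for _ in range(4):
    tables["browser"].record_impression("Chrome")
    tables["website"].record_impression("abc.com")
tables["browser"].record_outcome("Chrome")

features = featurize({"browser": "Chrome", "website": "abc.com"})
```

The counters are cheap to update on every event, so the features stay fresh between the periodic retrains of the classifier that consumes them.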
Statistics Table by example
Table    Bucket    Impressions  Outcomes  Outcomes/Impressions  95% Confidence Lower Bound
                                                                on Outcomes/Impressions
Browser  Chrome    5M           3k        0.060%                0.058%
Browser  Firefox   3M           1k        0.033%                0.031%
Website  abc.com   4M           2k        0.050%                0.048%
Website  xyz.com   1k           10        1.000%                0.481%
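The lower-bound column penalizes buckets with little data. The talk does not say which interval was used, so the sketch below uses the Wilson score interval as an assumed stand-in; its numbers will not match the table exactly, but the effect is the same.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(outcomes, impressions, confidence=0.95):
    """Lower end of the two-sided Wilson score interval for a rate."""
    if impressions == 0:
        return 0.0
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = outcomes / impressions
    denom = 1 + z * z / impressions
    center = p + z * z / (2 * impressions)
    margin = z * math.sqrt(p * (1 - p) / impressions
                           + z * z / (4 * impressions ** 2))
    return (center - margin) / denom


# Counts from the table: a large bucket and a small one.
abc_raw, abc_low = 2_000 / 4_000_000, wilson_lower_bound(2_000, 4_000_000)
xyz_raw, xyz_low = 10 / 1_000, wilson_lower_bound(10, 1_000)
```

With 4M impressions the bound sits close to the raw rate, while with only 1k impressions it falls to roughly half of it, so a small bucket needs more evidence before it is trusted.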
RTB Optimizer
[Diagram: RTB Optimizer architecture, split into Real-Time and Batch paths. Components: Bids API, Probe, E[ value ], Outcomes API, Training, GLZ Classifier, Stats Tables]
Implementation details (are everything)
• 100k requests per second, 10 millisecond latency, running 24/7, 1 trillion predictions to date
• Distributed system, written in C++11
• AWS: data in S3, training runs on Amazon EC2 spot market
• http://opensource.datacratic.com/
– RTBkit
– JML
– StarCluster
Does it work?
Classification success? ROC and calibration curves…
Optimization success? 80% reductions in cost-per-outcome…
Customer success! 25% monthly growth