TRANSCRIPT
Conductrics twitter: @mgershoff
Bandit Basics – A Different Take on Online Optimization
Who is this guy?
Matt Gershoff
CEO: Conductrics
Many years in database marketing (New York and Paris), and a bit of web analytics
www.conductrics.com
twitter: @mgershoff  Email: [email protected]
Speak Up
What Are We Going to Hear?
• Optimization Basics
• Multi-Armed Bandit
• It's a Problem, Not a Method
• Some Methods
• AB Testing
• Epsilon Greedy
• Upper Confidence Bound (UCB)
• Some Results
Choices · Targeting · Learning · Optimization
OPTIMIZATION
If THIS Then THAT (ITTT) brings together:
1. Decision Rules
2. Predictive Analytics
3. Choice Optimization
If THIS Then THAT
OPTIMIZATION: Find and apply the rule with the most value.
[Slide shows a grid of many "If THIS Then THAT" rules]
OPTIMIZATION
THIS (inputs) – variables whose values are given to you: Facebook, High Spend, Urban GEO, … Home Page, App Use
THAT (outputs) – variables whose values you control: Offer A, Offer B, Offer C, … Offer Y, Offer Z
Predictive Model: maps input features F1, F2, …, Fm to an estimated Value for each output.
But …
For the THAT side (Offer A?, Offer B?, … Offer Z?) the values are unknown:
1. We don't have data on 'THAT'
2. We need to collect it – sample
3. How do we sample efficiently?
Where
Marketing Applications:
• Websites
• Mobile
• Social Media Campaigns
• Banner Ads
Pharma: Clinical Trials
What is a Multi-Armed Bandit?
A OR B
One-Armed Bandit → Slot Machine
The problem:
How do you pick between slot machines so that you walk out of the casino with the most money at the end of the night?
Objective: pick so as to get the most return/profit over time.
Technical term: minimize Regret.
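To make "regret" concrete, here is a minimal sketch (illustrative only, not from the talk): regret is the payoff you would have earned by always playing the best machine, minus the expected payoff of the machines you actually played.

```python
def regret(best_mean, played_means):
    """Expected regret: the best arm's payoff over all plays,
    minus the expected payoff of the arms actually played."""
    return best_mean * len(played_means) - sum(played_means)
```

For example, if the best machine pays 0.5 on average and you played machines with means 0.5, 0.3, and 0.5, your expected regret is 0.2 – the price of the one suboptimal pull.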
… but how to Pick?
A
OR
B
Sequential Selection
Need to Sample, but do it efficiently
Explore – Collect Data
• Data Collection is costly – an Investment
• Be Efficient – Balance the potential value of
collecting new data with exploiting what
you currently know.
A
OR
B
Multi-Armed Bandits
“Bandit problems embody in essential
form a conflict evident in all human
action: choosing actions which yield
immediate reward vs. choosing actions
… whose benefit will come only later.”*
- Peter Whittle
*Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.
Exploration vs. Exploitation
1) Explore/Learn – Try out different actions
to learn how they perform over time – This is
a data collection task.
2) Exploit/Earn – Take advantage of what
you have learned to get highest payoff –
Your current best guess
Not a New Problem
1933 – first work on competing options
1940 – a WWII problem the Allies attempted to tackle
1953 – Bellman formulates it as a Dynamic Programming problem
Source: http://www.lancs.ac.uk/~winterh/GRhist.html
Testing
• Explore First
–All actions have an equal chance of
selection (uniform random).
–Use hypothesis testing to select a
‘Winner’.
• Then Exploit - Keep only ‘Winner’
for selection
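The explore-first pattern above can be sketched as a toy simulation (illustrative Python, not Conductrics code; the function name and the Bernoulli-conversion assumption are mine):

```python
import random

def explore_then_exploit(rates, explore_n, horizon, seed=0):
    """Phase 1: serve each arm uniformly for explore_n trials.
    Phase 2: serve only the empirical 'winner' for the remaining trials.
    rates are the true conversion probabilities (unknown to the learner)."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    wins = [0] * len(rates)
    conversions = 0
    for t in range(horizon):
        if t < explore_n:
            arm = t % len(rates)            # Uniform rotation while testing
        else:
            arm = max(range(len(rates)),    # Keep only the 'winner'
                      key=lambda a: wins[a] / max(counts[a], 1))
        hit = rng.random() < rates[arm]     # Simulated Bernoulli conversion
        counts[arm] += 1
        wins[arm] += hit
        conversions += hit
    return conversions
```

Note the hard switch: after the test period, the losing arms get zero traffic, which is exactly the "keep only the winner" step on the slide.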
Learn First
[Timeline: first Explore/Learn (data collection/sample), then Exploit/Earn (apply learning)]
P-Values: A Digression
P-values are:
• NOT the probability that the Null is true: P(Null = True | DATA)
• P(DATA (or more extreme) | Null = True)
• Not a great tool for deciding when to stop sampling. See:
http://andrewgelman.com/2010/09/noooooooooooooo_1/
http://www.stat.duke.edu/~berger/papers/02-01.pdf
A Couple Other Methods
1. Epsilon Greedy
Nice and Simple
2. Upper Confidence Bound (UCB)
Adapts to Uncertainty
1) Epsilon-Greedy
Greedy
What do you mean by 'Greedy'?
Make whatever choice seems best at the moment.
Epsilon Greedy
What do you mean by 'Epsilon Greedy'?
• Explore – randomly select an action ε percent of the time (say 20%)
• Exploit – play greedy (pick the current best) the remaining 1 − ε of the time (say 80%)
Epsilon Greedy
[Diagram: for each user –
Explore/Learn (20%): select randomly, like A/B testing;
Exploit/Earn (80%): select the current best (be greedy)]
Epsilon Greedy
Action | Value
A | $5.00
B | $4.00
C | $3.00
D | $2.00
E | $1.00
80% of the time select the best (A); 20% of the time select at random.
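A minimal epsilon-greedy sketch (illustrative Python, not the Conductrics implementation), assuming we track a running-average value per action:

```python
import random

def epsilon_greedy(estimates, epsilon=0.2, rng=random):
    """Explore with probability epsilon (uniform random arm),
    otherwise exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))          # Explore: any arm
    return max(range(len(estimates)), key=estimates.__getitem__)  # Exploit

def update(counts, estimates, arm, reward):
    """Incremental running average of observed rewards for one arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

With the table above, estimates = [5.0, 4.0, 3.0, 2.0, 1.0] and ε = 0.2, the greedy branch always returns action A; the other 20% of calls spread uniformly over all five actions.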
Continuous Sampling
[Timeline: Explore/Learn and Exploit/Earn are interleaved continuously over time]
Epsilon Greedy
– Super simple / low cost to implement
– Tends to be surprisingly effective
– Less affected by 'seasonality'
– Not optimal (hard to pick the best ε)
– Doesn't use a measure of variance
– Should we decrease exploration over time, and how?
Upper Confidence Bound
Basic Idea:
1) Calculate both mean and a measure
of uncertainty (variance) for each
action.
2) Make Greedy selections based on
mean + uncertainty bonus
Confidence Interval Review
Confidence Interval = mean ± z·Std
[Diagram: interval spanning Mean − 2·Std to Mean + 2·Std]
Upper Confidence = Mean + Bonus
Score each option using the upper portion of the interval as a Bonus.
Upper Confidence Bound
[Chart: estimated reward ($0–$10) with confidence intervals for A, B, C]
1) Use the upper portion of each CI as a 'Bonus'
2) Make greedy selections → Select A
Upper Confidence Bound
[Chart: updated intervals for A, B, C]
1) Selecting action 'A' reduces its uncertainty bonus (because there is more data)
2) Action 'C' now has the highest score → Select C
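The two steps above – score each action by its mean plus an uncertainty bonus, then pick greedily – can be sketched as a UCB1-style rule (an assumed variant; the constant c is illustrative):

```python
import math

def ucb_select(counts, means, c=2.0):
    """Score each arm as mean + uncertainty bonus and pick the highest.
    Arms never played are selected first (their bonus is effectively infinite)."""
    total = sum(counts)
    best_arm, best_score = -1, float("-inf")
    for arm, (n, mean) in enumerate(zip(counts, means)):
        if n == 0:
            return arm                                  # Try untested arms first
        bonus = math.sqrt(c * math.log(total) / n)      # Shrinks as n grows
        if mean + bonus > best_score:
            best_arm, best_score = arm, mean + bonus
    return best_arm
```

Note how the bonus falls as an arm accumulates plays: that is exactly the slide's point that selecting 'A' shrinks its bonus until a less-tried arm like 'C' overtakes it.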
Upper Confidence Bound
• Like A/B Test – uses variance
measure
• Unlike A/B Test – no hypothesis
test
• Automatically Balances
Exploration with Exploitation
Case Study:
Treatment | Conversion Rate | Served
V2V3 | 9.9% | 14,893
V2V2 | 9.7% | 9,720
V2V1 | 8.0% | 2,441
V1V3 | 3.3% | 2,090
V2V3 | 2.6% | 1,849
V2V2 | 2.0% | 1,817
V1V1 | 1.8% | 1,926
V3V1 | 1.8% | 1,821
V1V2 | 1.5% | 1,873
Case Study
Test Method | Conversion Rate
Adaptive | 7%
Non-Adaptive | 4.5%

A/B Testing vs. Bandit
[Chart: traffic allocation over time for Option A, Option B, Option C]
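The case-study numbers above come from live traffic; as a purely illustrative stand-in, a toy simulation (assumed Bernoulli conversion rates, my own function names) shows the same qualitative gap between adaptive and non-adaptive selection:

```python
import random

def simulate(policy, rates, horizon, seed=7):
    """Run a selection policy over Bernoulli arms; return overall conversion rate."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    wins = [0] * len(rates)
    conversions = 0
    for _ in range(horizon):
        arm = policy(counts, wins, rng)
        hit = rng.random() < rates[arm]
        counts[arm] += 1
        wins[arm] += hit
        conversions += hit
    return conversions / horizon

def uniform(counts, wins, rng):
    """Non-adaptive: every arm gets an equal chance, forever."""
    return rng.randrange(len(counts))

def eps_greedy(counts, wins, rng, epsilon=0.2):
    """Adaptive: explore 20% of the time, otherwise play the current best."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: wins[a] / max(counts[a], 1))
```

With three offers converting at, say, 2%, 5%, and 10%, the epsilon-greedy policy concentrates traffic on the best offer and ends with a noticeably higher overall conversion rate than uniform random serving.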
Why Should I Care?
• More Efficient Learning
• Automation
• Changing World
Questions?
Matt Gershoff
p) 646-384-5151
t) @mgershoff
Thank You!