Computational advertising
Kira Radinsky
Slides based on material from the paper “Bandits for Taxonomies: A Model-based Approach” by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, and Vanja Josifovski, in SDM 2007
The Content Match Problem
[Figure: diagram labeled Advertisers, Ads DB, Ads]
Ad impression: showing an ad to a user
Ad click: a user's click generates revenue for the ad server and the content provider
The Content Match Problem: match ads to pages to maximize clicks
Maximizing the number of clicks means:
• For each webpage, find the ad with the best click-through rate (CTR).
• But do so without wasting too many impressions on learning it.
Background: Bandits
[Figure: three bandit “arms” with unknown payoff probabilities p1, p2, p3]
Pull arms sequentially so as to maximize the total expected reward:
• Estimate the payoff probabilities.
• Bias the estimation process towards ‘better’ arms.
Background: Bandit Solutions
Try 1: Greedy solution:
• Compute the sample mean of an arm ‘A’ by dividing the total reward received from the arm by the number of times the arm has been pulled.
• At each time step, choose the arm with the highest sample mean.
Try 2: Naïve solution:
• Pull each arm an equal number of times.
Epsilon-greedy strategy:
• The best arm (highest sample mean so far) is selected for a proportion 1 − ε of the trials.
• Another arm is randomly selected (with uniform probability) for a proportion ε of the trials.
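The epsilon-greedy strategy above can be sketched as a small simulation. This is a minimal illustration, not code from the paper: the function name `epsilon_greedy`, the Bernoulli click model, and the default values of `epsilon`, `n_rounds`, and `seed` are all assumptions made for the example.

```python
import random

def epsilon_greedy(true_ctrs, epsilon=0.1, n_rounds=10_000, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_ctrs holds the (normally unknown) click probability of each arm;
    it is only used here to simulate rewards. Returns (pulls, clicks) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms
    clicks = [0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random (proportion epsilon).
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest sample mean so far.
            # Arms that were never pulled are treated as having mean 0.0.
            means = [clicks[a] / pulls[a] if pulls[a] else 0.0
                     for a in range(n_arms)]
            arm = max(range(n_arms), key=means.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks

if __name__ == "__main__":
    pulls, clicks = epsilon_greedy([0.02, 0.05, 0.10])
    print("pulls per arm:", pulls)
    print("clicks per arm:", clicks)
```

Setting `epsilon=0` gives the greedy solution (Try 1), and `epsilon=1` pulls arms uniformly at random, which is close in spirit to the naïve solution (Try 2).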
Background: Bandits
[Figure: webpages 1, 2, and 3 on the pages side, each connected to ads]
Bandit “arms” are ads
Background: Bandits
[Figure: matrix with webpages as rows and ads as columns; one row is marked “One instance of the MAB problem” and one cell is marked “Unknown CTR”]
Content Match = a matrix
• Each row is a bandit.
• Each cell has an unknown CTR.
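Background: Bandits
Bandit policy:
1. Assign a priority to each arm.
2. “Pull” the arm with the maximum priority and observe the reward.
3. Update the priorities.
[Figure: three arms labeled Priority 1, Priority 2, Priority 3; picking the arm is labeled Allocation, updating the priorities is labeled Estimation]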
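The three-step policy above can be sketched as a loop. The slides do not say which priority function is used, so this example assumes the well-known UCB1 index (sample mean plus an exploration bonus). The names `ucb1_priority` and `run_priority_policy`, and the simulated Bernoulli rewards, are hypothetical.

```python
import math
import random

def ucb1_priority(clicks, pulls, total_pulls):
    """UCB1 index: sample mean plus an exploration bonus (assumed priority)."""
    if pulls == 0:
        return float("inf")  # make sure every arm is tried at least once
    return clicks / pulls + math.sqrt(2.0 * math.log(total_pulls) / pulls)

def run_priority_policy(true_ctrs, n_rounds=5000, seed=0):
    """Run the assign / pull / update loop on simulated Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    clicks = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, n_rounds + 1):
        # Step 1: assign a priority to each arm.
        priorities = [ucb1_priority(clicks[a], pulls[a], t)
                      for a in range(n_arms)]
        # Step 2 (allocation): pull the arm with the maximum priority.
        arm = max(range(n_arms), key=priorities.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        # Step 3 (estimation): update the statistics behind the priorities.
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks

if __name__ == "__main__":
    print(run_priority_policy([0.02, 0.05, 0.10]))
```

Any other index (for example a Bayesian one) could be swapped in for `ucb1_priority` without changing the loop.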
Background: Bandits
Why not simply apply a bandit policy directly to the problem?
• Convergence is too slow: there are many MAB instances (one per webpage) and many arms (ads) per instance.
• Additional structure is available, and we wish to use it.
Multi-level Policy
[Figure: a taxonomy of ad classes and a taxonomy of webpage classes]
Consider only two levels.
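Multi-level Policy
Idea: CTRs in a block are homogeneous
[Figure: matrix whose columns are ad child classes grouped under the ad parent classes Apparel, Computers, and Travel, and whose rows are grouped under the page parent classes Apparel, Computers, and Travel; labels mark a “Block” and “One MAB problem instance”]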
Multi-level Policy
CTRs in a block are homogeneous. This is:
• Used in allocation (picking an ad for each new page)
• Used in estimation (updating priorities after each observation)
Multi-level Policy - Allocation
[Figure: a page classifier places a new page (“?”) in the matrix; parent classes A, C, T on both axes]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• The two steps above result in a block
• Run a bandit among the cells of the block → pick one ad class
• (In general, continue from root to leaf → final ad)
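The top-down allocation steps above can be sketched for two levels as follows. This is an illustrative sketch, not the paper's implementation. It assumes the page classifier has already produced `page_class` and `page_parent`, and that a UCB-style index is used at both levels. The class `TwoLevelAllocator`, its methods, the layout of `ad_taxonomy`, and the child class names in the usage lines are all hypothetical.

```python
import math
from collections import defaultdict

def ucb(clicks, imps, total):
    """UCB-style index (assumed); unseen arms get infinite priority."""
    if imps == 0:
        return float("inf")
    return clicks / imps + math.sqrt(2.0 * math.log(max(total, 1)) / imps)

class TwoLevelAllocator:
    def __init__(self, ad_taxonomy):
        # ad_taxonomy: {ad_parent_class: [ad_child_class, ...]}
        self.ad_taxonomy = ad_taxonomy
        # (page_parent, ad_parent) -> [clicks, impressions], aggregated per block
        self.block = defaultdict(lambda: [0, 0])
        # (page_class, ad_class) -> [clicks, impressions], one entry per cell
        self.cell = defaultdict(lambda: [0, 0])

    @staticmethod
    def _best(stats, keys):
        total = sum(stats[k][1] for k in keys)
        return max(keys, key=lambda k: ucb(stats[k][0], stats[k][1], total))

    def choose(self, page_class, page_parent):
        # Level 1: bandit over ad parent classes using block aggregates.
        block_keys = [(page_parent, p) for p in self.ad_taxonomy]
        _, ad_parent = self._best(self.block, block_keys)
        # Level 2: bandit among the cells inside the chosen block.
        cell_keys = [(page_class, c) for c in self.ad_taxonomy[ad_parent]]
        _, ad_class = self._best(self.cell, cell_keys)
        return ad_parent, ad_class

    def update(self, page_class, page_parent, ad_parent, ad_class, clicked):
        # Record the observation at both the block level and the cell level.
        for stats, key in ((self.block, (page_parent, ad_parent)),
                           (self.cell, (page_class, ad_class))):
            stats[key][0] += int(clicked)
            stats[key][1] += 1

if __name__ == "__main__":
    alloc = TwoLevelAllocator({"Apparel": ["Shoes", "Shirts"],
                               "Travel": ["Flights", "Hotels"]})
    parent, child = alloc.choose(page_class="Hiking", page_parent="Travel")
    alloc.update("Hiking", "Travel", parent, child, clicked=False)
    print(parent, child)
```

Because the first level only has as many arms as there are ad parent classes, and each block pools observations from all of its cells, the first level can settle on a good parent class with far fewer impressions than a flat bandit over all ad classes.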
Multi-level Policy - Allocation
Bandits at higher levels:
• Use aggregated information
• Have fewer bandit arms
→ Quickly figure out the best ad parent class
Multi-level Policy - Estimation
CTRs in a block are homogeneous:
• Observations from one cell also give information about the others in the block.
• How can we model this dependence?
[Figure: matrix of page classes vs. ad classes, with parent classes A, C, T on both axes]
Multi-level Policy - Estimation
Shrinkage Model
[Figure: model equations annotated with “#clicks in cell” and “#impressions in cell”]
All cells in a block come from the same distribution
Multi-level Policy - Estimation
• Intuitively, this leads to shrinkage of cell CTRs towards block CTRs
[Figure: the estimated CTR shown together with the Beta prior (“block CTR”) and the observed CTR]
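One common way to get this shrinkage is a Beta prior on the cell CTR combined with a binomial model for clicks given impressions; the posterior mean is then (α + clicks) / (α + β + impressions). The sketch below assumes that form. How the prior parameters are set is also an assumption: here the prior mean is the block's empirical CTR, and `prior_strength` (the number of pseudo-impressions) is a hypothetical tuning parameter, not a value from the slides.

```python
def shrunk_ctr(cell_clicks, cell_imps, block_clicks, block_imps,
               prior_strength=100.0):
    """Posterior-mean CTR of a cell under a Beta prior centered on the block CTR.

    prior_strength is the assumed weight of the prior, expressed as a number
    of pseudo-impressions; larger values pull the cell harder toward the block.
    """
    block_ctr = block_clicks / block_imps if block_imps else 0.0
    alpha = prior_strength * block_ctr          # pseudo-clicks
    beta = prior_strength * (1.0 - block_ctr)   # pseudo-non-clicks
    return (alpha + cell_clicks) / (alpha + beta + cell_imps)

if __name__ == "__main__":
    # Block CTR is 50 / 1000; the cells differ in how much data they have.
    for clicks, imps in [(0, 0), (10, 100), (1000, 10_000)]:
        print(clicks, imps, shrunk_ctr(clicks, imps, 50, 1000))
```

A cell with no impressions is estimated at the block CTR, and as impressions accumulate the estimate moves toward the cell's own observed CTR.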
Experiments (S. Pandey et al. 2007)
Taxonomy Structure
Depth 0: Root
Depth 1: 20 nodes
Depth 2: 221 nodes
Depth 7: ~7000 nodes
[Figure annotation: “Use these 2 levels”]
Experiments (S. Pandey et al. 2007)
• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for the purpose of confidentiality
Experiments (S. Pandey et al. 2007)
[Plot: clicks vs. number of pulls]
Multi-level gives a much higher number of clicks!
Experiments (S. Pandey et al. 2007)
[Plot: mean-squared error vs. number of pulls]
Multi-level gives a much better MSE: it learnt more from its explorations.
Conclusions
• In a CTR-guided system, exploration is a key component.
• The short-term penalty for exploration needs to be limited (exploration budget).
• Most exploration mechanisms use a weighted combination of the predicted CTR (average) and the CTR uncertainty (variance).
• Exploration can be done in a reduced-dimensional space: the class hierarchy.
• A top-down traversal of the hierarchy determines the class of the ad to show.
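The “weighted combination of average and variance” idea from the conclusions can be illustrated with a score of the form mean + weight × standard deviation. This sketch assumes a uniform Beta(1, 1) prior on the CTR; the function name `exploration_score` and the default `weight` are hypothetical and not taken from the slides.

```python
import math

def exploration_score(clicks, imps, weight=2.0):
    """Posterior mean plus weight * posterior std under a Beta(1, 1) prior."""
    a = 1.0 + clicks                  # prior pseudo-click + observed clicks
    b = 1.0 + (imps - clicks)         # prior pseudo-non-click + observed non-clicks
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))  # variance of Beta(a, b)
    return mean + weight * math.sqrt(var)

if __name__ == "__main__":
    # Same observed CTR, very different amounts of evidence.
    print(exploration_score(5, 50))
    print(exploration_score(500, 5000))
```

An ad with few impressions gets a larger uncertainty term, so it is favored for exploration over an equally performing ad that has already been shown many times.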