Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

TRANSCRIPT

Page 1:

Bandits for Taxonomies: A Model-based Approach

Sandeep Pandey
Deepak Agarwal
Deepayan Chakrabarti
Vanja Josifovski

Page 2:

The Content Match Problem

[Diagram: Advertisers, Ads DB, ads shown on a webpage]

Ad impression: showing an ad to a user

Page 3:

The Content Match Problem

[Diagram: Advertisers, Ads DB, ads shown on a webpage; the user clicks an ad]

Ad click: a user click leads to revenue for the ad server and the content provider

Page 4:

The Content Match Problem

[Diagram: Advertisers, Ads DB, ads shown on a webpage]

The Content Match Problem: match ads to pages to maximize clicks

Page 5:

The Content Match Problem

[Diagram: Advertisers, Ads DB, ads shown on a webpage]

Maximizing the number of clicks means: for each webpage, find the ad with the best Click-Through Rate (CTR), but without wasting too many impressions in learning this.

Page 6:

Online Learning

Maximizing clicks requires:
• Dimensionality reduction
• Exploration
• Exploitation
Both must occur together.

Online learning is needed, since the system must continuously generate revenue.

Page 7:

Taxonomies for dimensionality reduction

[Diagram: taxonomy tree with Root at the top, child nodes Apparel, Computers, Travel, and a page/ad classified to a node]

• Already exist
• Actively maintained
• Existing classifiers to map pages and ads to taxonomy nodes

Learn the matching from page nodes to ad nodes → dimensionality reduction

Page 8:

Online Learning

Maximizing clicks requires:
• Dimensionality reduction
• Exploration
• Exploitation

Can taxonomies help in explore/exploit as well?

Page 9:

Outline

• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Page 10:

Background: Bandits

[Diagram: bandit "arms" with unknown payoff probabilities p1, p2, p3]

Pull arms sequentially so as to maximize the total expected reward:
• Estimate the payoff probabilities p_i
• Bias the estimation process towards better arms

Page 11:

Background: Bandits

[Diagram: Webpage 1, Webpage 2, Webpage 3, each with its own bandit whose "arms" are the ads]

~10^6 ads
~10^9 pages

Page 12:

Background: Bandits

Content Match = a matrix with webpages as rows and ads as columns
• Each row is a bandit
• Each cell has an unknown CTR

Page 13:

Background: Bandits

Bandit Policy
1. Assign a priority to each arm
2. "Pull" the arm with max priority, and observe the reward
3. Update priorities

Steps 1 and 2 form the allocation step; step 3 is the estimation step.
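The deck does not name a concrete priority scheme, so as a minimal sketch, here is a single bandit with UCB1-style priorities in Python; treat the class and its priority formula as an illustrative assumption, not the authors' policy.

```python
import math


class UCB1Bandit:
    """Minimal single-bandit policy: assign priorities, pull the max, update.

    Generic UCB1 illustration; not necessarily the priority scheme
    used in the paper.
    """

    def __init__(self, num_arms):
        self.pulls = [0] * num_arms      # N_i: times arm i was pulled
        self.rewards = [0.0] * num_arms  # total reward (e.g., clicks) of arm i
        self.total_pulls = 0

    def _priority(self, arm):
        # Unpulled arms get infinite priority, so every arm is tried once.
        if self.pulls[arm] == 0:
            return float("inf")
        mean = self.rewards[arm] / self.pulls[arm]  # estimated CTR (exploit)
        bonus = math.sqrt(2 * math.log(self.total_pulls) / self.pulls[arm])  # explore
        return mean + bonus

    def select_arm(self):
        # Steps 1-2 (allocation): assign priorities, pull the arm with max priority.
        return max(range(len(self.pulls)), key=self._priority)

    def update(self, arm, reward):
        # Step 3 (estimation): update counts with the observed reward (1 = click).
        self.pulls[arm] += 1
        self.rewards[arm] += reward
        self.total_pulls += 1
```

In the content-match setting, each row of the matrix (a webpage, or later a page class) would own one such bandit whose arms are the ads (or ad classes).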

Page 14:

Background: Bandits

Why not simply apply a bandit policy directly to our problem?
• Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit
• Additional structure is available that can help: taxonomies

Page 15:

Outline

• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Page 16:

Multi-level Policy

[Diagram: the webpages × ads matrix, with rows and columns grouped into page classes and ad classes]

Consider only two levels

Page 17:

Multi-level Policy

[Diagram: ads grouped into parent classes (Apparel, Computers, Travel) and child classes; each (page class, ad parent class) block of cells forms one bandit]

Consider only two levels

Page 18:

Multi-level Policy

[Diagram: the same blocks of cells, grouped by ad parent classes (Apparel, Computers, Travel) and ad child classes]

Key idea: CTRs in a block are homogeneous

Page 19:

Multi-level Policy

CTRs in a block are homogeneous
• Used in allocation (picking an ad for each new page)
• Used in estimation (updating priorities after each observation)

Page 20:

Multi-level Policy

CTRs in a block are homogeneous
• Used in allocation (picking an ad for each new page)
• Used in estimation (updating priorities after each observation)

Page 21:

Multi-level Policy (Allocation)

[Diagram: a page classifier maps the incoming webpage to its page class; columns are the ad parent classes A, C, T]

• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class

Page 22:

Multi-level Policy (Allocation)

[Diagram: the page classifier maps the webpage to its class; a bandit at each level narrows down to a final ad]

• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells → pick one ad class
• In general, continue from root to leaf → final ad
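A rough Python sketch of this two-level allocation, reusing the UCB1Bandit sketch from Page 13; `page_classifier`, `parent_bandits`, `child_bandits`, and `taxonomy` are hypothetical structures introduced only to show the flow, not the authors' code.

```python
def allocate_ad(page, page_classifier, parent_bandits, child_bandits, taxonomy):
    """Two-level allocation: pick an ad parent class, then an ad class within it."""
    # Classify the webpage -> page class and its parent page class.
    page_class, parent_page_class = page_classifier(page)

    # Level 1: the bandit attached to the parent page class runs over
    # ad parent classes (few arms, aggregated statistics).
    top_bandit = parent_bandits[parent_page_class]
    ad_parent_class = top_bandit.select_arm()

    # Level 2: the bandit for this (page class, ad parent class) block
    # runs over the ad child classes inside that block.
    block_bandit = child_bandits[(page_class, ad_parent_class)]
    ad_class = block_bandit.select_arm()

    # In general, continue from root to leaf; here we stop at the ad class
    # and serve some ad mapped to it.
    return taxonomy.ads_in_class(ad_class)[0]
```

After the impression, the observed click (or its absence) would be fed back through `update()` to the bandits at both levels.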

Page 23:

Multi-level Policy (Allocation)

Bandits at higher levels
• use aggregated information
• have fewer bandit arms
→ quickly figure out the best ad parent class

Page 24:

Multi-level Policy

CTRs in a block are homogeneous
• Used in allocation (picking an ad for each new page)
• Used in estimation (updating priorities after each observation)

Page 25:

Multi-level Policy (Estimation)

• CTRs in a block are homogeneous
• Observations from one cell also give information about others in the block

How can we model this dependence?

Page 26:

Multi-level Policy (Estimation)

Shrinkage Model

S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
CTR_cell ~ Beta(Params_block)

where S_cell = # clicks in the cell and N_cell = # impressions in the cell.

All cells in a block come from the same distribution.
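Read as code, this is standard Beta-Binomial shrinkage. A minimal Python sketch, assuming the block's Beta parameters are already given (e.g., fit by an empirical-Bayes step not shown on the slide); the function names are illustrative.

```python
def shrunk_ctr(clicks, impressions, alpha_block, beta_block):
    """Posterior mean of CTR_cell under S_cell ~ Bin(N_cell, CTR_cell),
    CTR_cell ~ Beta(alpha_block, beta_block).

    alpha_block, beta_block: Beta parameters shared by all cells in the block
    (assumed given here).
    """
    # Beta-Binomial conjugacy: posterior is Beta(alpha + S, beta + N - S),
    # whose mean shrinks the observed CTR towards the block prior mean.
    return (alpha_block + clicks) / (alpha_block + beta_block + impressions)


def shrinkage_weight(impressions, alpha_block, beta_block):
    """Weight alpha on the block prior in E[CTR] = alpha*Prior + (1-alpha)*S/N."""
    return (alpha_block + beta_block) / (alpha_block + beta_block + impressions)
```

Cells with few impressions get a large weight on the block prior; cells with many impressions rely mostly on their own observed CTR.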

Page 27:

Multi-level Policy (Estimation)

Intuitively, this leads to shrinkage of cell CTRs towards the block CTR:

E[CTR] = α · Prior_block + (1 − α) · S_cell / N_cell

where E[CTR] is the estimated CTR, Prior_block is the Beta prior mean (the "block CTR"), and S_cell / N_cell is the observed CTR.
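For reference, this α is exactly the weight that falls out of the Beta-Binomial posterior mean; writing the block prior as Beta(a, b) (parameters not given on the slide), a short derivation is:

```latex
\[
\mathrm{CTR}_{\text{cell}} \sim \mathrm{Beta}(a, b), \qquad
S_{\text{cell}} \mid \mathrm{CTR}_{\text{cell}} \sim \mathrm{Bin}(N_{\text{cell}}, \mathrm{CTR}_{\text{cell}})
\]
\[
\mathbb{E}[\mathrm{CTR}_{\text{cell}} \mid S_{\text{cell}}]
 = \frac{a + S_{\text{cell}}}{a + b + N_{\text{cell}}}
 = \alpha \cdot \frac{a}{a+b} \;+\; (1-\alpha)\cdot \frac{S_{\text{cell}}}{N_{\text{cell}}},
\qquad \alpha = \frac{a+b}{a+b+N_{\text{cell}}}
\]
```

Here a/(a+b) plays the role of Prior_block, the "block CTR".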

Page 28:

Outline

• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Page 29:

Experiments

Taxonomy structure:
• Depth 0: Root
• Depth 1: 20 nodes
• Depth 2: 221 nodes
• …
• Depth 7: ~7000 leaves

We use these 2 levels (the 20-node and 221-node levels).

Page 30:

Experiments

• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for purposes of confidentiality

Page 31:

Experiments (Multi-level Policy)

Multi-level gives a much higher number of clicks.

[Plot: Clicks vs. Number of pulls]

Page 32:

Experiments (Multi-level Policy)

Multi-level gives a much better Mean-Squared Error: it has learnt more from its explorations.

[Plot: Mean-Squared Error vs. Number of pulls]

Page 33:

Experiments (Shrinkage)

Shrinkage improved Mean-Squared Error, but gave no gain in the number of clicks.

[Plots: Clicks vs. Number of pulls and Mean-Squared Error vs. Number of pulls, with and without shrinkage]

Page 34:

Outline

• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Page 35:

Related Work

• Typical multi-armed bandit problems: do not consider dependencies; very few arms
• Bandits with side information: cannot handle dependencies among ads
• General MDP solvers: do not use the structure of the bandit problem; emphasis on learning the transition matrix, which is random in our problem

Page 36:

Conclusions

Taxonomies exist for many datasets. They can be used for:
• Dimensionality reduction
• A multi-level bandit policy → higher number of clicks
• Better estimation via shrinkage models → better MSE