Hierarchical Exploration for Accelerating Contextual Bandits


Upload: howie

Post on 22-Feb-2016



Hierarchical Exploration for Accelerating Contextual Bandits

Yisong Yue, Sue Ann Hong and Carlos Guestrin

Personalized Recommender Systems

• Every day, user visits news portal
• Wish to personalize to her preferences
• Can only learn from feedback
  • E.g., user clicks on or “likes” article
• Leads to exploration vs exploitation dilemma
  • Goal is to satisfy user
  • Must make exploratory recommendations to learn user’s preferences
• Formalized as a contextual bandit problem

Linear Stochastic Bandit Problem

Balancing Exploration vs. Exploitation

CoFineUCB: Coarse-to-Fine Hierarchical Exploration

Feature Hierarchies

• Suppose “stereotypical users” span K-dimensional space
  • E.g., “European vs. Asian news”
• Let U be a D × K matrix

• Define projection of articles into subspace:

• Define representation of user profile:

• Thus:
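The three definitions above can be checked numerically. The sketch below is one consistent reading, not a transcription of the poster's formulas: it assumes the article projection is x̃ = Uᵀx, the profile is written as w = U w̃ + w⊥, and U has orthonormal columns. The names `x_tilde`, `w_tilde` and `w_perp` and the dimensions are illustrative choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 6, 2  # full-space and subspace dimensions (illustrative values)

# Assumption: U has orthonormal columns, so U.T @ U is the K x K identity.
U, _ = np.linalg.qr(rng.normal(size=(D, K)))

x = rng.normal(size=D)  # article features in the full space
w = rng.normal(size=D)  # user profile in the full space

x_tilde = U.T @ x         # projection of the article into the subspace
w_tilde = U.T @ w         # coordinates of the profile in the subspace
w_perp = w - U @ w_tilde  # residual of the profile outside the subspace

# The utility w^T x splits into a subspace term and a residual term.
full = w @ x
split = w_tilde @ x_tilde + w_perp @ x
```

Under these assumptions `full` and `split` agree up to floating-point error, and `w_perp` is orthogonal to every column of U.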

News Recommender Simulations & User Study

• Two-tiered exploration:
  • First in subspace
  • Then in full space
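A minimal sketch of how two-tiered exploration could look in code. It is an illustration under stated assumptions rather than the authors' exact algorithm: it fits a ridge regression in the subspace, then a full-space ridge regression regularized toward the subspace estimate, and scores each article by its mean estimate plus one uncertainty term per tier. The function names and the constants `lam_sub`, `lam_full`, `c_sub`, `c_full` are hypothetical.

```python
import numpy as np

def cofine_estimates(X, y, U, lam_sub=1.0, lam_full=1.0):
    """Two-tier ridge estimates: subspace first, then full space.

    X is a (t, D) array of recommended articles, y the (t,) feedback,
    and U the (D, K) subspace matrix.
    """
    D, K = U.shape
    X_sub = X @ U                                  # each row is U^T x
    M_sub = lam_sub * np.eye(K) + X_sub.T @ X_sub
    w_sub = np.linalg.solve(M_sub, X_sub.T @ y)    # coarse estimate
    M_full = lam_full * np.eye(D) + X.T @ X
    # Full-space ridge regression pulled toward U @ w_sub instead of zero.
    w_full = np.linalg.solve(M_full, X.T @ y + lam_full * (U @ w_sub))
    return w_sub, w_full, M_sub, M_full

def cofine_select(candidates, U, w_full, M_sub, M_full, c_sub=1.0, c_full=1.0):
    """Return the index maximizing mean estimate plus both uncertainty terms."""
    mean = candidates @ w_full
    cand_sub = candidates @ U
    unc_full = np.sqrt(np.einsum("id,de,ie->i", candidates,
                                 np.linalg.inv(M_full), candidates))
    unc_sub = np.sqrt(np.einsum("ik,kl,il->i", cand_sub,
                                np.linalg.inv(M_sub), cand_sub))
    return int(np.argmax(mean + c_full * unc_full + c_sub * unc_sub))
```

With no feedback yet, both estimates are zero and the choice is driven by uncertainty alone, so an article lying inside the subspace collects both bonuses while an article outside it collects only the full-space one.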

Theorem: with probability 1 − δ, average regret is bounded by

Comparison                Win / Tie / Loss    Gain / Day
CoFineUCB vs. Naïve       24 / 1 / 3          0.69
CoFineUCB vs. Reshaped    21 / 3 / 6          0.27

[Figure: Mean Estimate by Topic + Uncertainty of Estimate]

• At each iteration t:
  • Set of available actions X_t = {x_{t,1}, …, x_{t,n}} (available articles)
  • Algorithm chooses action x_t from X_t (recommends an article)
  • User provides feedback ŷ_t (user clicks on or “likes” the article)
  • Algorithm incorporates feedback
• Assumptions: E[ŷ_t] = w*^T x_t (w* is unknown to system)

• Regret:
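The interaction protocol above can be simulated in a few lines. The sketch below assumes the usual regret definition for this setting, namely the summed gap between the best available article's expected feedback and the chosen article's expected feedback. It also assumes Gaussian article features and Gaussian noise. `run_bandit`, `first_article_policy` and all parameter values are hypothetical.

```python
import numpy as np

def run_bandit(policy, w_star, T=200, n_actions=10, noise=0.1, seed=0):
    """Simulate T rounds and return cumulative regret against the best article."""
    rng = np.random.default_rng(seed)
    D = w_star.shape[0]
    xs, ys, regret = [], [], 0.0
    for _ in range(T):
        X_t = rng.normal(size=(n_actions, D))      # available articles
        i = policy(X_t, xs, ys)                    # algorithm chooses x_t
        x_t = X_t[i]
        y_t = w_star @ x_t + noise * rng.normal()  # feedback, E[y_t] = w*^T x_t
        xs.append(x_t)                             # algorithm incorporates feedback
        ys.append(y_t)
        regret += np.max(X_t @ w_star) - w_star @ x_t
    return regret

def first_article_policy(X_t, xs, ys):
    """Baseline that ignores all feedback and always picks the first article."""
    return 0
```

A policy that already knows w* incurs zero regret in this simulation, while the feedback-ignoring baseline accumulates regret roughly linearly in T.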

• At each iteration:

• In example below: select article on economy:
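The selection rule can be sketched as follows, assuming the standard LinUCB-style form: a ridge-regression estimate of the user profile gives the estimated gain, and the uncertainty of an article x is sqrt(xᵀ M⁻¹ x) for the regularized design matrix M. The function name and the constants `lam` and `c` are hypothetical.

```python
import numpy as np

def linucb_select(candidates, X, y, lam=1.0, c=1.0):
    """Return (index, gain, uncertainty) for the highest upper confidence bound.

    candidates is (n, D); X is the (t, D) history of chosen articles and
    y the (t,) observed feedback.
    """
    D = candidates.shape[1]
    M = lam * np.eye(D) + X.T @ X
    w_hat = np.linalg.solve(M, X.T @ y)   # ridge estimate of the profile
    gain = candidates @ w_hat             # estimated gain per article
    unc = np.sqrt(np.einsum("id,de,ie->i", candidates,
                            np.linalg.inv(M), candidates))
    return int(np.argmax(gain + c * unc)), gain, unc
```

If each feature is a topic, an article on a topic the user has rarely been shown has a large uncertainty term, so it can be selected even when its estimated gain is lower than that of a well-explored topic.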

[Figure labels: Estimated Gain, Uncertainty, “Upper Confidence Bound”]

Constructing Feature Hierarchies Using Prior Knowledge

• Given empirical sample of learned profiles W

• Can also be used to reshape full space (use LearnU(W,D))
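One plausible way to build U from a sample of learned profiles is a truncated singular value decomposition. The sketch below is an assumption made for illustration, not the poster's LearnU procedure: the transcript gives only the name and arguments. It takes W as a D × N matrix with one profile per column, and scaling each direction by its singular value relative to the largest is a choice made here so that calling it with K = D reshapes the full space rather than merely rotating it.

```python
import numpy as np

def learn_u(W, K):
    """Return a (D, K) matrix spanning the top-K directions of the profiles.

    W is (D, N) with one learned user profile per column. Each direction is
    scaled by its singular value relative to the largest one (an assumption
    of this sketch).
    """
    A, s, _ = np.linalg.svd(W, full_matrices=False)
    if K > s.shape[0]:
        raise ValueError("K cannot exceed min(D, N)")
    return A[:, :K] * (s[:K] / s[0])
```

With K much smaller than D the result plays the role of the subspace; with K = D it stretches the directions along which the sampled profiles vary most.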

[Figure: regions explored by Naïve LinUCB, Reshaped Full Space, Subspace, and the Coarse-to-Fine Approach, with “All Users” and “Atypical Users” marked]

• Leave-one-out simulation validation
  • Compared against hierarchy-free baselines
  • CoFineUCB combines efficiency of Subspace Learning with flexibility of Full Space Learning
• Live User Study
  • Showed real users real articles
  • 10 articles/day, 10 days
  • Counted #likes

• If the part of the user profile outside the subspace is small, then it suffices to learn primarily in subspace

• K-dimensional space much more efficient to explore
• Explore full space as needed