hierarchical exploration for accelerating contextual bandits
Hierarchical Exploration for Accelerating Contextual Bandits
Yisong Yue, Sue Ann Hong and Carlos Guestrin
Personalized Recommender Systems
• Every day, user visits news portal
• Wish to personalize to her preferences
• Can only learn from feedback
• E.g., user clicks on or “likes” article
• Leads to exploration vs exploitation dilemma
• Goal is to satisfy user
• Must make exploratory recommendations to learn user’s preferences
• Formalized as a contextual bandit problem
Linear Stochastic Bandit Problem
Balancing Exploration vs. Exploitation
CoFineUCB: Coarse-to-Fine Hierarchical Exploration
Feature Hierarchies
• Suppose “stereotypical users” span K-dimensional space
• E.g., “European vs. Asian news”
• Let U be a D × K matrix
• Define projection of articles into subspace:
• Define representation of user profile:
• Thus:
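The formulas for these bullets are not present in the text. As a hedged illustration only, the sketch below assumes the usual reading of a feature hierarchy: articles are projected as x̃ = Uᵀx, a user profile is written as w = U w̃ + w_perp, and the expected reward then splits into a subspace term and a residual term. The names `x_tilde`, `w_tilde` and `w_perp`, and the orthonormal choice of `U`, are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 2  # full-space and subspace dimensions (arbitrary for the example)

# Hypothetical D x K matrix U; orthonormal columns are an assumption here.
U, _ = np.linalg.qr(rng.normal(size=(D, K)))

x = rng.normal(size=D)              # an article's feature vector
x_tilde = U.T @ x                   # assumed projection of the article into the subspace

w_tilde = rng.normal(size=K)        # coarse (subspace) part of the user profile
w_perp = 0.05 * rng.normal(size=D)  # small residual outside the subspace
w = U @ w_tilde + w_perp            # assumed representation of the user profile

# Under these assumptions the reward decomposes into two terms.
lhs = w @ x
rhs = w_tilde @ x_tilde + w_perp @ x
print(np.isclose(lhs, rhs))
```

The decomposition holds for any U; orthonormality is only used to make the subspace easy to interpret.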
News Recommender Simulations & User Study
• Two-tiered exploration:
  • First in subspace
  • Then in full space
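The two-tier idea can be sketched as two coupled ridge regressions: a coarse estimate in the K-dimensional subspace, and a full-space estimate that is regularized toward the coarse one. This is a simplified illustration, not the exact CoFineUCB algorithm; the class name `CoarseToFineSketch`, the constants `c_coarse` and `c_fine`, and the form of the two uncertainty terms are all assumptions.

```python
import numpy as np


class CoarseToFineSketch:
    """Two-tier ridge estimates: first in the K-dim subspace, then in full space."""

    def __init__(self, U, lam_coarse=1.0, lam_fine=1.0, c_coarse=1.0, c_fine=1.0):
        self.U = np.asarray(U, dtype=float)   # D x K feature hierarchy
        D, K = self.U.shape
        self.lam_fine = lam_fine
        self.c_coarse, self.c_fine = c_coarse, c_fine
        self.M_c = lam_coarse * np.eye(K)     # coarse (subspace) statistics
        self.b_c = np.zeros(K)
        self.M_f = lam_fine * np.eye(D)       # fine (full-space) statistics
        self.b_f = np.zeros(D)

    def estimate(self):
        # Tier 1: ridge regression in the subspace.
        w_tilde = np.linalg.solve(self.M_c, self.b_c)
        # Tier 2: ridge regression in full space, shrunk toward U @ w_tilde
        # instead of toward zero.
        prior = self.U @ w_tilde
        w = np.linalg.solve(self.M_f, self.b_f + self.lam_fine * prior)
        return w_tilde, w

    def scores(self, X):
        # Mean estimate plus one uncertainty bonus per tier (assumed form).
        _, w = self.estimate()
        X_tilde = X @ self.U
        unc_f = np.sqrt(np.einsum("ij,jk,ik->i", X, np.linalg.inv(self.M_f), X))
        unc_c = np.sqrt(
            np.einsum("ij,jk,ik->i", X_tilde, np.linalg.inv(self.M_c), X_tilde)
        )
        return X @ w + self.c_fine * unc_f + self.c_coarse * unc_c

    def choose(self, X):
        return int(np.argmax(self.scores(X)))

    def update(self, x, y):
        x_tilde = self.U.T @ x
        self.M_c += np.outer(x_tilde, x_tilde)
        self.b_c += y * x_tilde
        self.M_f += np.outer(x, x)
        self.b_f += y * x
```

When the user's profile lies mostly in the subspace, the coarse estimate converges quickly and the full-space estimate inherits it through the regularizer; the full-space term only has to correct the remainder.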
Theorem: with probability 1 − δ, average regret bounded by
Comparison              Win / Tie / Loss   Gain / Day
CoFineUCB vs. Naïve     24 / 1 / 3         0.69
CoFineUCB vs. Reshaped  21 / 3 / 6         0.27
[Figure: Mean Estimate by Topic + Uncertainty of Estimate]
• At each iteration t:
  • Set of available actions X_t = {x_{t,1}, …, x_{t,n}} (available articles)
  • Algorithm chooses action x_t from X_t (recommends an article)
  • User provides feedback ŷ_t (user clicks on or “likes” the article)
  • Algorithm incorporates feedback
• Assumptions: E[ŷ_t] = w*^T x_t (w* is unknown to system)
• Regret:
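The interaction loop above can be simulated in a few lines. The sketch below assumes the standard definition of regret for this setting (the gap between the best available action's expected reward and the chosen action's expected reward, summed over rounds), since the formula itself is not present in the text. `RandomPolicy` and `run_bandit` are hypothetical names, and the Gaussian articles and noise are arbitrary choices for illustration.

```python
import numpy as np


class RandomPolicy:
    """Baseline that ignores feedback and recommends uniformly at random."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def choose(self, X_t):
        return int(self.rng.integers(len(X_t)))

    def update(self, x_t, y_t):
        pass  # a learning policy would incorporate the feedback here


def run_bandit(policy, w_star, n_rounds=200, n_actions=10, noise=0.1, seed=0):
    """Simulate the loop and return cumulative regret after each round."""
    rng = np.random.default_rng(seed)
    d = len(w_star)
    cumulative = np.zeros(n_rounds)
    total = 0.0
    for t in range(n_rounds):
        # Available articles X_t, normalized to unit length.
        X_t = rng.normal(size=(n_actions, d))
        X_t /= np.linalg.norm(X_t, axis=1, keepdims=True)
        i = policy.choose(X_t)
        # Noisy feedback whose expectation is w*^T x_t.
        y_t = float(w_star @ X_t[i]) + noise * rng.normal()
        policy.update(X_t[i], y_t)
        # Regret of this round: best expected reward minus chosen expected reward.
        expected = X_t @ w_star
        total += expected.max() - expected[i]
        cumulative[t] = total
    return cumulative
```

Any object with `choose` and `update` methods can be passed as `policy`, which makes it straightforward to compare exploration strategies on the same simulated user.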
• At each iteration:
• In example below: select article on economy:
[Figure labels: Uncertainty, Estimated Gain]
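A selection rule of this kind, estimated gain plus uncertainty, can be sketched in the style of LinUCB. This is a minimal illustration under assumed choices: ridge regression for the mean estimate, the norm of x under the inverse design matrix as the uncertainty, and a fixed constant `c` in place of a confidence width derived from theory. `LinUCBSketch` is a hypothetical name.

```python
import numpy as np


class LinUCBSketch:
    """Pick the article with the largest estimated gain plus uncertainty."""

    def __init__(self, dim, lam=1.0, c=1.0):
        self.M = lam * np.eye(dim)  # regularized design matrix
        self.b = np.zeros(dim)      # sum of feedback-weighted features
        self.c = c                  # exploration constant (assumed fixed)

    def scores(self, X):
        M_inv = np.linalg.inv(self.M)
        w_hat = M_inv @ self.b      # ridge estimate of the user profile
        mean = X @ w_hat            # estimated gain per article
        # Uncertainty per article: sqrt(x^T M^{-1} x).
        unc = np.sqrt(np.einsum("ij,jk,ik->i", X, M_inv, X))
        return mean + self.c * unc

    def choose(self, X):
        return int(np.argmax(self.scores(X)))

    def update(self, x, y):
        self.M += np.outer(x, x)
        self.b += y * x
```

An article in a rarely shown direction keeps a large uncertainty term, so it can be selected even when its estimated gain is lower than that of well-explored articles.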
“Upper Confidence Bound”
• Given empirical sample of learned profiles W
• Can also be used to reshape full space (use LearnU(W,D))
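The text names LearnU but does not show how it is computed. One plausible construction, offered purely as an assumption, is a truncated SVD of the profile matrix: keep the top K left singular vectors and scale them by their relative singular values. The function name `learn_u`, the column layout of `W`, and the scaling are all illustrative choices.

```python
import numpy as np


def learn_u(W, K):
    """Sketch of a LearnU-style step via truncated SVD (an assumed construction).

    W: D x N matrix whose columns are previously learned user profiles.
    Returns a D x K matrix whose columns span the dominant profile directions.
    """
    A, s, _ = np.linalg.svd(W, full_matrices=False)
    # Scale each direction by its relative importance; this scaling is a guess.
    return A[:, :K] * np.sqrt(s[:K] / s[0])


# Usage: profiles that lie in a 2-dimensional subspace of a 6-dimensional space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))
W = basis @ rng.normal(size=(2, 50))
U = learn_u(W, K=2)
print(U.shape)
```

With K equal to the full dimension D, the same function returns a rescaled basis of the whole space, which is one way to read the remark about reshaping the full space.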
Constructing Feature Hierarchies Using Prior Knowledge
[Figure labels: “Atypical Users”, “All Users”; Naïve LinUCB, Reshaped Full Space, Coarse-to-Fine Approach, Subspace]
• Leave-one-out simulation validation
• Compared against hierarchy-free baselines
• CoFineUCB combines efficiency of Subspace Learning with flexibility of Full Space Learning
• Live User Study
• Showed real users real articles
• 10 articles/day, 10 days
• Counted #likes
• If … then suffices to learn primarily in subspace
• K-dimensional space much more efficient to explore
• Explore full space as needed