hierarchical exploration for accelerating contextual bandits


Uploaded by howie on 22-Feb-2016



TRANSCRIPT

Hierarchical Exploration for Accelerating Contextual Bandits

Yisong Yue, Sue Ann Hong and Carlos Guestrin

Personalized Recommender Systems
• Every day, a user visits a news portal
• We wish to personalize recommendations to her preferences
• We can only learn from feedback, e.g., the user clicks on or “likes” an article
• This leads to an exploration vs. exploitation dilemma
• The goal is to satisfy the user
• We must make exploratory recommendations to learn the user’s preferences
• Formalized as a contextual bandit problem

Linear Stochastic Bandit Problem

Balancing Exploration vs. Exploitation

CoFineUCB: Coarse-to-Fine Hierarchical Exploration

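Feature Hierarchies
• Suppose “stereotypical users” span a K-dimensional space
• E.g., “European vs. Asian news”
• Let U be a D × K matrix whose columns span this space
• Define the projection of articles into the subspace: $\tilde{x} = U^\top x$
• Define the representation of the user profile: $w^* = U\tilde{w}^* + w^*_\perp$
• Thus: $w^{*\top} x = \tilde{w}^{*\top} \tilde{x} + w^{*\top}_\perp x$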
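A minimal numerical check of this decomposition, assuming U has orthonormal columns (obtained here from a QR factorization of a random matrix) and taking $\tilde{w}^* = U^\top w^*$ so that $w^*_\perp = w^* - U\tilde{w}^*$. The dimensions D = 6 and K = 2 and all vectors are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 6, 2  # full feature dimension and subspace dimension (arbitrary)

# Hypothetical U: a D x K matrix with orthonormal columns.
U, _ = np.linalg.qr(rng.normal(size=(D, K)))

x = rng.normal(size=D)       # an article's feature vector
w_star = rng.normal(size=D)  # a user profile (unknown to the system in practice)

x_tilde = U.T @ x              # projection of the article into the subspace
w_tilde = U.T @ w_star         # subspace coordinates of the user profile
w_perp = w_star - U @ w_tilde  # residual component orthogonal to the subspace

full = w_star @ x
split = w_tilde @ x_tilde + w_perp @ x
print(np.isclose(full, split))  # True
```

When the residual $w^*_\perp$ is small, the expected reward is governed almost entirely by the K coordinates $\tilde{w}^*$, which is what makes learning in the subspace attractive.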

News Recommender Simulations & User Study

• Two-tiered exploration: first in the subspace, then in the full space
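A hypothetical sketch of a two-tier upper-confidence score in the spirit of this coarse-to-fine idea: a "coarse" uncertainty bonus measured on the projected article $U^\top x$ in K dimensions plus a "fine" bonus measured on $x$ in the full D dimensions. The weights alpha_coarse and alpha_fine, the regularizers, and how w_hat is estimated are all assumptions; this is not presented as the exact CoFineUCB update.

```python
import numpy as np

def two_tier_ucb_scores(X, w_hat, U, M_tilde, M_full, alpha_coarse, alpha_fine):
    """Hypothetical coarse-to-fine UCB scores for candidate articles (rows of X, n x D).

    coarse: ||U^T x||_{M_tilde^{-1}}, uncertainty in the K-dimensional subspace
    fine:   ||x||_{M_full^{-1}},      uncertainty in the full D-dimensional space
    """
    X_tilde = X @ U  # row i is (U^T x_i)^T
    coarse = np.sqrt(np.einsum("ij,jk,ik->i", X_tilde, np.linalg.inv(M_tilde), X_tilde))
    fine = np.sqrt(np.einsum("ij,jk,ik->i", X, np.linalg.inv(M_full), X))
    return X @ w_hat + alpha_coarse * coarse + alpha_fine * fine

# Toy usage with arbitrary sizes and regularizers.
rng = np.random.default_rng(1)
D, K, n = 8, 2, 5
U, _ = np.linalg.qr(rng.normal(size=(D, K)))
history = rng.normal(size=(20, D))                  # previously shown articles
M_tilde = np.eye(K) + (history @ U).T @ (history @ U)
M_full = 10.0 * np.eye(D) + history.T @ history     # heavier regularization in full space
candidates = rng.normal(size=(n, D))
w_hat = np.zeros(D)                                 # no reward estimate yet
scores = two_tier_ucb_scores(candidates, w_hat, U, M_tilde, M_full, 1.0, 0.5)
choice = int(np.argmax(scores))
```

Weighting the fine tier less, or regularizing it more heavily, concentrates exploration in the subspace while still leaving room to explore the full space later; the actual CoFineUCB estimator and confidence widths may differ from this sketch.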

Theorem: With probability 1 − δ, the average regret is bounded by

Comparison               Win / Tie / Loss   Gain / Day
CoFineUCB vs. Naïve      24 / 1 / 3         0.69
CoFineUCB vs. Reshaped   21 / 3 / 6         0.27

Figure panels: Mean Estimate by Topic; Uncertainty of Estimate

• At each iteration t:
• Set of available actions $X_t = \{x_{t,1}, \dots, x_{t,n}\}$ (available articles)
• Algorithm chooses action $x_t$ from $X_t$ (recommends an article)
• User provides feedback $\hat{y}_t$ (user clicks on or “likes” the article)
• Algorithm incorporates feedback
• Assumptions: $\mathbb{E}[\hat{y}_t] = w^{*\top} x_t$ ($w^*$ is unknown to the system)

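• Regret: $R_T = \sum_{t=1}^{T} \left( w^{*\top} x_t^* - w^{*\top} x_t \right)$, where $x_t^* = \arg\max_{x \in X_t} w^{*\top} x$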
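This protocol and the regret it accumulates can be simulated in a few lines. The sketch below assumes unit-norm random articles, Gaussian feedback noise, and a standard LinUCB-style learner (ridge estimate plus an uncertainty bonus $\alpha \|x\|_{M^{-1}}$), comparable in spirit to the hierarchy-free baseline; the function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_linucb(w_star, T=200, n_arms=10, alpha=1.0, lam=1.0, noise=0.1, seed=0):
    """Run T rounds of the linear stochastic bandit protocol with a LinUCB-style learner.

    Returns the cumulative regret sum_t (max_{x in X_t} w*^T x - w*^T x_t).
    """
    rng = np.random.default_rng(seed)
    D = w_star.shape[0]
    M = lam * np.eye(D)  # regularized design matrix
    b = np.zeros(D)      # running sum of y_t * x_t
    regret = 0.0
    for _ in range(T):
        # Available actions X_t: n_arms random unit-norm article vectors.
        X_t = rng.normal(size=(n_arms, D))
        X_t /= np.linalg.norm(X_t, axis=1, keepdims=True)
        M_inv = np.linalg.inv(M)
        w_hat = M_inv @ b  # ridge-regression estimate of w*
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", X_t, M_inv, X_t))  # ||x||_{M^{-1}}
        x_t = X_t[np.argmax(X_t @ w_hat + alpha * bonus)]  # highest upper confidence bound
        y_t = w_star @ x_t + noise * rng.normal()  # noisy feedback with E[y_t] = w*^T x_t
        M += np.outer(x_t, x_t)  # incorporate feedback
        b += y_t * x_t
        regret += np.max(X_t @ w_star) - w_star @ x_t
    return float(regret)

w_star = np.array([0.6, 0.8, 0.0, 0.0])  # hypothetical true user profile (unit norm)
total_regret = simulate_linucb(w_star, T=200)
print(total_regret / 200)  # average regret per round
```

Each round's regret is nonnegative by construction, since the best available article is compared against the chosen one.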

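• At each iteration: choose the article with the highest upper confidence bound, $x_t = \arg\max_{x \in X_t} \left( \hat{w}_t^\top x + c_t(x) \right)$, i.e., estimated gain $\hat{w}_t^\top x$ plus uncertainty $c_t(x)$
• In the example below, the algorithm selects the article on the economy

Figure labels: Estimated Gain; Uncertainty; “Upper Confidence Bound”

• Given an empirical sample of learned profiles W, the feature hierarchy U can be constructed from W
• This can also be used to reshape the full space (use LearnU(W,D))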
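LearnU is not defined in this transcript. One plausible sketch, assuming it is a truncated SVD of the profile matrix W (columns are learned user profiles) with each direction scaled by the square root of its singular value, is below. The function name learn_u, the scaling, and the toy data are assumptions; the scaling is chosen so that LearnU(W, D) actually reshapes the full space instead of merely rotating it.

```python
import numpy as np

def learn_u(W, d):
    """Hypothetical LearnU(W, d) via truncated SVD.

    W: D x m matrix whose columns are learned user profiles (requires m >= d).
    Returns a D x d matrix whose columns are the top-d left singular vectors,
    each scaled by the square root of its singular value.
    """
    A, s, _ = np.linalg.svd(W, full_matrices=False)
    return A[:, :d] * np.sqrt(s[:d])

# Toy profiles: 50 users in D = 5 dimensions, mostly varying along 2 directions.
rng = np.random.default_rng(2)
D, K, m = 5, 2, 50
basis = rng.normal(size=(D, K))
W = basis @ rng.normal(size=(K, m)) + 0.05 * rng.normal(size=(D, m))

U_sub = learn_u(W, K)   # coarse subspace, analogous to LearnU(W, K)
U_full = learn_u(W, D)  # reshaped full space, analogous to LearnU(W, D)
```

Running ordinary LinUCB on the features $U_{full}^\top x$ would make its isotropic ridge penalty cheaper along directions in which the learned profiles vary most.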

Constructing Feature Hierarchies Using Prior Knowledge

Figure panels: Naïve LinUCB; Reshaped Full Space; Subspace; Coarse-to-Fine Approach (labels: “All Users”, “Atypical Users”)

• Leave-one-out simulation validation
• Compared against hierarchy-free baselines
• CoFineUCB combines the efficiency of Subspace Learning with the flexibility of Full Space Learning

• Live User Study
• Showed real users real articles
• 10 articles/day, 10 days
• Counted number of likes

• If the user profile lies mostly in the subspace, then it suffices to learn primarily in the subspace
• A K-dimensional space is much more efficient to explore
• Explore the full space as needed
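A rough illustration of why the subspace is cheaper to explore, assuming isotropic random unit-norm articles: after the same number of observations, the LinUCB-style confidence width $\|z\|_{M^{-1}}$ shrinks much faster in K dimensions than in D dimensions. The function name, dimensions, and sample counts are arbitrary; this does not reproduce the poster's experiments.

```python
import numpy as np

def mean_uncertainty(dim, n_obs, lam=1.0, n_test=100, seed=0):
    """Average confidence width ||z||_{M^{-1}} over random unit test vectors z,
    where M = lam * I + sum of x x^T over n_obs random unit-norm observations x."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, dim))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    M_inv = np.linalg.inv(lam * np.eye(dim) + X.T @ X)
    Z = rng.normal(size=(n_test, dim))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    widths = np.sqrt(np.einsum("ij,jk,ik->i", Z, M_inv, Z))
    return float(widths.mean())

K, D = 5, 100  # arbitrary subspace and full-space dimensions
print(mean_uncertainty(K, 200), mean_uncertainty(D, 200))
```

Intuitively, n observations spread over d directions give each direction roughly n/d units of information, so fewer dimensions means tighter confidence sooner.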
