TRANSCRIPT
Deep Reinforcement Learning for Page-wise RecommendationsXiangyu Zhao1, Long Xia2, Liang Zhang2, Zhuoye Ding2, Dawei Yin2 and Jiliang Tang1
1: Data Science and Engineering Lab, Michigan State University 2: Data Science Lab, JD.com
Motivation

- Recommender systems can mitigate the information overload problem by suggesting personalized items to users
- Interactions between recommender systems and users:
  - Each time, the system recommends a page of items to the user
  - Next, the user browses these items and provides real-time feedback
  - Then the system recommends a new page of items...
- Two key problems:
  - How to efficiently capture the user's dynamic preference and update the recommendation strategy according to the user's real-time feedback
  - How to generate a page of items with a proper display based on the user's preferences
Framework Overview

We model the recommendation task as a Markov Decision Process (MDP) and leverage reinforcement learning to automatically learn the optimal recommendation strategy.
1. Framework Architecture Selection
[Figure: three candidate architectures. (a) a Q-network over state s that outputs Q(s, a_1), Q(s, a_2), ..., Q(s, a_i) for every action; (b) a Q-network that takes a state-action pair (s, a) and outputs Q(s, a) for each candidate action; (c) the Actor-Critic architecture]
✓ Actor-Critic framework
  - Actor: current state -> a deterministic action (a deterministic page of items to recommend)
  - Critic: this state-action pair -> Q-value
✗ (a) cannot handle a large and dynamic action space
✗ (b) high computational cost
$$Q(s, a) = \mathbb{E}_{s'}\left[\, r + \gamma\, Q(s', a') \mid s, a \,\right]$$

$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \mid s, a \,\right]$$
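Under architecture (c), the Bellman target above can be computed without enumerating actions, because the actor supplies a' deterministically. A minimal NumPy sketch, with the learned networks replaced by hypothetical linear maps purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 8, 4
GAMMA = 0.95  # discount factor (gamma in the equations above)

# Hypothetical stand-ins for the learned networks: a linear "actor"
# mapping states to actions and a bilinear "critic" scoring (s, a) pairs.
W_actor = rng.normal(size=(ACTION_DIM, STATE_DIM))
W_critic = rng.normal(size=(STATE_DIM, ACTION_DIM))

def actor(s):
    # Deterministic policy: state -> action (in DeepPage, a page of items).
    return np.tanh(W_actor @ s)

def critic(s, a):
    # Action-value estimate Q(s, a) for one state-action pair.
    return float(s @ W_critic @ a)

def td_target(r, s_next):
    # Bellman target r + gamma * Q(s', a') with a' = actor(s'):
    # no max over an enumerated action set is needed.
    return r + GAMMA * critic(s_next, actor(s_next))

s_next = rng.normal(size=STATE_DIM)
y = td_target(1.0, s_next)
```

This is why (c) scales to a large, dynamic item space: the critic evaluates only the single action the actor proposes, instead of scoring every candidate action as in (a) and (b).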
2. Markov Decision Process
- State s: the user's current preference, learned from his/her browsing history
- Action a: a page of items to recommend, together with their display
- Reward r: derived from the user's real-time feedback on the recommended page
1. Architecture of Actor Framework
[Figure: encoder for initial state generation — the user's last clicked/ordered items e_1, ..., e_N pass through an embedding layer (E_1, ..., E_N) and a GRU layer (h_1, ..., h_N); the output layer yields s^{ini}]

$$z_t = \sigma(W_z E_t + U_z h_{t-1})$$
$$r_t = \sigma(W_r E_t + U_r h_{t-1})$$
$$\hat{h}_t = \tanh\left[ W E_t + U (r_t \cdot h_{t-1}) \right]$$
$$h_t = (1 - z_t)\, h_{t-1} + z_t\, \hat{h}_t$$
$$s^{ini} = h_N$$
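The update above is the standard GRU cell. A direct NumPy transcription, with randomly initialized weights and illustrative sizes (so the output values themselves carry no meaning):

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, HID = 6, 5  # illustrative embedding and hidden sizes

# Randomly initialized GRU parameters (the W_* and U_* above).
Wz, Uz = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))
Wr, Ur = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))
W,  U  = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(E_t, h_prev):
    z = sigmoid(Wz @ E_t + Uz @ h_prev)            # update gate z_t
    r = sigmoid(Wr @ E_t + Ur @ h_prev)            # reset gate r_t
    h_hat = np.tanh(W @ E_t + U @ (r * h_prev))    # candidate state
    return (1.0 - z) * h_prev + z * h_hat          # new hidden state h_t

# Encode the user's last N clicked/ordered items; the final hidden
# state is the initial-preference vector s_ini.
item_embeddings = rng.normal(size=(4, EMB))  # E_1, ..., E_N
h = np.zeros(HID)
for E_t in item_embeddings:
    h = gru_step(E_t, h)
s_ini = h
```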
[Figure: encoder for real-time state generation — for each item i on the current page, the item representation e_i, the item's category c_i, and the user's feedback f_i are embedded and concatenated into X_i; the page matrix P_t passes through a CNN layer, a GRU layer (h_1, ..., h_T), and an attention layer (α_1, ..., α_T) to produce s^{cur}]

Notation: e_i is the item representation, c_i the item's category, f_i the user's feedback.

$$X_i = \mathrm{concat}(E_i, C_i, F_i) = \tanh\left[ \mathrm{concat}(W_E e_i + b_E,\; W_C c_i + b_C,\; W_F f_i + b_F) \right]$$
$$p_t = \mathrm{conv2d}(P_t)$$
$$\alpha_t = \frac{\exp(W_\alpha h_t + b_\alpha)}{\sum_j \exp(W_\alpha h_j + b_\alpha)}$$
$$s^{cur} = \sum_{t=1}^{T} \alpha_t h_t$$
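The input construction and the attention pooling above can be sketched directly; all weights and dimensions here are hypothetical placeholders, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
D_E, D_C, D_F, HID = 4, 3, 2, 6  # hypothetical dimensions

# Hypothetical projection weights for item embedding (E), category (C),
# and feedback (F), plus attention scoring parameters W_alpha, b_alpha.
WE, bE = rng.normal(size=(D_E, D_E)), np.zeros(D_E)
WC, bC = rng.normal(size=(D_E, D_C)), np.zeros(D_E)
WF, bF = rng.normal(size=(D_E, D_F)), np.zeros(D_E)
Wa, ba = rng.normal(size=HID), 0.0

def make_input(e_i, c_i, f_i):
    # X_i = tanh(concat(WE e_i + bE, WC c_i + bC, WF f_i + bF))
    return np.tanh(np.concatenate(
        [WE @ e_i + bE, WC @ c_i + bC, WF @ f_i + bF]))

def attention_pool(H):
    # alpha_t = softmax(Wa h_t + ba); s_cur = sum_t alpha_t h_t
    scores = H @ Wa + ba
    scores -= scores.max()  # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

X_i = make_input(rng.normal(size=D_E), rng.normal(size=D_C),
                 rng.normal(size=D_F))
H = rng.normal(size=(7, HID))  # GRU hidden states h_1, ..., h_T
s_cur, alpha = attention_pool(H)
```

The attention weights α_t let the state emphasize the page positions the user actually reacted to, rather than averaging all hidden states uniformly.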
The Actor is designed to generate a page of recommendations according to the user's preference, which requires tackling three challenges:
- Setting an initial preference at the beginning of a new recommendation session
- Learning the real-time preference in the current session
- Jointly generating a set of recommendations and displaying them in a 2-D page

1.1. Encoder for Initial State Generation Process
Setting an initial preference at the beginning of a new session:
- Inputs: the user's last clicked/ordered items before the current session
- Output: a vector representing the user's initial preference

1.2. Encoder for Real-time State Generation Process
Learning the real-time preference in the current session:
- Inputs: item representations, item categories (to generate complementary recommendations), and user feedback (to capture the user's interest in the current page)
- Output: a vector representing the user's real-time preference
1.3. Decoder for Action Generation Process
Jointly generating a set of recommendations and displaying them in a page:
- Input: the user's real-time preference
- Output: the representation of a page of recommended items with a proper display
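The decoder emits continuous item embeddings, which need not coincide with real catalog items. For deterministic actors over large discrete item sets, a common final step is to map each generated proto-item embedding to the most similar valid item; the sketch below illustrates that general idea, not necessarily the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
N_ITEMS, EMB, PAGE_SIZE = 100, 8, 4

catalog = rng.normal(size=(N_ITEMS, EMB))      # embeddings of valid items
proto_page = rng.normal(size=(PAGE_SIZE, EMB)) # decoder output (proto-items)

def nearest_items(proto_page, catalog):
    # For each proto-item embedding, pick the catalog item with the
    # smallest Euclidean distance (brute-force nearest neighbor).
    d = np.linalg.norm(
        proto_page[:, None, :] - catalog[None, :, :], axis=-1)
    return d.argmin(axis=1)  # indices of the items to display

page = nearest_items(proto_page, catalog)
```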
2. The Architecture of Critic Framework
The Critic aims to learn an action-value function Q(s, a), which judges whether the action "a" matches the current state "s":
- Inputs: the user's real-time preference "s" and a page of recommended items "a"
- Output: the Q-value, i.e., Q(s, a)
[Figure: the overall Actor-Critic framework — the Actor decodes the current state into a proto-action page via a deconvolution neural network, the Critic re-encodes the page via a convolution neural network and outputs Q(s, a), and the user provides feedback on the recommended page]

Decoder:
$$a^{cur} = \mathrm{deconv2d}(s^{cur}), \qquad a = \mathrm{conv2d}(a^{cur})$$
Experiments

1. How does DeepPage perform compared with representative baselines?
2. How do the components of the Actor and Critic contribute to the performance?
- Dataset: from JD.com
- Baselines:
  - Collaborative Filtering (CF)
  - Factorization Machines (FM)
  - GRU for Recommendation
  - DQN
  - DDPG
[Figure: performance over the training process]
- DeepPage variants for the ablation study:
  - DeepPage-1: no embedding layers
  - DeepPage-2: no category and feedback
  - DeepPage-3: no GRU for the initial state
  - DeepPage-4: no CNNs
  - DeepPage-5: no attention layers
  - DeepPage-6: no GRU for the real-time state
  - DeepPage-7: no DeCNN
We make the following observations:
- DeepPage-1 and DeepPage-2 validate that the embedding layer and the category/feedback information boost recommendation quality
- DeepPage-3 and DeepPage-6 suggest that setting the user's initial preference and capturing the user's real-time preference help produce accurate recommendations
- DeepPage outperforms DeepPage-4 and DeepPage-7, which indicates that the item display strategy influences users' decision-making process
It can be observed that:
- CF and FM perform worse than the other baselines, since they ignore the temporal sequence of the user's browsing history
- DQN and DDPG outperform GRU: the GRU baseline is designed to maximize the immediate reward, while DQN and DDPG trade off short-term and long-term rewards
- DeepPage performs better than conventional DDPG: compared with DDPG, DeepPage jointly optimizes a page of items and uses a GRU to learn the user's real-time preference
Conclusion and Future Work

Conclusion:
- We propose a novel Actor-Critic framework that can be applied in scenarios with a large and dynamic item space and can significantly reduce computational cost
- The DeepPage framework captures the user's real-time preference and optimizes a page of items jointly, which boosts recommendation performance

Future Work:
- Handle multiple tasks such as search, bidding, advertising and recommendation collaboratively in one reinforcement learning framework
- Validate with more agent-user interaction patterns, e.g., adding items to the shopping cart

Acknowledgements:
- This research is supported by the National Science Foundation (IIS-1714741, IIS-1715940) and a Criteo Faculty Research Award