learning hierarchical representation model for next basket
TRANSCRIPT
Learning Hierarchical Representation Model for Next
Basket RecommendationPengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan,
Xueqi Cheng
CAS Key Lab of Network Data Science and Technology
Institute of Computing Technology,Chinese Academy of Sciences
Outline
• Task
• Background
• Motivation
• Our modelstructure of HRM
connections to previous methods
• Experiments
• Summary
2
Problem: Next basket recommendation
3
Next basket recommendation : given a sequence of purchases, what items are purchased sequentially?
potato
beers
bread
phone
maternity
butter
bread
earphone
milk
baby-car
nursing bottle
battery
transaction i-1 transaction i transaction i+1
?
4
Background
• Sequence models
• Collaborative Filtering
• Hybrid methods
MC (Markov Chain) [Zimdars et al. UAI01, Chen et al. KDD12]
NMF(Non negative Matrix Factorization) [Daniel D. Lee et al. NIPS01]
FPMC (Factorized Personalized Markov Chain) [S Rendle et al. WWW10]
5
a a
d
b
transactions of u1
b
c
a
transactions of u2
Weakness of Previous methods
weakness of MC: Lack users` general interests
Sequence models based on Markov Assumption:MC
a,b,c, d represent items
6
a a
d
b
transactions of u1
b
c
a
transactions of u2
Weakness of Previous methods
Global one:
weakness of NMF: Lack sequence behaviors
Collaborative filtering matrix factorization:NMF
Global set
7
a a
d
b
transactions of u1
b
c
a
transactions of u2
Weakness of Previous methods
Hybrid method:Factorized Personalized Markov Chain [S Rendle et al. WWW10 best paper]
weakness of FPMC: linear combination of different factors
vi
vj
u
v
vj
vi
U represent users, V represent items
8
a a
d
b
transactions of u1
b
c
a
transactions of u2
Weakness of Previous methods
Hybrid method:Factorized Personalized Markov Chain [S Rendle et al. WWW10 best paper]
weakness of FPMC: linear combination of different factors
General taste Sequential behavior
user next item last itemnext item
+
<viI,L,v1
L,I>
<viI,L,v2
L,I>
<viI,L,v3
L,I>
linearcombinationIndependent Influence!
9
Weakness of Previous methods
+
Is that linear combination enough for a good recommendation?
pumpkin potato
pumpkin
cucumber
chipschocolate
candy
candy
next transaction
next transaction
Last transaction
Last transaction
Halloween!
We need a model that is capable of incorporating more complicated interactions among multiple factors. This becomes the major
motivation of our work.
Motivation
10
11
Hierarchical Representation Model
The structure of HRM
Level 2
Level 1
general interest
sequential behavior
Aggregation Method
Linear method
average pooling
Nonlinear method
max pooling
other types of operators(top-k average pooling, k-max pooling, hadamard pooling)
(V is a set of input vectors to be aggregated)
13
Learning and Prediction
Objective function
Negative sampling
negative count sample distribution
all users all trans all items
The probability of purchasing one item next transaction:
sum of all items: too large!
exponent of item and users` hybrid interest
14
Connection to Previous methods
Degradation To MC
=
Select copy: copy item when constructing the transaction representation from item vectors, the operation randomly selects one item vector and copies it
softmax
select copy
select copy
Connection to Previous methods
15
Degradation To MF
=
Select copy : always select and copy user vector in the second layer, ignoring the sequential information
select copy
softmax
16
Connection to Previous methods
Degradation To FPMC
Avg Pooling is used , each instance corresponds to 1 negative sample
softmax
avg pooling
avg pooling
17
Experiments
Ta-Feng BeiRen T-Mall
#transactions 67964 91294 1805
#items 7982 5845 191
#users 9238 9321 292
#avg.transactionsize
5.9 5.8 1.2
#avg.transactionper user
7.4 9.7 5.6
Data setsretails ecommerce
F1-score:
Hit-ratio:
NDCG:
18
Experiments
Evaluation Metric
a ranking measure
harmonic mean of precision and recall
coverage
19
Experiments
Comparison among Different HRMs
Avg pooling perform worstWhen apply max pooling on any layer, the performance improved a littleWhen apply max pooling on all layers, HRM performed best
observation
20
Experiments
Comparison with baselinestop popular sequential behavior hybrid methodgeneral interest
Top method performed worstNMF and MC performed better than top methodFPMC performed better than NMF and MCHRM performed best
observation
21
Experiments
Comparison over groups
NMF perform better than MC on active group, while MC performs better than NMF on inactive groupHRM performed best
observation
22
Experiments
The Impact of Negative Sampling
More negative count we choose ,the more F1-score we obtainThe sampling number k increases, the performance gain between two consecutive trials decreases
observation
23
Summary
A next basket recommendation task
A Hierarchical Representation Model• model both sequential behavior and users` general taste
• Aggregation operators to connect two level factors.
• HRM can produce multiple recommendation models by introducing different aggregation operations
Furture works• More aggregations operations will be analyzed• Integrate other types of information, e.g. timestamp of
transaction
• Poin twise mutual information (PMI) is a widely used word similarity measure
27
Weakness of Previous methods
the sum of score in general recommend and score of sequential recommend