recommender system with distributed representation
TRANSCRIPT
Construction and Evaluation of a Product Recommender System Using Distributed Representations
Recommender System with Distributed Representation
Thuy PhiVan 1,2, Chen Liu 2, and Yu Hirate 2
1. Computational Linguistics Laboratory, NAIST
2. Rakuten Institute of Technology, Rakuten, Inc.
{ar-thuy.phivan, chen.liu, yu.hirate}@rakuten.com
1. Distributed Representations for Words, Docs and Categories
Distributed Representations for Words
• Similar words are projected onto similar vectors.
• Relationships between words can be expressed as simple vector calculations. [T. Mikolov et al., NIPS 2013]
• Analogy: v("woman") – v("man") + v("king") ≈ v("queen")
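The analogy arithmetic above can be sketched with toy vectors. This is illustrative only: the 3-dimensional embeddings below are hand-made, whereas real word2vec vectors are learned from a corpus and typically have hundreds of dimensions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings chosen so the analogy holds (hypothetical values).
vec = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}

# v("woman") - v("man") + v("king") should land nearest to v("queen").
target = vec["woman"] - vec["man"] + vec["king"]
best = max(("woman", "king", "queen"), key=lambda w: cosine(target, vec[w]))
print(best)  # queen
```

In practice the nearest neighbor is searched over the whole vocabulary, excluding the three query words.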
Two models in word2vec

[Diagram: CBoW and Skip-gram architectures, each mapping input → projection → output over the context vectors v(t-2), v(t-1), v(t+1), v(t+2) and the target vector v(t)]

• CBoW: given the context words, predict the probability of the target word.
• Skip-gram: given the target word, predict the probability of the context words.
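The two objectives differ only in which side of the context window is being predicted. A minimal sketch (not the models' actual training code) of how the training examples are extracted from a token sequence:

```python
def skipgram_pairs(tokens, window=2):
    """For each position t, emit a (target, context) pair for every
    token within the window -- the examples Skip-gram trains on."""
    pairs = []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        for c in range(lo, hi):
            if c != t:
                pairs.append((target, tokens[c]))
    return pairs

def cbow_examples(tokens, window=2):
    """For each position t, emit (context list, target) -- the examples
    CBoW trains on (the context vectors are averaged at the projection)."""
    examples = []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        context = [tokens[c] for c in range(lo, hi) if c != t]
        examples.append((context, target))
    return examples

sent = ["the", "cat", "sat", "on", "the", "mat"]
print(skipgram_pairs(sent)[:3])  # [('the', 'cat'), ('the', 'sat'), ('cat', 'the')]
print(cbow_examples(sent)[1])    # (['the', 'sat', 'on'], 'cat')
```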
Sample results of word2vec trained on Wikipedia data
query: nagoya
• osaka 0.799002
• chiba 0.762829
• fukuoka 0.755166
• sendai 0.731760
• yokohama 0.729205
• kobe 0.726732
• shiga 0.705707
• niigata 0.699777
• aichi 0.692371
• hyogo 0.687128
• saitama 0.685672
• tokyo 0.671428
• sapporo 0.670466
• kumamoto 0.660786
• japan 0.658769
• kitakyushu 0.654265
• wakayama 0.652783
• shizuoka 0.624380
query: coffee
• cocoa 0.603515
• robusta 0.565269
• beans 0.565232
• bananas 0.565207
• cinnamon 0.556771
• citrus 0.547495
• espresso 0.542120
• caff 0.542082
• infusions 0.538069
• tea 0.532565
• cassava 0.524657
• pineapples 0.523557
• coffea 0.512420
• tapioca 0.510727
• sugarcane 0.508203
• yams 0.507347
• avocados 0.507072
• arabica 0.506231
Doc2Vec (Paragraph2Vec) [Q. Le et al., ICML 2014]

[Diagram: PV-DM and PV-DBoW architectures; PV-DM predicts the target word v(t) from the document vector v(doc) together with the context vectors, while PV-DBoW predicts the context words from v(doc) alone]

• Assign a "Document Vector" to each document.
• The document vector can be used as
  • a feature of the document
  • a basis for the similarity of documents
Category2Vec [Marui et al., NLP 2015]
https://github.com/rakuten-nlp/category2vec

• Assign a "Category Vector" to each category.
• Each document has its own category information.

[Diagram: CV-DM and CV-DBoW architectures, which extend PV-DM and PV-DBoW with an additional category vector v(cat)]
2. Applying Doc2Vec to Item Recommender
Recommender Systems in an EC service

• Item2Item recommender: given an item, show items relevant to that item.
• User2Item recommender: given a user, show items relevant to that user.
Distributed Representation for Users and Items

• Document: a sequence of words with context.
• User: a sequence of item views with the user's intention.

Text domain              EC domain
Set of documents         Set of user behaviors
Vectors for words        Vectors for items
Vectors for documents    Vectors for users
sim{word, word}          sim{item, item}
sim{doc, word}           sim{user, item}
sim{doc, doc}            sim{user, user}
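Under this mapping, each user's ordered item views play the role of a document's word sequence. A minimal sketch with hypothetical click data (the user and item IDs are made up for illustration):

```python
# Hypothetical click logs: (user_id, item_id), already in time order.
clicks = [
    ("u1", "itemA"), ("u1", "itemB"), ("u2", "itemA"),
    ("u1", "itemC"), ("u2", "itemD"),
]

# Group each user's views into an ordered "document" of item tokens,
# just as a text document is an ordered sequence of words.
docs = {}
for user, item in clicks:
    docs.setdefault(user, []).append(item)

print(docs["u1"])  # ['itemA', 'itemB', 'itemC']
```

These per-user "documents" can then be fed to a Doc2Vec-style trainer, which yields item vectors (the "words") and user vectors (the "documents").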
Dataset Preparation

• Service
  • Rakuten Singapore www.rakuten.com.sg
  • Rakuten's EC service in Singapore
  • Launched in 2014
• Data sources
  • Purchase History Data
  • Click Through Data
• Period
  • Jan. 2015 – Oct. 2015
Dataset Preparation (Purchase History Data)

• A set of items purchased by the same user.

User ID    Set of Purchased Items
user #1    {item_{1,1}, item_{1,2}}
user #2    {item_{2,1}, item_{2,2}, item_{2,3}}
⋮          ⋮
user #N    {item_{N,1}}
Dataset Preparation (Click Through Data)

• A set of users' sessions.
• Session:
  • A sequence of page views with the same cookie.
  • A sequence is split wherever the time interval between consecutive page views exceeds 2 hours.

User ID    Set of Sessions
user #1    {{item_{1,1,1}, item_{1,1,2}, ..., item_{1,1,n}}, {item_{1,2,1}, ...}}
user #2    {{item_{2,1,1}, item_{2,1,2}}}
⋮          ⋮
user #N    {{item_{N,1,1}, item_{N,1,2}, ..., item_{N,1,n}}, {item_{N,2,1}, ...}}

[Diagram: a timeline of page views; a gap longer than 2 hours separates Session A from Session B]
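The 2-hour sessionization rule above can be sketched as follows (a minimal version; the timestamps and item IDs are hypothetical):

```python
from datetime import datetime, timedelta

def split_sessions(page_views, gap=timedelta(hours=2)):
    """Split one cookie's time-ordered (timestamp, item) page views into
    sessions wherever consecutive views are more than `gap` apart."""
    sessions = []
    prev_ts = None
    for ts, item in page_views:
        if sessions and ts - prev_ts <= gap:
            sessions[-1].append(item)   # continue the current session
        else:
            sessions.append([item])     # gap exceeded: start a new session
        prev_ts = ts
    return sessions

views = [
    (datetime(2015, 1, 1, 9, 0), "item1"),
    (datetime(2015, 1, 1, 9, 30), "item2"),
    (datetime(2015, 1, 1, 13, 0), "item3"),  # 3.5 h gap -> new session
]
print(split_sessions(views))  # [['item1', 'item2'], ['item3']]
```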
Dataset Property

• More than 60% of sessions end after a single page request.
• More than X% of users visited rakuten.com.sg only once.

[Figures: Distribution of Session Length; Distribution of Session Count]
Item2Item Recommender (Example)

[Examples of recommendations built from Click Through Data and from Purchase History Data]
3. Evaluation
Evaluation Metrics

Training Data: 2015/01/01 – 2015/08/31
Test Data: 2015/09/01 – 2015/10/31

• N is the total number of users common to the training and test data.
• Hit-rate of the recommender system (RS):
  hit-rate = (number of hits) / N
• For each user, the RS predicts the top-20 items.
• "Hit": at least one of the items recommended to a particular user appears in that user's test data.
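The metric above can be sketched directly (a minimal version with hypothetical recommendation and test data):

```python
def hit_rate(recommendations, test_purchases, k=20):
    """hit-rate = (number of users with >= 1 hit) / N, where N is the
    number of users present in both training and test data."""
    common_users = set(recommendations) & set(test_purchases)
    hits = 0
    for user in common_users:
        top_k = set(recommendations[user][:k])
        if top_k & set(test_purchases[user]):  # any recommended item seen in test
            hits += 1
    return hits / len(common_users)

# Hypothetical example: user u1 is a hit (item "c"), user u2 is not.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
test = {"u1": ["c", "d"], "u2": ["z"]}
print(hit_rate(recs, test))  # 0.5
```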
Evaluations

1. Parameter Optimization
  • Find an optimal parameter set.
  • Identify the parameters that matter most for building a good model.
2. Performance Comparison with Conventional Recommender Algorithms
  • Item Similarity
  • Matrix Factorization
1. Parameter Optimization

Parameter   Values                                          Explanation
Size        50, 100, 200, 300, 400, 500                     Dimensionality of the vectors
Window      1, 3, 5, 8, 10, 15                              Maximum number of context items the training algorithm takes into account
Negative    0, 5, 10, 15, 20, 25                            Number of "noise words" to be drawn (usually between 5 and 20)
Sample      0, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8    Sub-sampling threshold for frequent items
Min-count   1, ..., 20                                      Items appearing fewer than min-count times are ignored
Iteration   10, 15, 20, 25, 30                              Number of iterations for building the model

• Best parameter setting:

Size  Window  Negative  Sample  min_count  Iteration  hit-rate
300   8       10        1e-5    3          20         0.1821
1. Parameter Optimization

[Bar charts: hit-rate (%) for each value of each parameter]

Size:      50 → 13.7, 100 → 15.5, 200 → 17.7, 300 → 18.2, 400 → 17.8, 500 → 17.2
Window:    1 → 15.4, 3 → 16.9, 5 → 17.8, 8 → 18.2, 10 → 18.0, 15 → 18.0
Negative:  0 → 15.9, 5 → 17.9, 10 → 18.2, 15 → 17.6, 20 → 17.4, 25 → 17.3
Sample:    0 → 16.2, 1e-2 → 16.5, 1e-3 → 16.4, 1e-4 → 16.7, 1e-5 → 18.2, 1e-6 → 15.1, 1e-7 → 2.0, 1e-8 → 0.3
Min_count: 1 → 16.8, 3 → 18.2, 5 → 18.9, 7 → 18.8, 9 → 18.9, 11 → 19.0, 13 → 18.8, 15 → 18.7, 17 → 18.9, 19 → 18.9
Iteration: 10 → 16.8, 15 → 17.8, 20 → 18.2, 25 → 18.2, 30 → 18.2
2. Performance Comparison with Conventional Recommender Algorithms

• Item Similarity: Jaccard similarity of the user sets of two items.
• Matrix Factorization: factorize the user-item matrix into U × I (dim = 32, max iteration = 25).

[Diagrams: item similarity computed from overlapping user sets; the user-item matrix factorized into U × I]
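The Item Similarity baseline can be sketched as follows (a minimal version; the item and user IDs are hypothetical):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two user sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical data: which users interacted with each item.
item_users = {
    "itemA": {"u1", "u2", "u3"},
    "itemB": {"u2", "u3"},
    "itemC": {"u4"},
}

# Items sharing many users are considered similar.
sim_ab = jaccard(item_users["itemA"], item_users["itemB"])
print(round(sim_ab, 3))  # 0.667
```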
2. Performance Comparison with Conventional Algorithms

[Bar chart: hit-rate (%) for Item Similarity, Matrix Factorization, and Doc2Vec]

The Doc2Vec-based algorithm performed the best.
Conclusion and Future Work

• Conclusion
  • Developed a distributed-representation-based RS.
  • Applied it to a dataset generated from Rakuten Singapore click through data.
  • Confirmed that the distributed-representation-based RS performed better than conventional RS algorithms.
• Future Work
  • Distributed-representation-based RS on other datasets
    • Rakuten Singapore Product Data
    • Rakuten (Japan) Ichiba Click Through Data
  • Hybrid model (content-based RS × user-behavior-based RS)
  • Testing in the real service
Thank you