recommender system with distributed representation
TRANSCRIPT
Construction and Evaluation of a Product Recommender System Using Distributed Representations
Recommender System with Distributed Representation
Thuy PhiVan 1,2, Chen Liu 2, and Yu Hirate 2
1. Computational Linguistics Laboratory, NAIST
2. Rakuten Institute of Technology, Rakuten, Inc.
{ar-thuy.phivan, chen.liu, yu.hirate}@rakuten.com
1. Distributed Representations for Words, Docs and Categories
Distributed Representations for Words
• Similar words are projected onto similar vectors.
• Relationships between words can be expressed as simple vector calculations. [T. Mikolov et al., NIPS 2013]
• Analogy: v("woman") – v("man") + v("king") ≈ v("queen")
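The analogy arithmetic above can be sketched with toy vectors. This is illustrative only: the 3-dimensional embeddings below are hand-made, whereas real word2vec vectors are learned from a corpus and typically have hundreds of dimensions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings chosen so the analogy holds (hypothetical values).
vec = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}

# v("woman") - v("man") + v("king") should land nearest to v("queen").
target = vec["woman"] - vec["man"] + vec["king"]
best = max(("woman", "king", "queen"), key=lambda w: cosine(target, vec[w]))
print(best)  # queen
```

In practice the nearest neighbor is searched over the whole vocabulary, excluding the three query words.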
Two models in word2vec

[Diagram: CBoW and Skip-gram architectures, each mapping input → projection → output over the context vectors v(t-2), v(t-1), v(t+1), v(t+2) and the target vector v(t)]

• CBoW: given the context words, predict the probability of the target word.
• Skip-gram: given the target word, predict the probability of the context words.
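The two objectives differ only in which side of the context window is being predicted. A minimal sketch (not the models' actual training code) of how the training examples are extracted from a token sequence:

```python
def skipgram_pairs(tokens, window=2):
    """For each position t, emit a (target, context) pair for every
    token within the window -- the examples Skip-gram trains on."""
    pairs = []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        for c in range(lo, hi):
            if c != t:
                pairs.append((target, tokens[c]))
    return pairs

def cbow_examples(tokens, window=2):
    """For each position t, emit (context list, target) -- the examples
    CBoW trains on (the context vectors are averaged at the projection)."""
    examples = []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        context = [tokens[c] for c in range(lo, hi) if c != t]
        examples.append((context, target))
    return examples

sent = ["the", "cat", "sat", "on", "the", "mat"]
print(skipgram_pairs(sent)[:3])  # [('the', 'cat'), ('the', 'sat'), ('cat', 'the')]
print(cbow_examples(sent)[1])    # (['the', 'sat', 'on'], 'cat')
```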
Sample results of word2vec trained on Wikipedia data
query: nagoya
• osaka 0.799002
• chiba 0.762829
• fukuoka 0.755166
• sendai 0.731760
• yokohama 0.729205
• kobe 0.726732
• shiga 0.705707
• niigata 0.699777
• aichi 0.692371
• hyogo 0.687128
• saitama 0.685672
• tokyo 0.671428
• sapporo 0.670466
• kumamoto 0.660786
• japan 0.658769
• kitakyushu 0.654265
• wakayama 0.652783
• shizuoka 0.624380
query: coffee
• cocoa 0.603515
• robusta 0.565269
• beans 0.565232
• bananas 0.565207
• cinnamon 0.556771
• citrus 0.547495
• espresso 0.542120
• caff 0.542082
• infusions 0.538069
• tea 0.532565
• cassava 0.524657
• pineapples 0.523557
• coffea 0.512420
• tapioca 0.510727
• sugarcane 0.508203
• yams 0.507347
• avocados 0.507072
• arabica 0.506231
Doc2Vec (Paragraph2Vec) [Q. Le et al., ICML 2014]

[Diagram: PV-DM and PV-DBoW architectures; PV-DM predicts the target word v(t) from the document vector v(doc) together with the context vectors, while PV-DBoW predicts the context words from v(doc) alone]

• Assign a "Document Vector" to each document.
• The document vector can be used as
  • a feature of the document
  • a basis for the similarity of documents
Category2Vec [Marui et al., NLP 2015]
https://github.com/rakuten-nlp/category2vec

• Assign a "Category Vector" to each category.
• Each document has its own category information.

[Diagram: CV-DM and CV-DBoW architectures, which extend PV-DM and PV-DBoW with an additional category vector v(cat)]
2. Applying Doc2Vec to Item Recommender
Recommender Systems in an EC service

• Item2Item recommender: given an item, show items relevant to that item.
• User2Item recommender: given a user, show items relevant to that user.
Distributed Representation for Users and Items

• Document: a sequence of words with context.
• User: a sequence of item views with the user's intention.

Text domain              EC domain
Set of documents         Set of user behaviors
Vectors for words        Vectors for items
Vectors for documents    Vectors for users
sim{word, word}          sim{item, item}
sim{doc, word}           sim{user, item}
sim{doc, doc}            sim{user, user}
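Under this mapping, each user's ordered item views play the role of a document's word sequence. A minimal sketch with hypothetical click data (the user and item IDs are made up for illustration):

```python
# Hypothetical click logs: (user_id, item_id), already in time order.
clicks = [
    ("u1", "itemA"), ("u1", "itemB"), ("u2", "itemA"),
    ("u1", "itemC"), ("u2", "itemD"),
]

# Group each user's views into an ordered "document" of item tokens,
# just as a text document is an ordered sequence of words.
docs = {}
for user, item in clicks:
    docs.setdefault(user, []).append(item)

print(docs["u1"])  # ['itemA', 'itemB', 'itemC']
```

These per-user "documents" can then be fed to a Doc2Vec-style trainer, which yields item vectors (the "words") and user vectors (the "documents").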
Dataset Preparation

• Service
  • Rakuten Singapore www.rakuten.com.sg
  • Rakuten's EC service in Singapore
  • Launched in 2014
• Data sources
  • Purchase History Data
  • Click Through Data
• Period
  • Jan. 2015 – Oct. 2015
Dataset Preparation (Purchase History Data)

• A set of items purchased by the same user.

User ID    Set of Purchased Items
user #1    {item_{1,1}, item_{1,2}}
user #2    {item_{2,1}, item_{2,2}, item_{2,3}}
⋮          ⋮
user #N    {item_{N,1}}
Dataset Preparation (Click Through Data)

• A set of users' sessions.
• Session:
  • A sequence of page views with the same cookie.
  • A sequence is split wherever the time interval between consecutive page views exceeds 2 hours.

User ID    Set of Sessions
user #1    {{item_{1,1,1}, item_{1,1,2}, ..., item_{1,1,n}}, {item_{1,2,1}, ...}}
user #2    {{item_{2,1,1}, item_{2,1,2}}}
⋮          ⋮
user #N    {{item_{N,1,1}, item_{N,1,2}, ..., item_{N,1,n}}, {item_{N,2,1}, ...}}

[Diagram: a timeline of page views; a gap longer than 2 hours separates Session A from Session B]
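The 2-hour sessionization rule above can be sketched as follows (a minimal version; the timestamps and item IDs are hypothetical):

```python
from datetime import datetime, timedelta

def split_sessions(page_views, gap=timedelta(hours=2)):
    """Split one cookie's time-ordered (timestamp, item) page views into
    sessions wherever consecutive views are more than `gap` apart."""
    sessions = []
    prev_ts = None
    for ts, item in page_views:
        if sessions and ts - prev_ts <= gap:
            sessions[-1].append(item)   # continue the current session
        else:
            sessions.append([item])     # gap exceeded: start a new session
        prev_ts = ts
    return sessions

views = [
    (datetime(2015, 1, 1, 9, 0), "item1"),
    (datetime(2015, 1, 1, 9, 30), "item2"),
    (datetime(2015, 1, 1, 13, 0), "item3"),  # 3.5 h gap -> new session
]
print(split_sessions(views))  # [['item1', 'item2'], ['item3']]
```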
Dataset Property

• More than 60% of sessions end after a single page request.
• More than X% of users visited rakuten.com.sg only once.

[Figures: Distribution of Session Length; Distribution of Session Count]
Item2Item Recommender (Example)

[Examples of recommendations built from Click Through Data and from Purchase History Data]
3. Evaluation
Evaluation Metrics

Training Data: 2015/01/01 – 2015/08/31
Test Data: 2015/09/01 – 2015/10/31

• N is the total number of users common to the training and test data.
• Hit-rate of the recommender system (RS):
  hit-rate = (number of hits) / N
• For each user, the RS predicts the top-20 items.
• "Hit": at least one of the items recommended to a particular user appears in that user's test data.
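The metric above can be sketched directly (a minimal version with hypothetical recommendation and test data):

```python
def hit_rate(recommendations, test_purchases, k=20):
    """hit-rate = (number of users with >= 1 hit) / N, where N is the
    number of users present in both training and test data."""
    common_users = set(recommendations) & set(test_purchases)
    hits = 0
    for user in common_users:
        top_k = set(recommendations[user][:k])
        if top_k & set(test_purchases[user]):  # any recommended item seen in test
            hits += 1
    return hits / len(common_users)

# Hypothetical example: user u1 is a hit (item "c"), user u2 is not.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
test = {"u1": ["c", "d"], "u2": ["z"]}
print(hit_rate(recs, test))  # 0.5
```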
Evaluations

1. Parameter Optimization
  • Find an optimal parameter set.
  • Identify the parameters that matter most for building a good model.
2. Performance Comparison with Conventional Recommender Algorithms
  • Item Similarity
  • Matrix Factorization
1. Parameter Optimization

Parameter   Values                                          Explanation
Size        50, 100, 200, 300, 400, 500                     Dimensionality of the vectors
Window      1, 3, 5, 8, 10, 15                              Maximum number of context items the training algorithm takes into account
Negative    0, 5, 10, 15, 20, 25                            Number of "noise words" to be drawn (usually between 5 and 20)
Sample      0, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8    Sub-sampling threshold for frequent items
Min-count   1, ..., 20                                      Items appearing fewer than min-count times are ignored
Iteration   10, 15, 20, 25, 30                              Number of iterations for building the model

• Best parameter setting:

Size  Window  Negative  Sample  min_count  Iteration  hit-rate
300   8       10        1e-5    3          20         0.1821
1. Parameter Optimization

[Bar charts: hit-rate (%) for each value of each parameter]

Size:      50 → 13.7, 100 → 15.5, 200 → 17.7, 300 → 18.2, 400 → 17.8, 500 → 17.2
Window:    1 → 15.4, 3 → 16.9, 5 → 17.8, 8 → 18.2, 10 → 18.0, 15 → 18.0
Negative:  0 → 15.9, 5 → 17.9, 10 → 18.2, 15 → 17.6, 20 → 17.4, 25 → 17.3
Sample:    0 → 16.2, 1e-2 → 16.5, 1e-3 → 16.4, 1e-4 → 16.7, 1e-5 → 18.2, 1e-6 → 15.1, 1e-7 → 2.0, 1e-8 → 0.3
Min_count: 1 → 16.8, 3 → 18.2, 5 → 18.9, 7 → 18.8, 9 → 18.9, 11 → 19.0, 13 → 18.8, 15 → 18.7, 17 → 18.9, 19 → 18.9
Iteration: 10 → 16.8, 15 → 17.8, 20 → 18.2, 25 → 18.2, 30 → 18.2
2. Performance Comparison with Conventional Recommender Algorithms

• Item Similarity: Jaccard similarity of the user sets of two items.
• Matrix Factorization: factorize the user-item matrix into U × I (dim = 32, max iteration = 25).

[Diagrams: item similarity computed from overlapping user sets; the user-item matrix factorized into U × I]
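The Item Similarity baseline can be sketched as follows (a minimal version; the item and user IDs are hypothetical):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two user sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical data: which users interacted with each item.
item_users = {
    "itemA": {"u1", "u2", "u3"},
    "itemB": {"u2", "u3"},
    "itemC": {"u4"},
}

# Items sharing many users are considered similar.
sim_ab = jaccard(item_users["itemA"], item_users["itemB"])
print(round(sim_ab, 3))  # 0.667
```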
2. Performance Comparison with Conventional Algorithms

[Bar chart: hit-rate (%) for Item Similarity, Matrix Factorization, and Doc2Vec]

The Doc2Vec-based algorithm performed the best.
Conclusion and Future Work

• Conclusion
  • Developed a distributed-representation-based RS.
  • Applied it to a dataset generated from Rakuten Singapore click through data.
  • Confirmed that the distributed-representation-based RS performed better than conventional RS algorithms.
• Future Work
  • Distributed-representation-based RS on other datasets
    • Rakuten Singapore Product Data
    • Rakuten (Japan) Ichiba Click Through Data
  • Hybrid model (content-based RS × user-behavior-based RS)
  • Testing in the real service
Thank you