
Online Learning for Collaborative Filtering

Guang Ling, Haiqin Yang, Irwin King, Michael Lyu
Presented by Guang Ling

Outline

- Introduction
- PMF and RMF
- Online PMF and Online RMF
- Experiments and Results
- Conclusion and Future Work

Introduction

- We face an unprecedentedly large amount of choice!
- Search vs. Recommend

Introduction

- Recommender systems emerged
  - Content-based filtering: analyzes item content
  - Collaborative filtering: rating-based

Introduction

- Collaborative filtering
  - Allows users to rate items
  - Infers users' tastes and items' features based on the ratings
  - Matches users' preferences with items' features

Introduction

- Various methods have been developed
  - Memory-based: user-based, item-based
  - Model-based: PMF, RMF, PLSA, PLPA

So, what is the problem?

      I1   I2   I3   I4
U1     1    5    4    ?
U2     2    5    4    1
U3     4    2    1    4

Introduction

- Unrealistic assumptions
  - All ratings are available
  - There will be no new ratings
  - The data set is small enough to be handled in main memory
- Reality
  - Ratings are collected over time
  - New ratings are received constantly
  - Huge data sets cannot be handled easily

Introduction

- We propose online CF algorithms that
  - Obviate the need to hold all data
  - Make incremental changes based solely on the new rating
  - Scale linearly with the number of ratings
- Extra features
  - Explicit control over the regularization effect

PMF and RMF

- Matrix factorization models
  - Factor the rating matrix R (no. users x no. items) into U and V
  - Minimize a loss function
    - Squared loss: PMF
    - Cross entropy: RMF
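A compact way to write the factorization above, in standard notation (the deck's own symbols may differ slightly):

    R \approx U^\top V, \qquad R \in \mathbb{R}^{M \times N}, \; U \in \mathbb{R}^{D \times M}, \; V \in \mathbb{R}^{D \times N}

so each rating is approximated by an inner product, R_{ij} \approx U_i^\top V_j, where M is the number of users, N the number of items, and D the latent dimensionality.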

PMF

- Conditional distribution over observed ratings
- Spherical Gaussian priors on user and movie feature vectors
- Maximize the posterior
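The equations referenced on this slide follow the standard PMF formulation (Salakhutdinov and Mnih); a sketch in the usual notation, which may differ cosmetically from the slides:

    p(R \mid U, V, \sigma^2) = \prod_{i=1}^{M} \prod_{j=1}^{N} \big[ \mathcal{N}(R_{ij} \mid U_i^\top V_j, \sigma^2) \big]^{I_{ij}}

    p(U \mid \sigma_U^2) = \prod_{i=1}^{M} \mathcal{N}(U_i \mid 0, \sigma_U^2 I), \qquad p(V \mid \sigma_V^2) = \prod_{j=1}^{N} \mathcal{N}(V_j \mid 0, \sigma_V^2 I)

    \max_{U, V} \; p(U, V \mid R) \propto p(R \mid U, V, \sigma^2) \, p(U \mid \sigma_U^2) \, p(V \mid \sigma_V^2)

Here I_{ij} is an indicator that equals 1 if user i rated item j and 0 otherwise.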

PMF

- Maximizing the posterior is equivalent to minimizing a loss that combines a squared-error term with a regularization term
- Use gradient descent to minimize this loss
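Written out, the loss and its gradients take the standard PMF form (a sketch; the slides' constants may be arranged differently):

    E = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} \big( R_{ij} - U_i^\top V_j \big)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{M} \lVert U_i \rVert^2 + \frac{\lambda_V}{2} \sum_{j=1}^{N} \lVert V_j \rVert^2

    \frac{\partial E}{\partial U_i} = -\sum_{j} I_{ij} \big( R_{ij} - U_i^\top V_j \big) V_j + \lambda_U U_i, \qquad \frac{\partial E}{\partial V_j} = -\sum_{i} I_{ij} \big( R_{ij} - U_i^\top V_j \big) U_i + \lambda_V V_j

The first term is the squared loss and the remaining terms are the regularization.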

RMF

- Top-one probability
  - The probability that an item i is ranked on top
- Minimize cross entropy
  - Cross entropy measures the divergence between two distributions
  - It is an un-normalized KL-divergence
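In the list-wise ranking setting that RMF builds on, the top-one probability and the cross entropy take the following standard forms (a sketch; the deck may additionally pass the scores through a monotone transform such as the logistic function):

    P(i \mid s) = \frac{\exp(s_i)}{\sum_{k} \exp(s_k)}, \qquad H(p, q) = -\sum_{i} p(i) \log q(i)

where s_i is the score of item i, p is the top-one probability distribution induced by the observed ratings, and q is the one induced by the predicted scores.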

RMF

- The model loss is defined as a cross-entropy term plus a regularization term
- Use gradient descent to minimize it
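A sketch of the resulting objective in the spirit of list-wise matrix factorization methods such as ListRank-MF, which this formulation resembles (the exact link function g and constants in the paper may differ):

    L(U, V) = \sum_{u} \Big[ -\sum_{i \in \mathcal{I}_u} \frac{\exp\big(g(R_{ui})\big)}{\sum_{k \in \mathcal{I}_u} \exp\big(g(R_{uk})\big)} \log \frac{\exp\big(g(U_u^\top V_i)\big)}{\sum_{k \in \mathcal{I}_u} \exp\big(g(U_u^\top V_k)\big)} \Big] + \frac{\lambda}{2} \big( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \big)

where \mathcal{I}_u is the set of items rated by user u; the first term is the cross entropy and the second the regularization.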

Online PMF

- We propose two online algorithms for PMF
  - Stochastic gradient descent
    - Adjusts the model stochastically for each observation
  - Regularized dual averaging
    - Maintains an approximated average gradient
    - Solves an easy optimization problem at each iteration

Stochastic Gradient Descent PMF

- Recall the loss function for PMF
- The squared loss can be dissected and associated with each observation triplet (user, item, rating)
- Update the model using the gradient of this per-observation loss
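A minimal Python sketch of the per-observation SGD step described above, assuming U and V are NumPy arrays holding the user and item latent vectors; the function name and hyper-parameter values are illustrative, not taken from the paper.

    import numpy as np

    def sgd_pmf_step(U, V, u, i, r, lr=0.01, lam=0.1):
        # Per-observation loss: 1/2 * (r - U[u].V[i])^2 + lam/2 * (||U[u]||^2 + ||V[i]||^2)
        err = r - U[u] @ V[i]               # prediction error for the new rating
        grad_u = -err * V[i] + lam * U[u]   # gradient w.r.t. the user feature vector
        grad_v = -err * U[u] + lam * V[i]   # gradient w.r.t. the item feature vector
        U[u] -= lr * grad_u                 # incremental update: only row u of U ...
        V[i] -= lr * grad_v                 # ... and row i of V are touched

For example, U and V could be initialized as small random matrices of shape (num_users, D) and (num_items, D), and sgd_pmf_step called once per incoming (user, item, rating) triplet, so no other data needs to be held in memory.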

Regularized Dual Averaging PMF

- Maintain the approximated average gradient
  - The previous average gradient is combined with the gradient due to the new observation, weighted by the number of items rated by u
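Reading the slide's annotations literally, the average gradient for user u can be maintained incrementally along the following lines (the exact weighting is an assumption, not a formula quoted from the paper):

    \bar{g}_u \leftarrow \frac{(n_u - 1) \, \bar{g}_u + g_{\text{new}}}{n_u}

where n_u is the number of items rated by u so far, \bar{g}_u is the previously maintained average gradient, and g_{\text{new}} is the gradient contributed by the new observation; the item-side average gradient is maintained analogously.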

Regularized Dual Averaging PMF

- Solve the following optimization problem to obtain
  - The new user feature vector
  - The new item feature vector
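For context, the generic regularized dual averaging step (Xiao, 2010) for an L2-regularized objective has a closed-form solution, which is what makes each iteration easy; the paper's exact auxiliary term and step sizes may differ:

    U_u \leftarrow \arg\min_{w} \Big\{ \langle \bar{g}_u, w \rangle + \frac{\lambda}{2} \lVert w \rVert^2 + \frac{\beta_t}{t} \cdot \frac{1}{2} \lVert w \rVert^2 \Big\} = -\frac{\bar{g}_u}{\lambda + \beta_t / t}

with the analogous closed-form update for the new item feature vector V_i.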

Online RMF

- Similar to online PMF, we propose two online algorithms for RMF
  - Stochastic gradient descent
  - Regularized dual averaging
- However, the challenge is that the loss function cannot be easily dissected per observation

Online RMF

- Recall the loss function for RMF
- When a new observation is revealed
  - A loss term due to the new item is introduced
  - The contributions of previously observed items decay

Online RMF

- We approximate the gradient (for both the user and the item feature vectors) by
  - Decaying the previous gradient
  - Adding the gradient with respect to the new item
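One plausible reading of the approximation above, written for the user feature vector (the decay factor is an assumption, not a formula quoted from the paper):

    \bar{g}_u \leftarrow \rho \, \bar{g}_u + \nabla_{U_u} \ell_{\text{new}}, \qquad 0 < \rho < 1

where \rho decays the gradient accumulated from previously observed items and \ell_{\text{new}} is the loss term introduced by the newly rated item; the gradient with respect to the item feature vectors is decayed and updated in the same way.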

Online RMF

- Stochastic Gradient Descent RMF
- Dual Averaging RMF

Experiments and Results

- Online vs. batch algorithms
- Performance under different settings
- Sensitivity analysis of parameters
- Scalability to a large dataset

Evaluation Metric

- Root Mean Square Error (RMSE): the lower, the better
- Normalized Discounted Cumulative Gain (NDCG): the higher, the better
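Both metrics have standard definitions (NDCG is typically reported at a truncation level K):

    \mathrm{RMSE} = \sqrt{ \frac{1}{|\mathcal{T}|} \sum_{(u, i) \in \mathcal{T}} \big( R_{ui} - \hat{R}_{ui} \big)^2 }

    \mathrm{DCG@}K = \sum_{k=1}^{K} \frac{2^{\mathrm{rel}_k} - 1}{\log_2 (k + 1)}, \qquad \mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}

where \mathcal{T} is the test set, \hat{R}_{ui} the predicted rating, \mathrm{rel}_k the relevance (rating) of the item ranked at position k, and \mathrm{IDCG@}K the DCG of the ideal ordering.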

Online vs. Batch algorithms

- We conduct experiments on a real-life data set
  - MovieLens: a movie rating data set
    - 6,040 users, 3,900 movies, 1,000,209 ratings
    - 4.25% of the user-item rating matrix is known
- Simulate three settings
  - T1: 10% training, 90% testing
  - T5: 50% training, 50% testing
  - T9: 90% training, 10% testing

Online vs. Batch algorithms

- Shown below are the PMF results under the three settings (figure panels: T1, T5, T9)

Online vs. Batch algorithms

- Shown below are the RMF results under the three settings (figure panels: T1, T5, T9)

Impact of the regularization parameter in PMF

- Observations
  - Fewer training data require more regularization
  - Results are quite sensitive to the regularization setting
- Figure panels: SGD-PMF, DA-PMF

Impact of the regularization parameter in RMF

- Observation
  - Fewer training data require more regularization
- Figure panels: SGD-RMF, RDA-RMF

Impact of learning rate

- The learning rate is used only in the stochastic gradient descent algorithms
- Figure panels: SGD-PMF, SGD-RMF

Scalability to a large dataset

- Yahoo! Music dataset
  - The largest publicly available CF dataset
  - 252,800,275 ratings, 1,000,990 users, 624,961 items
  - Rating values range over [0, 100]

Scalability to a large dataset

- Experiment environment: Linux workstation (Xeon dual-core 2.4 GHz, 32 GB RAM)
- Batch PMF: 8 hours for 120 iterations
- Online PMF: 10 minutes
- Figure panels: T1, T5

Conclusion and Future Work

- We proposed online CF algorithms that
  - Perform comparably to, or even better than, the corresponding batch algorithms
  - Scale linearly with the number of ratings
  - Adjust the model incrementally given a new observation
- Future work
  - Theoretical bound for the convergence rate
  - Find a better approximation for the average gradient of RMF

Thanks!

Questions?