Online Learning for Collaborative Filtering
Guang Ling, Haiqin Yang, Irwin King, Michael Lyu
Presented by Guang Ling
Outline
- Introduction
- PMF and RMF
- Online PMF and Online RMF
- Experiments and Results
- Conclusion and Future Work
Introduction
- Recommender systems emerged
  - Content-based filtering: analyzes item content
  - Collaborative filtering: based on ratings
Introduction
- Collaborative filtering
  - Allows users to rate items
  - Infers users' tastes and items' features based on the ratings
  - Matches a user's preferences against items' features
Introduction
- Various methods have been developed
  - Memory based: user based, item based
  - Model based: PMF, RMF, PLSA, pLPA
- So, what is the problem? Consider a toy rating matrix (? marks an unobserved rating):

        I1  I2  I3  I4
    U1   1   5   4   ?
    U2   2   5   4   1
    U3   4   2   1   4
Introduction
- Unrealistic assumptions
  - All ratings are available
  - There will be no new ratings
  - The data set is small enough to be handled in main memory
- Reality
  - Ratings are collected over time
  - New ratings are received constantly
  - Huge data sets cannot be easily handled
Introduction
- We propose online CF algorithms that
  - Obviate the need to hold all data
  - Make incremental changes based solely on the new rating
  - Scale linearly with the number of ratings
- Extra feature
  - Explicit control over the regularization effect
PMF and RMF
- Matrix factorization models
  - Factor the rating matrix R (rows indexed by users, columns by items) into U and V, as sketched below
  - Minimize a loss over the observed ratings
    - Squared loss: PMF
    - Cross entropy: RMF
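In the notation used throughout ($m$ users, $n$ items; $d$ is an illustrative label for the latent dimensionality):

$$R \approx U^\top V, \qquad U \in \mathbb{R}^{d \times m},\ V \in \mathbb{R}^{d \times n},\ R \in \mathbb{R}^{m \times n}$$

Each user $i$ has a feature vector $U_i$ and each item $j$ a feature vector $V_j$; the rating $R_{ij}$ is predicted by $U_i^\top V_j$.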
PMF
- Conditional distribution over the observed ratings
- Spherical Gaussian priors on the user and movie feature vectors
- Maximize the posterior (all three are written out below)
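These are the standard PMF quantities (following Salakhutdinov and Mnih's formulation, which this slide reproduces; $I_{ij}$ is 1 if user $i$ rated item $j$ and 0 otherwise):

$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{m} \prod_{j=1}^{n} \Big[\mathcal{N}\big(R_{ij} \mid U_i^\top V_j,\ \sigma^2\big)\Big]^{I_{ij}}$$

$$p(U \mid \sigma_U^2) = \prod_{i=1}^{m} \mathcal{N}(U_i \mid 0,\ \sigma_U^2 \mathbf{I}), \qquad p(V \mid \sigma_V^2) = \prod_{j=1}^{n} \mathcal{N}(V_j \mid 0,\ \sigma_V^2 \mathbf{I})$$

$$\max_{U, V}\ p(U, V \mid R) \propto p(R \mid U, V, \sigma^2)\, p(U \mid \sigma_U^2)\, p(V \mid \sigma_V^2)$$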
PMF
- Maximizing the posterior is equivalent to minimizing the loss below, which combines a squared loss term with a regularization term
- Use gradient descent to minimize the loss
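Written out (with $\lambda_U = \sigma^2/\sigma_U^2$ and $\lambda_V = \sigma^2/\sigma_V^2$):

$$\mathcal{L}(U, V) = \underbrace{\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij}\big(R_{ij} - U_i^\top V_j\big)^2}_{\text{squared loss}} + \underbrace{\frac{\lambda_U}{2} \sum_{i=1}^{m} \|U_i\|^2 + \frac{\lambda_V}{2} \sum_{j=1}^{n} \|V_j\|^2}_{\text{regularization}}$$

Gradient descent then iterates

$$U_i \leftarrow U_i - \eta\, \frac{\partial \mathcal{L}}{\partial U_i}, \qquad V_j \leftarrow V_j - \eta\, \frac{\partial \mathcal{L}}{\partial V_j}$$

with learning rate $\eta$.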
RMF
- Top-one probability
  - The probability of an item i being ranked on top
- Minimize cross entropy (see the sketch below)
  - Cross entropy measures the divergence between two distributions
  - An un-normalized KL-divergence
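A sketch of both quantities, following the ListRank-MF formulation that RMF builds on (the notation here is illustrative; $g(x) = 1/(1 + e^{-x})$ is the logistic function):

$$P(R_{ij}) = \frac{\exp(R_{ij})}{\sum_{k:\, I_{ik} = 1} \exp(R_{ik})}$$

$$\mathcal{L}(U, V) = -\sum_{i=1}^{m} \sum_{j:\, I_{ij} = 1} P(R_{ij}) \log P\big(g(U_i^\top V_j)\big) + \frac{\lambda}{2}\Big(\|U\|_F^2 + \|V\|_F^2\Big)$$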
Online PMF
- We propose two online algorithms for PMF
  - Stochastic gradient descent (SGD)
    - Adjusts the model stochastically for each observation
  - Regularized dual averaging (RDA)
    - Maintains an approximated average gradient
    - Solves an easy optimization problem at each iteration
Stochastic Gradient Descent PMF
- Recall the loss function for PMF
- The squared loss can be dissected and associated with each observation triplet (user i, item j, rating R_ij)
- Update the model using the gradient of this per-observation loss, as sketched below
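A minimal sketch of that per-rating update in Python (the names eta and lam and their defaults are illustrative placeholders for the learning rate and regularization parameter; how the regularizer is apportioned per observation is a simplification):

    import numpy as np

    def sgd_pmf_step(U, V, i, j, r, eta=0.01, lam=0.1):
        """One online SGD step for a newly observed rating (user i, item j, value r).

        U: (num_users, d) user feature matrix, one row per user.
        V: (num_items, d) item feature matrix, one row per item.
        """
        err = r - U[i] @ V[j]        # prediction error on the new rating
        u_old = U[i].copy()          # keep the old user vector for the item update
        U[i] += eta * (err * V[j] - lam * U[i])
        V[j] += eta * (err * u_old - lam * V[j])

    # Usage: stream ratings one at a time, touching only the affected vectors.
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((100, 10))   # 100 users, 10 latent factors
    V = 0.1 * rng.standard_normal((50, 10))    # 50 items
    sgd_pmf_step(U, V, i=3, j=7, r=4.0)

Each step costs O(d), which is how the method scales linearly with the number of ratings.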
Regularized Dual Averaging PMF
- Maintain the approximated average gradient
- The update below combines the previous gradient with the gradient due to the new observation, weighted by the number of items rated by user u
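A sketch of that update for user $u$, where $n_u$ counts the items $u$ has rated including the new one (the weighting is a reading of the slide's labels):

$$\bar{g}_u \leftarrow \frac{(n_u - 1)\,\bar{g}_u + g_{\text{new}}}{n_u}$$

Here $\bar{g}_u$ is the previous average gradient and $g_{\text{new}}$ the gradient due to the new observation.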
Regularized Dual Averaging PMF
- Solve the optimization problem sketched below to obtain
  - The new user feature vector
  - The new item feature vector
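A sketch of the generic RDA step with a pure $\ell_2$ regularizer (the paper's exact objective may include an additional proximal term). The problem is easy because it has a closed-form solution:

$$U_i \leftarrow \arg\min_{u}\ \Big\{ \langle \bar{g}_{U_i}, u \rangle + \frac{\lambda}{2} \|u\|^2 \Big\} = -\frac{1}{\lambda}\, \bar{g}_{U_i}$$

and analogously for the item feature vector $V_j$.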
Online RMF
- Similar to online PMF, we propose two online algorithms for RMF
  - Stochastic gradient descent
  - Regularized dual averaging
- However, there is a challenge: the loss function cannot be easily dissected
Online RMF
- Recall the loss function for RMF
- When a new observation is revealed
  - There is a loss due to the new item
  - The contributions of the previous items decay (see the sketch below)
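The decay comes from the normalization in the top-one probability. If user $u$ had rated the item set $S_u$ and a new item $j$ arrives, every previous item $k \in S_u$ sees its top-one probability shrink:

$$P^{\text{new}}(R_{uk}) = \frac{\exp(R_{uk})}{\sum_{l \in S_u} \exp(R_{ul}) + \exp(R_{uj})} \; < \; P^{\text{old}}(R_{uk})$$

so the loss terms of the previous items change as well, which is why the loss cannot be dissected per observation as in PMF.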
Online RMF
- We approximate the gradient by combining
  - A decayed version of the previous gradient
  - The gradient with respect to the new item
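A sketch of the shape of this approximation ($\rho$ is a placeholder for the decay factor, which would be derived from the change in the softmax normalization; $\ell_{uj}$ denotes the loss term of the new item):

$$\bar{g}_u \leftarrow \rho\, \bar{g}_u + \frac{\partial \ell_{uj}}{\partial U_u}$$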
Experiments and Results
- Online vs. batch algorithms
- Performance under different settings
- Sensitivity analysis of parameters
- Scalability to a large dataset
Evaluation Metric
- Root Mean Square Error (RMSE): the lower, the better
- Normalized Discounted Cumulative Gain (NDCG): the higher, the better
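Their standard definitions, with $\mathcal{T}$ the test set, $\hat{R}_{ij}$ the predicted rating, $\mathrm{rel}_p$ the relevance (rating) of the item at rank $p$, and $\mathrm{IDCG@}K$ the DCG@$K$ of the ideal ordering:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(i,j) \in \mathcal{T}} \big(R_{ij} - \hat{R}_{ij}\big)^2}$$

$$\mathrm{NDCG@}K = \frac{1}{\mathrm{IDCG@}K} \sum_{p=1}^{K} \frac{2^{\mathrm{rel}_p} - 1}{\log_2(p + 1)}$$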
Online vs. Batch algorithms
- We conduct experiments on a real-life data set
  - MovieLens: a movie rating data set
    - 6,040 users, 3,900 movies, 1,000,209 ratings
    - 4.25% of the user-item rating matrix is known
- We simulate three settings
  - T1: 10% training, 90% testing
  - T5: 50% training, 50% testing
  - T9: 90% training, 10% testing
Impact of λ in PMF
- Let λ denote the regularization parameter
- Observations
  - The less training data, the more regularization is needed
  - Results are quite sensitive to the regularization
(Plots: SGD-PMF, DA-PMF)
Impact of λ in RMF
- Let λ denote the regularization parameter
- Observation
  - The less training data, the more regularization is needed
(Plots: SGD-RMF, RDA-RMF)
Impact of learning rate
- We use η to denote the learning rate
- It is used in the stochastic gradient descent algorithms only
(Plots: SGD-PMF, SGD-RMF)
Scalability to a large dataset
- Yahoo! Music dataset
  - The largest publicly available CF dataset
  - 252,800,275 ratings, 1,000,990 users, 624,961 items
  - Rating values range over [0, 100]
Scalability to a large dataset
- Experiment environment: Linux workstation (dual-core Xeon, 2.4 GHz, 32 GB RAM)
- Batch PMF: 8 hours for 120 iterations
- Online PMF: 10 minutes
(Plots: T1, T5)
Conclusion and Future Work
- We proposed online CF algorithms that
  - Perform comparably to, or even better than, the corresponding batch algorithms
  - Scale linearly with the number of ratings
  - Adjust the model incrementally given each new observation
- Future work
  - A theoretical bound for the convergence rate
  - A better approximation for the average gradient of RMF