
A modified Regularized Non-Negative Matrix Factorization for MovieLens

Huy Nguyen and Tien Dinh Faculty of Information Technology, University of Science, VNU-HCM

227 Nguyen Van Cu, District 5, Ho Chi Minh City, Viet Nam Email: [email protected], [email protected]

Abstract— This paper studies the matrix factorization technique for recommendation systems. The problem is to modify and apply non-negative matrix factorization to predict the rating a user is likely to give an item in the MovieLens dataset. First, based on the original randomized non-negative matrix factorization, we propose a new algorithm that discovers the features underlying the interactions between users and items. Then, in the experimentation section, we provide the numerical results of our proposed algorithm on the well-known MovieLens dataset. In addition, we suggest the parameter settings that should be used for matrix factorization to obtain good results on MovieLens. Comparison with other recent techniques in the literature shows that our algorithm not only obtains high-quality solutions but also works well in sparse rating domains.

Keywords— Recommendation Systems, Matrix Factorization, MovieLens

I. INTRODUCTION

Recommendation systems have become popular in commercial applications, such as Amazon and other video-on-demand companies. They allow users to find items efficiently and accurately with only a few operations. Thanks to the success of Amazon as a typical real-world application of recommendation systems, many e-commerce websites have followed by applying different recommendation algorithms to increase their sales. For example, Marc Serra, president and owner of the Acuista website (http://www.acusta.com), reported that their overall sales increased by over 20%. Meir Tsinman, the president of the TheMedicalSupplyDepot website (http://www.medicalsupplydepot.com), said their sales increased by more than 20% within weeks of adding product recommendations. Many other e-commerce websites, such as eBay, Netflix, Overstock, and Borders, have also used product recommendation systems successfully [21].

The recommendation approaches can be categorized into different groups, such as demographic, utility-based, knowledge-based, collaborative, content-based, and hybrid [7]. Each of these approaches has its own strengths and weaknesses.


Among these, Collaborative Filtering has been one of the most successful approaches over the past decade [6]. However, it still has some drawbacks: it does not work accurately when the ratings are too fragmented and sparse, and when the data are very large (e.g. Netflix) the system requires much longer processing time due to the rapid growth in computation. To avoid these drawbacks, many researchers have combined two or more approaches to build hybrid methods.

In Burke's survey, there are seven different ways to combine two approaches into a hybrid one, namely Weighted, Switching, Mixed, Feature Combination, Feature Augmentation, Cascade, and Meta-level [18, 19]. Thus, with five different approaches (Collaborative, Content-based, Demographic, Knowledge-based, Utility-based), the survey showed that there are 53 possible two-approach hybrid combinations [19]. However, these hybrids could only address some of the drawbacks, not all of them.

In October 2006, when Netflix offered a prize of one million dollars to anyone who could find a solution 10% better than their existing recommendation system [1], a multitude of different approaches were studied. Among them, matrix factorization showed a significant improvement in the accuracy of the predictions produced by recommendation systems. Moreover, it also overcomes the two major weaknesses, sparseness and scalability, which are commonly found in previously studied methods [4, 7].

However, because of the Netflix Prize, most researchers focused only on processing and experimenting with the Netflix data. They combined many different methods with more complex formulas in the hope of achieving better results. For example, the first progress prize was won by a team of three people with more than 100 predictors, the second by a team of six with more than 200 predictors, and the Grand Prize was won by "BellKor's Pragmatic Chaos", a team of seven members with more than 600 predictors [9].

Although many well-known approaches have been applied and tailored to the Netflix dataset, there have been very few algorithms for other data sets such as MovieLens, Jester, and Book-Crossing [4].


This motivates us to study and modify the original matrix factorization and apply it to the MovieLens dataset. We also compare our experimental results with the best known solutions for MovieLens.

Problem definition. The recommendation problem is usually defined as follows:

• Given a set of users U = {1, ..., N}, where N is the number of users
• Given a set of items I = {1, ..., M}, where M is the number of items
• Given a rating matrix R of size N×M with ratings R_ij ∈ {1, 2, 3, 4, 5}
• Assume that each user in the set U has rated a number of items in the set I.

The goal is to predict the ratings that users would give to items they have not rated yet, so that we can recommend suitable items for them to choose. All information about the scores that users have given to items is represented in the matrix R of size N×M. Cells R[i, j] with values greater than 0 hold the ratings the corresponding users have already given, while an empty entry R[i, j] marks a place where the user has not rated and the system needs to predict a rating. Example: consider a problem with 4 users and 4 items, where the matrix R is as follows:

Users \ Items   I1   I2   I3   I4
U1               5    4    3    -
U2               4    5    5    3
U3               3    -    4    -
U4               5    3    3    4

The blank cells are the ones that need to be predicted by the recommendation system. Moreover, the scores filled into the blanks should be consistent with the scores already in the system. That means user U1 could give item I4 4 points, because user U1 scored items I1, I2, I3 more similarly to user U4 than to users U2 and U3. On the other hand, user U1 could score item I4 with 5 points, similar to I1, if we considered the similarity among items. Figure 1 shows the full matrix of ratings predicted by the non-negative matrix factorization method (the cells that were blank above are the new predictions).

Users \ Items    I1     I2     I3     I4
U1              4.8    3.8    3.4    3.7
U2              4.1    4.9    4.8    3.1
U3              3.1    3.9    3.9    2.4
U4              5.0    3.3    2.9    3.8

Figure 1: Illustration of the ratings predicted by the non-negative matrix factorization approach.
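For concreteness, the small example above can be encoded directly as a rating matrix in which unrated cells are stored as 0. This is only an illustrative Python/numpy sketch; the variable names and the placement of U3's two ratings follow the reconstructed table above:

import numpy as np

# Example rating matrix: rows are users U1..U4, columns are items I1..I4,
# and 0 marks a cell the user has not rated yet.
R = np.array([
    [5, 4, 3, 0],
    [4, 5, 5, 3],
    [3, 0, 4, 0],
    [5, 3, 3, 4],
], dtype=float)

observed = R > 0                       # mask of the rated cells
print(int(observed.sum()), "of", R.size, "cells are rated")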

II. MATRIX FACTORIZATION

A. Basic Matrix Factorization

The purpose of matrix factorization is to split a given matrix into two much smaller matrices whose product is approximately equal to the original (parent) matrix. We let R be the matrix that stores all the rating scores of the users for the items; R has size |U|×|I|.

Assuming that we have K latent factors, we need to find two matrices P of size |U|×K and Q of size |I|×K such that R ≈ PQ^T. In this case, we define the approximation as follows:

• R̂_ij = Σ_{k=1}^{K} P_ik·Q^T_kj = P_i·Q^T_j
• e_ij = R_ij − R̂_ij (∀ R_ij > 0)
• e = (1 / |R⁺|) Σ_{(i,j) ∈ R⁺} e_ij², where R⁺ = {(i, j) : R_ij > 0} is the set of rated cells
• (P, Q^T) = argmin_{P, Q} e

Note that R̂_ij is the predicted rating of user i for item j. The error e_ij is the difference between the predicted rating and the real one. The average error e is taken over all rating-prediction pairs in R. P and Q^T are the two matrices whose product best approximates R.
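A minimal numpy sketch of these definitions, assuming a small random initialization of P and Q purely for illustration (R is the 4×4 example from the previous section):

import numpy as np

R = np.array([[5, 4, 3, 0],
              [4, 5, 5, 3],
              [3, 0, 4, 0],
              [5, 3, 3, 4]], dtype=float)

K = 2                                   # number of latent factors (illustrative)
N, M = R.shape
rng = np.random.default_rng(0)
P = rng.random((N, K))                  # user-factor matrix, |U| x K
Q = rng.random((M, K))                  # item-factor matrix, |I| x K

R_hat = P @ Q.T                         # predicted ratings: R_hat[i, j] ~ P_i . Q_j
mask = R > 0                            # only the observed ratings contribute
e_ij = R[mask] - R_hat[mask]            # errors e_ij for R_ij > 0
e = np.mean(e_ij ** 2)                  # average error over the rated cells
print("average error e =", e)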

Recently, matrix factorization has been applied in collaborative filtering to address the sparseness and scalability problems [4, 9, 10, 20].

B. Regularized Non-negative Matrix Factorization (RNMF)

The first step in matrix factorization is to initialize the matrices P and Q^T by assigning random values to all of their entries. The most important point in this step is that all of the initial values must be small enough to avoid the risk of overshooting a local minimum when we minimize the distance between their product and the real rating matrix R.
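For example, this initialization step could be realized as follows (the matrix sizes correspond to MovieLens 100K; this is our own sketch, not the authors' implementation):

import numpy as np

N, M, K = 943, 1682, 10          # users, items, latent factors (MovieLens 100K sizes)
rng = np.random.default_rng(42)
P  = rng.random((N, K))          # user factors, uniform random values in [0, 1)
QT = rng.random((K, M))          # transposed item factors, uniform in [0, 1)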

The next step is to update the values of P and Q^T to minimize their distance to R. To achieve that, we must not only increase or decrease the values of P and Q^T but also find the right direction for each update. First, we compute the gradient of the current error, i.e. the distance between the product of P, Q^T and R, as:

• ∂e_ij² / ∂P_ik = −2(R_ij − R̂_ij)·Q^T_kj = −2·e_ij·Q^T_kj
• ∂e_ij² / ∂Q^T_kj = −2(R_ij − R̂_ij)·P_ik = −2·e_ij·P_ik
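These derivatives follow from e_ij = R_ij − Σ_k P_ik·Q^T_kj. A small sketch that checks the analytic gradient of e_ij² with respect to P_i against a finite-difference estimate (illustrative values only):

import numpy as np

rng = np.random.default_rng(1)
K = 3
P_i  = rng.random(K)                 # row i of P
QT_j = rng.random(K)                 # column j of QT
R_ij = 4.0                           # an observed rating

def sq_err(p):
    e = R_ij - p @ QT_j
    return e ** 2

e_ij = R_ij - P_i @ QT_j
analytic = -2.0 * e_ij * QT_j        # d(e_ij^2)/dP_ik = -2 * e_ij * QT_kj

eps = 1e-6
numeric = np.array([(sq_err(P_i + eps * np.eye(K)[k]) -
                     sq_err(P_i - eps * np.eye(K)[k])) / (2 * eps)
                    for k in range(K)])
print(np.allclose(analytic, numeric, atol=1e-5))   # expected: True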


Based on this gradient, we propose the following rule for updating all values of P and Q^T so that their product approaches R (moving against the gradient, with the constant factor absorbed into the learning rate α):

• P'_ik = P_ik + α·e_ij·Q^T_kj
• Q^T'_kj = Q^T_kj + α·e_ij·P_ik

We call this method RNMF, which stands for Regularized Non-negative Matrix Factorization. In this approach, we add a parameter named β to ensure that the values of P and Q^T, and hence their product, do not become too large; this keeps the updated solution close to a minimal distance from R. We adjust the updating rule as follows:

• P'_ik = P_ik + α·(e_ij·Q^T_kj − β·P_ik)
• Q^T'_kj = Q^T_kj + α·(e_ij·P_ik − β·Q^T_kj)
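A minimal sketch of this regularized update for a single observed rating, vectorized over the K factors. The α and β defaults are placeholders; note that the training algorithm listed below updates the entries one k at a time, whereas this vectorized version computes both updates from the old values:

import numpy as np

def rnmf_update(P_i, QT_j, R_ij, alpha=0.001, beta=0.002):
    """One stochastic update of the regularized rule for a single rating R_ij.
    P_i is row i of P and QT_j is column j of QT (both length-K vectors)."""
    e_ij = R_ij - P_i @ QT_j
    P_new  = P_i  + alpha * (e_ij * QT_j - beta * P_i)
    QT_new = QT_j + alpha * (e_ij * P_i  - beta * QT_j)
    return P_new, QT_new

# Example: one update for a rating of 4.
P_new, QT_new = rnmf_update(np.array([0.1, 0.2]), np.array([0.3, 0.4]), 4.0)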

The process stops when the average error e reaches its minimum. Thus, after updating P and Q^T according to the rule above, we need to recompute e in order to check whether it has reached the minimum. We calculate it as follows:

• e = (1 / |R⁺|) Σ_{(i,j) ∈ R⁺} e_ij² = (1 / |R⁺|) Σ_{(i,j) ∈ R⁺} ( R_ij − Σ_{k=1}^{K} P_ik·Q^T_kj )², where R⁺ = {(i, j) : R_ij > 0}

Algorithm: Training algorithm for Regularized Non-negative Matrix Factorization

1:  Input ← R, P, QT, I, K, α, β
2:  Randomly generate an initial value from 0 to 1 for every entry of P and QT
3:  for step ← 1 to I do
4:    for i ← 1 to len(R) do
5:      for j ← 1 to len(R[i]) do
6:        /* compute the error eij between Rij and the product of the
             corresponding row of P and column of QT */
          if R[i, j] > 0 then
7:          eij ← R[i, j] − Σ_{k=1..K} P[i, k]·QT[k, j]
8:          /* update the corresponding values of P and QT based on eij */
            for k ← 1 to K do
9:            P[i, k] ← P[i, k] + α·(eij·QT[k, j] − β·P[i, k])
10:           QT[k, j] ← QT[k, j] + α·(eij·P[i, k] − β·QT[k, j])
11:   /* calculate the average error e */
      for all (i, j) with R[i, j] > 0 do
12:     e ← e + eij²
13:   e ← e / (number of rated items)
14:   // the error e has converged to its minimum
      if e < 0.01 then break
15: Return P, QT
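The listing above can be rendered in Python/numpy roughly as follows. This is our own sketch, not the authors' C# implementation: the inline step numbers refer to the pseudocode, and the pure-Python loop is written for clarity rather than speed (at the reported settings it would be far too slow for the full dataset without vectorization):

import numpy as np

def train_rnmf(R, K=10, iterations=5000, alpha=0.00001, beta=0.002, tol=0.01, seed=0):
    """Sketch of the RNMF training loop. R is an N x M matrix with 0 for unrated cells."""
    rng = np.random.default_rng(seed)
    N, M = R.shape
    P  = rng.random((N, K))                    # step 2: random initial values in [0, 1)
    QT = rng.random((K, M))
    rows, cols = np.nonzero(R)                 # indices of the observed ratings
    for step in range(iterations):             # step 3
        for i, j in zip(rows, cols):           # steps 4-6: only cells with R[i, j] > 0
            e_ij = R[i, j] - P[i, :] @ QT[:, j]                        # step 7
            P[i, :]  += alpha * (e_ij * QT[:, j] - beta * P[i, :])     # step 9
            QT[:, j] += alpha * (e_ij * P[i, :]  - beta * QT[:, j])    # step 10
        pred = (P @ QT)[rows, cols]            # steps 11-13: average error over rated cells
        e = np.mean((R[rows, cols] - pred) ** 2)
        if e < tol:                            # step 14: stop once the error is small enough
            break
    return P, QT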

III. EXPERIMENTAL EVALUATION

A. Dataset

We used the MovieLens data set (http://www.grouplens.org) to evaluate our algorithm. There are three MovieLens data sets:

• 100,000 ratings for 1,682 movies by 943 users
• 1 million ratings for 3,900 movies by 6,040 users
• 10 million ratings and 100,000 tags for 10,681 movies by 71,567 users

We used the first data set, composed of 943 users and 1,682 items with ratings on a 1-5 scale. In order to compare with other algorithms, we also extracted a subset exactly as in [6, 8, 12]. It contains the first 500 users, each with more than 40 ratings. The first 100, 200, and 300 users in the data set were selected as three different training user sets, referred to as ML100, ML200, and ML300 respectively.

In order to compare with other approaches [6, 8, 12], we picked 5, 10, and 20 ratings from each of the 200 testing users and named these configurations Given5, Given10, and Given20 respectively. We added each of them to ML100, ML200, and ML300 to create 9 training datasets in total. The system was then tested on the remaining ratings of the 200 testing users, excluding the Given5, Given10, or Given20 ratings; a sketch of this split protocol is given below.
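A rough sketch of how such a split could be assembled from the MovieLens 100K ratings file. We assume the standard u.data format (tab-separated user, item, rating, timestamp) and pick the k "given" ratings of each test user at random; the exact selection rule used in [6, 8, 12] may differ, so treat this only as an illustration of the Given-k protocol (load_ratings and given_k_split are our own hypothetical helpers):

import numpy as np

def load_ratings(path="u.data"):
    """MovieLens 100K u.data: tab-separated user_id, item_id, rating, timestamp."""
    data = np.loadtxt(path, dtype=int)
    return data[:, :3]                          # keep user, item, rating

def given_k_split(ratings, train_users, test_users, k, seed=0):
    """Training set = all ratings of train_users plus k ratings per test user;
    test set = the remaining ratings of the test users."""
    rng = np.random.default_rng(seed)
    train_users, test_users = set(train_users), set(test_users)
    train, test = [], []
    for u in np.unique(ratings[:, 0]):
        rows = ratings[ratings[:, 0] == u]
        if u in train_users:
            train.append(rows)
        elif u in test_users:
            idx = rng.permutation(len(rows))
            train.append(rows[idx[:k]])         # the "Given k" observed ratings
            test.append(rows[idx[k:]])          # ratings the system must predict
    return np.vstack(train), np.vstack(test)

# Example: ML100 with Given10 (users 1-100 for training, users 301-500 for testing).
# ratings = load_ratings()
# train, test = given_k_split(ratings, range(1, 101), range(301, 501), k=10)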

B. Evaluation Metrics

In order to evaluate the quality of a recommendation system, two families of metrics are often used: statistical accuracy metrics and decision-support accuracy metrics [2]. In the first approach, the accuracy of a system is evaluated by comparing the numerical recommendation scores against the actual user ratings for the user-item pairs in the test dataset. The Mean Absolute Error (MAE) between ratings and predictions is a well-known metric of this kind.

MAE measures the deviation of the recommendations from the real user rating values. For each rating-prediction pair <pi, qi>, the absolute error between them is taken; the sum of these absolute errors over all N rating-prediction pairs is then averaged to obtain the MAE:

MAE = (1 / N) Σ_{i=1}^{N} |p_i − q_i|

The lower the MAE, the better the prediction quality of the recommendation system. Root Mean Squared Error (RMSE) and correlation are also considered metrics in this class.

The other class of metrics only considers whether a prediction is "good" or "bad". For consistency with the experiments in [5, 7, 12], we use MAE as our evaluation metric when reporting our predictions.
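For reference, MAE can be computed in a couple of lines (a sketch with made-up numbers):

import numpy as np

def mae(predictions, ratings):
    """Mean Absolute Error between predicted and actual ratings (lower is better)."""
    predictions = np.asarray(predictions, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    return np.mean(np.abs(predictions - ratings))

print(mae([4.8, 3.1, 2.9], [5, 3, 3]))   # (0.2 + 0.1 + 0.1) / 3 ≈ 0.133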


C. Evaluation Platform

Our system is implemented in C#. All our experiments were run on a Windows 7 laptop with an Intel Core i5 CPU and 2 GB of RAM.

D. Evaluation Result

In this section, Table 1 presents the numerical results of our algorithm together with comparisons against other state-of-the-art approaches. Our algorithm achieves the lowest MAE in almost all cases.

In assessing the quality of our predictions, we first determined some important parameters: the number of iterations I, the learning rate α (which controls how fast we approach the minimum), the number of latent features K, and the over-fitting (regularization) parameter β. We assigned different values to these parameters to show how they affect the quality of the predictions.

We ran the algorithm with three values of K (10, 15, 20), two values of I (500, 10000), two values of α (0.0001, 0.00001), and β = 0.002; a sketch of such a parameter sweep is given below. The larger the values of K and I, the longer our algorithm takes to run. In order to avoid overshooting the minimum and ending up oscillating around it, we assigned small values to α. The β parameter was set to 0.002 to ensure that the updating rule does not produce excessively large values.
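As an illustration, such a parameter study could be scripted around the training function sketched earlier. The names train_fn and evaluate are hypothetical helpers (train_fn would be the train_rnmf sketch above, and evaluate would compute the MAE on the withheld ratings); the value grids simply mirror the settings listed above:

from itertools import product

K_values     = (10, 15, 20)
I_values     = (500, 10000)
alpha_values = (0.0001, 0.00001)
beta         = 0.002

def parameter_sweep(R_train, train_fn, evaluate):
    """Train RNMF for every parameter combination and return the best one by MAE."""
    results = {}
    for K, I, alpha in product(K_values, I_values, alpha_values):
        P, QT = train_fn(R_train, K=K, iterations=I, alpha=alpha, beta=beta)
        results[(K, I, alpha)] = evaluate(P, QT)
    best = min(results, key=results.get)
    return best, results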

After running our algorithm on all of the MovieLens cases above, we conducted experiments on the parameters and reached the best-known solutions for almost all cases, using 9 independent runs for each case. In our experiments, the best solutions for all cases were obtained with I = 5000 and α = 0.00001, which means we do not need to run too many iterations. In addition, according to our experiments, the smaller the α parameter, the faster we reach the minimum. Moreover, K is 10 for Given20 and 15 for Given5 and Given10 in order to reach the best-known solutions. Therefore, for sparser datasets we use a larger value of K than for the other datasets.

In Table 1, we compare our proposed RNMF algorithm with other collaborative filtering algorithms, all of which are listed in [6, 8, 12]: CFONMTF stands for Collaborative Filtering using Orthogonal Nonnegative Matrix Tri-Factorization [6], SF2 for Similarity Fusion [14], SCBPCC for Cluster-Based Pearson Correlation Coefficient [8], AM for Aspect Model [22], PD for Personality Diagnosis [3], PCC for user-based Pearson Correlation Coefficient [13], and CBCF for Cluster-Based Collaborative Filtering [16].

TABLE I. MAE of our method and of other algorithms from [6, 8, 12] (smaller values are better)

Training Set   Algorithm   Given5   Given10   Given20
ML_100         RNMF        0.826    0.778     0.754
               CFONMTF     0.838    0.801     0.804
               SF2         0.847    0.774     0.792
               SCBPCC      0.848    0.819     0.789
               CBCF        0.924    0.896     0.890
               AM          0.963    0.922     0.887
               PD          0.849    0.817     0.808
               PCC         0.874    0.836     0.818
ML_200         RNMF        0.807    0.771     0.744
               CFONMTF     0.827    0.791     0.787
               SF2         0.827    0.773     0.783
               SCBPCC      0.831    0.813     0.784
               CBCF        0.908    0.879     0.852
               AM          0.849    0.837     0.815
               PD          0.836    0.815     0.792
               PCC         0.859    0.829     0.813
ML_300         RNMF        0.794    0.765     0.735
               CFONMTF     0.801    0.780     0.782
               SF2         0.804    0.761     0.769
               SCBPCC      0.822    0.810     0.778
               CBCF        0.847    0.846     0.821
               AM          0.820    0.822     0.796
               PD          0.827    0.815     0.789
               PCC         0.849    0.841     0.820

IV. CONCLUSIONS

This paper has presented a modified Regularized Non-negative Matrix Factorization (RNMF) for the MovieLens dataset. The idea is to provide a comprehensive and complete comparison between it and other methods applied to MovieLens. We experimentally show that our RNMF outperforms the previous algorithms in terms of prediction accuracy. Moreover, we conclude that a larger number of factors K should be used for sparser datasets, and that matrix factorization still achieves the best prediction accuracy on them.


V. REFERENCES

[1] Andreas Töscher, Michael Jahrer, and Robert Legenstein. 2008. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems. In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (NETFLIX '08). ACM, New York, NY, USA.

[2] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. Proceeding WWW '01 Proceedings of the 10th international conference on World Wide Web ACM New York, NY, USA.

[3] D. M. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles. 2000. Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In Proc. of UAI.

[4] Gábor Takács, István Pilászy, and Bottyán Németh. 2008. Investigation of Various Matrix Factorization Methods for Large Recommender Systems. Proceeding NETFLIX '08 Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition ACM New York, NY, USA.

[5] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. 2009. Scalable Collaborative Filtering Approaches for Large Recommender Systems. The Journal of Machine Learning Research Volume 10.

[6] Gang Chen, Fei Wang, and Changshui Zhang. 2007. Collaborative Filtering Using Orthogonal Nonnegative Matrix Tri-factorization. Omaha, USA.

[7] Gediminas Adomavicius, Alexander Tuzhilin. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17:734–749.

[8] Gui-Rong Xue, Chenxi Lin, Qiang Yang, WenSi Xi, Hua-Jun Zeng, Yong Yu, and Zheng Chen. 2005. Scalable Collaborative Filtering Using Cluster-based Smoothing. Proceeding SIGIR '05 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval ACM New York, NY, USA.

[9] István Pilászy. 2009. Factorization-Based Large Scale Recommendation Algorithms. Doctoral Thesis. Budapest University of Technology and Economics, Department of Measurement and Information Systems, Hungary.

[10] Jason D. M. Rennie and Nathan Srebro. 2005. Fast Maximum Margin Matrix Factorization for Collaborative Prediction. Proceeding ICML '05 Proceedings of the 22nd international conference on Machine learning ACM New York, NY, USA.

[11] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating Collaborative Filtering Recommender Systems, ACM Transactions on Information Systems 22 (1), 5-53.

[12] Joseph A. Konstan, Loren G. Terveen, John T. Riedl, and Jonathan L. Herlocker. 2004. Evaluating Collaborative Recommender Systems. ACM Transactions on Information Systems (TOIS), Volume 22, Issue 1. ACM, New York, NY, USA.

[13] J. S. Breese, D. Heckerman, and C. Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. of UAI.

[14] Jun Wang, Arjen P. de Vries, and Marcel J.T. Reinders. 2006. Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval SIGIR 06.

[15] Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, and Jaime G. Carbonell. 2010. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. Proceeding RecSys '10 Proceedings of the fourth ACM conference on Recommender systems ACM New York, NY, USA.

[16] L. H. Ungar and D. P. Foster. 1998. Clustering methods for collaborative filtering. In Proc. Workshop on Recommendation Systems at AAAI, Menlo Park, CA. AAAI Press.

[17] Markus Weimer, Alexandros Karatzoglou, Quoc Viet Le, and Alex Smola. 2007. Neural Information Processing Systems (NIPS). Vancouver, Canada, December 3-8.

[18] Robin Burke. 2002. Hybrid Recommender Systems: Survey and Experiments. Kluwer Academic Publishers.

[19] Robin Burke. 2007. Hybrid Web Recommender Systems, Springer-Verlag, Berlin Heidelberg.

[20] Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20 (NIPS 2007). MIT Press, Cambridge, Massachusetts, USA.

[21] Schafer, J. B., Konstan, J., and Riedl, J. 1999. Recommender Systems in E-Commerce. In Proceedings of the ACM 1999 Conference on Electronic Commerce.

[22] T. Hofmann and J. Puzicha. 1999. Latent class models for collaborative filtering. In Proc. of IJCAI.

[23] Yehuda Koren. 2008. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. In Proceedings of KDD '08, Las Vegas, Nevada, USA.

[24] Yehuda Koren. 2010. Factor in the Neighbors: Scalable and Accurate Collaborative Filtering. Journal ACM Transactions on Knowledge Discovery from Data (TKDD) Volume 4 Issue 1, ACM New York, NY, USA.