
Active Learning for Co-clustering Based Collaborative Filtering

Le Quang Thang Department of Computer Science

Posts & Telecom. Institute of Technology, Vietnam [email protected]

Tu Minh Phuong Department of Computer Science

Posts & Telecom. Institute of Technology, Vietnam [email protected]

Abstract—Collaborative filtering, a technique for making predictions about user preferences by exploiting behavior patterns of groups of users, has become a main prediction technique in recommender systems. One crucial problem for collaborative filtering algorithms is how best to learn the preferences of a new user, who has rated few or no items. Active learning provides effective strategies to select the most informative ratings through minimal interaction with new users. In this paper, we present a new method for actively acquiring ratings from new users. Using a co-clustering based collaborative filtering framework, we propose combining the expected value of rating information with the likelihood of obtaining ratings from the users to form the sample selection criterion. Empirical studies with two datasets of movie ratings show that the proposed method outperforms three popular active learning strategies for collaborative filtering.

Keywords-collaborative filtering, active learning, co-clustering.

I. INTRODUCTION

Recommender systems have become important resources that help people find relevant products, services, and information. Collaborative filtering (CF) is a main technique for making predictions about user preferences in recommender systems. Given previously observed user-item interactions in the form of explicit or implicit opinions (ratings), a CF algorithm identifies groups of users with similar preferences and produces recommendations based on the opinions of the users from the same group as the active user. There are two major categories of CF approaches: memory-based and model-based. The memory-based approaches simply store all rating data and use K-nearest neighbor algorithms to make predictions. The model-based techniques use the rating data to create models of user-item interactions, which the system uses to generate predictions for the active user [2].

In general, the prediction accuracy of a CF system depends on the number of known ratings of the active user. When new users enter the system, no ratings are available from them. This is known as the new user problem and presents a great challenge for CF algorithms. In order to make personalized predictions, the system must acquire some information about the new user, for example by presenting items to the user for rating. However, acquiring a large number of ratings is difficult, because typical users are not willing to rate many items or because they have not seen the items presented by the system.

Active learning provides a solution to this problem by acquiring the ratings that will be most useful for training the user’s model, thereby reducing the number of ratings to solicit. The key question in designing an active learning method is what constitutes the most useful training examples [16]. A commonly used strategy is to select examples which will result in the maximum reduction of model uncertainty. Another popular approach selects examples that will lead to the maximum reduction of prediction errors [9].

Active learning has been extensively studied in the classification setting, where a usual assumption is that the system can obtain a label for any selected unlabeled example. For CF, this assumption is no longer valid, since a user is not always able to rate an item, for example if he/she has not seen it. As pointed out in [4, 11], this imposes an additional requirement on active learning algorithms when they are applied to CF.

There are a number of works on applying active learning to both memory-based and model-based CF. Rashid et al. [12] explored several ways to choose informative items for K-NN based CF. Also extending the memory-based framework, Huang [8] proposed to proactively select items which will be beneficial for all users at the same time. A greater variety of active learning extensions has been proposed for model-based CF. In [1], Boutilier et al. used the Multiple Cause Vector Quantization model for collaborative prediction. Based on the currently estimated model, they computed the expected value of information and used it to actively select items for rating. Based on the Aspect model, Jin and Si [9] applied a full Bayesian treatment by averaging over the posterior distribution of models. In [13], Rish and Tesauro proposed an active learning extension for Max-Margin Matrix Factorization based CF. They elaborated on the idea of active support vector learning, which chooses the sample with the minimum margin.

In this paper, we consider a new active learning method that explicitly combines the usefulness of ratings to solicit with the likelihood of getting the ratings from a new user. We choose the co-clustering approach presented in [4] as the CF framework and use the expected information value as the criterion to evaluate the usefulness of rating an item. The key idea is to use the Mahalanobis distance to approximate the membership to different user clusters and average over possible rating values. We use movie rating datasets to evaluate and compare the performance of the proposed active learning method with other baselines.


II. ACTIVE LEARNING FOR CO-CLUSTERING BASED COLLABORATIVE FILTERING

A. Co-clustering based collaborative filtering

Let U = {u1, u2, …, um} be a set of users and T = {t1, t2, …, tn} be a set of items or products. Let R denote the rating matrix of size m x n such that each element rij is the rating of user ui for item tj. rij can take on a value from a finite set {s1, s2, …, sK}, or is set to ∅ if the rating is unknown. The problem is to predict the unknown ratings of the rating matrix R and use them to make top-N recommendations for an active user ua. As mentioned above, most CF algorithms fall into two categories: model-based methods and memory-based methods. Here we focus on model-based CF, especially matrix approximation approaches.

The basic idea of matrix approximation is to exploit the sparseness of the matrix and assume that the matrix has a certain hidden structure with a small number of parameters. Once the parameters are found by minimizing a loss function, the structure can be used to reconstruct the matrix and predict the missing elements [4, 7, 10, 14, 18].

In this work, we focus on the co-clustering based matrix approximation framework presented in [4], as it gives competitive prediction accuracy while being more computationally efficient than other matrix approximation methods. Informally, CF via co-clustering is a model-based CF method that seeks a low-parameter approximation of the rating matrix by simultaneously clustering the users and items in the matrix. For completeness, we now briefly describe the co-clustering based CF method. The reader is referred to the original paper for more details.

Let CU: {1, …, m} → {1, …, kU} and CT: {1, …, n} → {1, …, kT} denote the user and item clusterings respectively. Let δij be an indicator such that δij = 1 when rating rij is known and δij = 0 when rij is unknown. The approximate matrix R' is given by:

$$ r'_{ij} = \bar{r}^{CC}_{gh} + (\bar{r}_i - \bar{r}^{CU}_{g}) + (\bar{r}_j - \bar{r}^{CT}_{h}) \qquad (1) $$

where g = CU(i), h = CT(j), $\bar{r}_i$ and $\bar{r}_j$ are the average ratings of user ui and item tj respectively, and $\bar{r}^{CC}_{gh}$, $\bar{r}^{CU}_{g}$, $\bar{r}^{CT}_{h}$ are the average ratings of the co-cluster, the user cluster, and the item cluster respectively.

In the training phase, the algorithm seeks the optimal clustering assignments CU, CT such that the squared error of R' with respect to the known ratings is minimized, i.e.:

$$ (CU, CT) = \arg\min_{CU,\,CT} \sum_{i=1}^{m} \sum_{j=1}^{n} \delta_{ij}\,(r_{ij} - r'_{ij})^2 \qquad (2) $$

An efficient iterative algorithm has been derived to find a locally optimal solution to this problem [4].

In the prediction phase, to predict the unknown rating for a pair of user ui and item tj, the algorithm considers four cases. If both the user and the item are not new to the system, the rating is calculated by (1). If the user is an existing one and the item is new, the predicted value is the user average, i.e. $r'_{ij} = \bar{r}_i$. If the item is an existing one and the user is new, the predicted value is the item average, i.e. $r'_{ij} = \bar{r}_j$. When both the user and the item are new, the algorithm simply returns the global average of all known ratings as the predicted value.
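To make these steps concrete, here is a minimal Python/NumPy sketch (ours, not the authors' implementation; all names are illustrative) that computes the averages used in (1) from a rating matrix with np.nan marking unknown entries, given user and item cluster assignments CU and CT, and then applies the four prediction cases.

import numpy as np

def cocluster_stats(R, CU, CT, kU, kT):
    """Averages used in Eq. (1), computed from the known (non-NaN) ratings."""
    known = ~np.isnan(R)
    global_avg = R[known].mean()
    # per-user and per-item averages, falling back to the global average when empty
    user_avg = np.array([R[i, known[i]].mean() if known[i].any() else global_avg
                         for i in range(R.shape[0])])
    item_avg = np.array([R[known[:, j], j].mean() if known[:, j].any() else global_avg
                         for j in range(R.shape[1])])
    cc_avg = np.full((kU, kT), global_avg)   # co-cluster averages
    cu_avg = np.full(kU, global_avg)         # user-cluster averages
    ct_avg = np.full(kT, global_avg)         # item-cluster averages
    for g in range(kU):
        rows = CU == g
        if known[rows].any():
            cu_avg[g] = R[rows][known[rows]].mean()
        for h in range(kT):
            block = R[np.ix_(rows, CT == h)]
            mask = ~np.isnan(block)
            if mask.any():
                cc_avg[g, h] = block[mask].mean()
    for h in range(kT):
        cols = CT == h
        if known[:, cols].any():
            ct_avg[h] = R[:, cols][known[:, cols]].mean()
    return global_avg, user_avg, item_avg, cc_avg, cu_avg, ct_avg

def predict_rating(i, j, stats, CU, CT, user_is_new, item_is_new):
    """Four-case prediction: Eq. (1), user average, item average, or global average."""
    global_avg, user_avg, item_avg, cc_avg, cu_avg, ct_avg = stats
    if not user_is_new and not item_is_new:
        g, h = CU[i], CT[j]
        return cc_avg[g, h] + (user_avg[i] - cu_avg[g]) + (item_avg[j] - ct_avg[h])
    if not user_is_new:
        return user_avg[i]       # existing user, new item
    if not item_is_new:
        return item_avg[j]       # new user, existing item
    return global_avg            # both new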

B. Active rating acquisition method

The prediction accuracy of the co-clustering based method largely depends on whether the algorithm can find the correct (co-)cluster assignment for the requested user-item pair. If either the user or the item is new, the algorithm can only make a rough prediction due to the lack of information about the correct cluster assignment. Consider the case of new users. Since the system has very few ratings from such users, it is impossible to find accurate cluster assignments for them. Thus, the goal of active learning is to obtain more ratings from the new user so that the new ratings provide as much information as possible for finding the correct cluster for the user. Here we focus on active learning for new users. However, because co-clustering is symmetric, the method is also applicable to new items.

A popular policy of active learning is to solicit ratings for items which minimize the uncertainty of the model. In the context of the clustering framework, this policy leads to acquiring item ratings that allow the most certain (confident) assignment of the new user to one of the user clusters. Thus, we estimate the value of potential items by their expected contributions to improving the certainty of the clustering assignment. Since the true value of a rating is not known before it is given by the user, we need to estimate the contribution of knowing the rating of an item for all possible acquisition outcomes. Assuming that each item can receive one of K distinct rating values s1, …, sK, the expected value of the query qij for the rating of item tj by user ui is computed as:

$$ E(q_{ij}) = \sum_{k=1}^{K} v(r_{ij} = s_k)\, P(r_{ij} = s_k) \qquad (3) $$

where v(rij = sk) is the value or utility of knowing that the rating rij has value sk, and P(rij = sk) is the probability that rij has value sk. In the absence of any prior information about user ui, we compute P(rij = sk) by averaging over all available ratings. Using the notation introduced in the previous section, P(rij = sk) is computed as:

$$ P(r_{ij} = s_k) = \frac{\sum_{i'=1}^{m} \mathbf{1}(r_{i'j} = s_k)}{\sum_{i'=1}^{m} \delta_{i'j}} \qquad (4) $$

where $\mathbf{1}(r_{i'j} = s_k) = 1$ if $r_{i'j} = s_k$ and $\mathbf{1}(r_{i'j} = s_k) = 0$ otherwise.
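For illustration, Eq. (4) is simply the empirical distribution of the known ratings of item tj. A minimal sketch (ours, with illustrative names; R again holds np.nan for unknown ratings):

import numpy as np

def rating_value_probs(R, j, rating_values=(1, 2, 3, 4, 5)):
    """Eq. (4): P(r_ij = s_k) estimated from all known ratings of item t_j."""
    col = R[:, j]
    known = col[~np.isnan(col)]          # entries with delta_{i'j} = 1
    if known.size == 0:                  # no ratings at all: fall back to a uniform guess
        return np.full(len(rating_values), 1.0 / len(rating_values))
    return np.array([(known == s).mean() for s in rating_values])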

The value v(rij = sk) is estimated using a utility function, the objective of which is to minimize the uncertainty in choosing a cluster for the user when rij is known. Given only item tj, the clustering algorithm will assign user ui to the cluster whose average rating for tj is closest to rij. Because the ratings in each cluster for item tj form a distribution, we use the Mahalanobis distance as the distance metric between rij and the clusters. The univariate version of the Mahalanobis distance from rij to cluster g is the standard z-score, computed as:

$$ d(r_{ij}, g) = \frac{|r_{ij} - \bar{r}^{CU}_{gj}|}{\sigma^{CU}_{gj}} \qquad (5) $$


where $\bar{r}^{CU}_{gj}$ is the average rating of item tj for user cluster g, and $\sigma^{CU}_{gj}$ is the corresponding standard deviation.

The assignment is made to the cluster with the smallest distance d computed by (5). The least uncertain assignment is achieved when rij is close to one cluster and far from the others. We measure the confidence in the cluster assignment as the difference between the distances from rij to the second and the first closest clusters, i.e. $d(r_{ij}, g_2) - d(r_{ij}, g_1)$, where g1 and g2 are the first and second closest clusters. Note that as $d(r_{ij}, g_2) - d(r_{ij}, g_1)$ increases, the uncertainty decreases.
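The following small sketch (ours) computes this confidence for a hypothetical rating value r, assuming cluster_means[g] and cluster_stds[g] hold the mean and standard deviation of item tj's known ratings within user cluster g, and that there are at least two user clusters:

import numpy as np

def assignment_confidence(r, cluster_means, cluster_stds, eps=1e-6):
    """Gap between the z-score distances (Eq. (5)) to the two closest user clusters;
    a larger gap means a more certain cluster assignment."""
    d = np.abs(r - cluster_means) / (cluster_stds + eps)   # Eq. (5) for every cluster g
    d_sorted = np.sort(d)                                  # ascending distances
    return d_sorted[1] - d_sorted[0]                       # d(r, g2) - d(r, g1)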

The distance to a cluster can be estimated with high accuracy only if there are enough ratings for the given item in the cluster. If this is not satisfied, which is quite common in practice due to data sparseness, the estimated distance is not a reliable measure. Thus, it is desirable to consider only items with a sufficient number of known ratings. Furthermore, in collaborative filtering practice, it is unrealistic to assume that users can provide ratings for any queried item. Thus, in addition to selecting items which will reduce the uncertainty in cluster assignment, it is also necessary to select items for which a user can provide ratings with high probability. We approximate the probability P(tj) that an item tj will get a rating from any user as follows:

$$ P(t_j) = \frac{\sum_{i=1}^{m} \delta_{ij}}{\sum_{i=1}^{m} \sum_{j'=1}^{n} \delta_{ij'}} \qquad (6) $$

Combining the two ingredients above, we estimate v(rij = sk) as:

$$ v(r_{ij} = s_k) = \bigl( d(r_{ij}, g_2) - d(r_{ij}, g_1) \bigr)\, P(t_j) \qquad (7) $$

For each user ui, the active learning method sorts the items in descending order of their E(qij) values and then requests ratings for the first l items. This policy therefore corresponds to selecting the items which, in expectation, will result in the most confident cluster assignment and at the same time have high probabilities of getting ratings from the user.
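Putting the pieces together, the selection procedure can be sketched as follows (ours, not the authors' code; it reuses rating_value_probs and assignment_confidence from the earlier snippets and assumes cluster_means and cluster_stds are kU x n arrays of per-cluster, per-item rating statistics):

import numpy as np

def item_rating_probability(R, j):
    """Eq. (6): share of all known ratings that belong to item t_j."""
    known = ~np.isnan(R)
    return known[:, j].sum() / known.sum()

def expected_query_value(R, j, cluster_means, cluster_stds, rating_values=(1, 2, 3, 4, 5)):
    """Eq. (3), with v(r_ij = s_k) given by Eq. (7)."""
    p_values = rating_value_probs(R, j, rating_values)     # Eq. (4)
    p_item = item_rating_probability(R, j)                 # Eq. (6)
    return sum(p * assignment_confidence(s, cluster_means[:, j], cluster_stds[:, j]) * p_item
               for s, p in zip(rating_values, p_values))

def select_items_for_rating(R, candidate_items, cluster_means, cluster_stds, l=10):
    """Rank candidate items by E(q_ij) and return the top l to present for rating."""
    scores = {j: expected_query_value(R, j, cluster_means, cluster_stds)
              for j in candidate_items}
    return sorted(scores, key=scores.get, reverse=True)[:l]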

III. EXPERIMENTS

A. Experimental setup

We used the MovieLens (http://www.grouplens.org/data) and MovieRating (http://www.cs.usyd.edu.au/~irena/movie_data.zip) datasets for the empirical analysis. The MovieLens dataset consists of 100,000 ratings by 943 users for 1682 movies. MovieRating consists of 43,850 ratings by 500 users for 1000 movies. For both datasets, ratings are on a 1-5 scale. For each dataset we created five random training-test splits and averaged the results over the splits. Each training set from MovieLens contains 600 users and each training set from MovieRating contains 300 users. For each test user, we reserved 20 rated movies (called the “evaluation set”) for evaluating the prediction accuracy of active learning methods. The remaining movies form the active selection set, from which an active learning algorithm selects movies for rating. We followed the setting in [5], where the active selection set for each test user consists of all remaining rated and unrated movies after reserving 20 movies to form the evaluation set.

The active learning cycle is as follows. First, the training user set was used to build the initial global co-clustering model. Next, for each test user, an active learning cycle starts. In each iteration, the system selects an item for rating from the active selection set of the current test user. If the requested rating is available in the database, the system adds the rating to the set of the user’s known ratings and rebuilds the co-clustering model. If the requested rating is not available, a selection failure occurs and no update is made to the database or the co-clustering model. The system then proceeds to the next iteration and the cycle continues. We used ten active iterations to evaluate each active learning algorithm.
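The cycle can be outlined as follows (a sketch of the protocol only, ours; select_item, rebuild_model, and add_rating are caller-supplied placeholders for the selection strategy, the co-clustering training, and the data update):

def run_active_cycle(hidden_ratings, selection_set, training_data,
                     select_item, rebuild_model, add_rating, n_iterations=10):
    """Simulate one test user's active learning cycle.
    hidden_ratings: the test user's actual ratings held aside in the dataset;
    selection_set: candidate items (rated and unrated); training_data: known ratings."""
    model = rebuild_model(training_data)            # initial global co-clustering model
    acquired, failures = {}, 0
    for _ in range(n_iterations):
        item = select_item(model, selection_set)    # e.g. the item with the largest E(q_ij)
        selection_set.remove(item)
        if item not in hidden_ratings:              # user cannot rate it: a selection failure
            failures += 1
            continue                                # no update to the data or the model
        acquired[item] = hidden_ratings[item]
        training_data = add_rating(training_data, item, acquired[item])
        model = rebuild_model(training_data)        # rebuild the co-clustering model
    return acquired, failures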

To analyze the performance of our active learning method, we used two kinds of evaluation metrics.

Prediction accuracy. The prediction accuracy was measured by the Mean Absolute Error (MAE). Let Utest denote the set of test users, Tieval the evaluation set of test user ui, and r'ij the predicted value of rij. MAE is defined as follows:

$$ MAE = \frac{1}{|U_{test}|} \sum_{i \in U_{test}} \frac{1}{|T_i^{eval}|} \sum_{j \in T_i^{eval}} |r'_{ij} - r_{ij}| \qquad (8) $$
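A minimal sketch of this metric (ours, assuming each test user's evaluation set is a dictionary of true ratings and the predictor is a function):

import numpy as np

def mean_absolute_error(eval_sets, predict_fn):
    """Eq. (8): eval_sets maps each test user to {item: true rating};
    predict_fn(user, item) returns the predicted rating r'_ij."""
    per_user = []
    for user, ratings in eval_sets.items():
        errors = [abs(predict_fn(user, item) - r) for item, r in ratings.items()]
        per_user.append(np.mean(errors))           # average over the user's evaluation set
    return float(np.mean(per_user))                # then average over test users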

Active selection failures. It may happen that the system requests ratings for movies that the user cannot rate. Such cases are known as failures [6] and should be avoided. In our experiments, a failure occurs when an algorithm requests a rating that is not contained in the dataset.

The proposed algorithm was compared against the following algorithms for active selection of items.

Random Selection (random). This method randomly selects one item from the active selection set for the user’s feedback.

Popularity based Selection (popularity). Soliciting ratings for items that have been rated by many users was reported to have competitive performance [12]. This method actively selects the item that has the largest number of known ratings.

Entropy based Sample Selection (entropy). This is a popular approach for active learning [5]. It selects the item that accelerates the reduction of uncertainty in cluster assignment.

B. Results

The MAE values for the proposed method and the three baseline methods on the MovieLens and MovieRating datasets are presented in Fig. 1 and 2 respectively. The average numbers of selection failures on the two datasets are given in Table 1.

According to Fig. 1 and 2, the entropy based algorithm has the worst prediction accuracy. An explanation of this phenomenon is that many items with very few ratings may have high entropy and thus are selected for soliciting ratings. However, it is not reliable to compute entropy from a small number of ratings. Moreover, such unpopular items have a low probability of getting feedback from the active user.

The popularity based method performs much better than the entropy based and random methods in terms of prediction accuracy. Table 1 also shows that the popularity based method has the lowest number of failures among all four tested algorithms. This is not unexpected, since the method favors popular items, which clearly have a higher chance of getting feedback from users.

Figure 1. MAE values of four active learning methods on the MovieLens dataset (averaged over five training-test splits).

Figure 2. MAE values of four active learning methods on the MovieRating dataset (averaged over five training-test splits).

TABLE 1. NUMBERS OF SELECTION FAILURES OF THE FOUR ACTIVE LEARNING ALGORITHMS (AVERAGED OVER ALL TEST USERS AND 10 ACTIVE SELECTIONS)

                        MovieLens   MovieRating
Random                  9.19        9.05
Popularity              5.12        5.26
Entropy                 9.77        9.62
Active co-clustering    5.51        5.6

Another observation from Fig. 1 and 2 is that our proposed method, which we call Active Co-clustering or Active CC for short, consistently has the best performance in terms of mean absolute error on both datasets. In the absence of any prior ratings from new users, the method was able to actively select items that led to a significant reduction of MAE. The proposed method also has a low failure rate, close to that of the popularity based active selection algorithm. This is achieved by incorporating the estimated probability of getting feedback from the active user into the selection criterion.

IV. CONCLUSION

We have presented an active learning method for collaborative filtering based on co-clustering. For a new user, the method selects the most promising items by combining two ingredients: the expected information values of the items and the likelihood that the user will rate those items. To compute the expected information value, we use the Mahalanobis distances between the possible rating values of an item and the user clusters, averaged over the rating distribution. Experiments on two benchmark datasets give strong evidence of the superior performance of our method, in terms of prediction accuracy, over three baselines which are popular active learning strategies for CF. The method can also be used for the new item problem, as well as with other clustering based collaborative filtering methods.

ACKNOWLEDGMENT

This work was supported by the National Foundation for Science and Technology Development of Vietnam.

REFERENCES

[1] C. Boutilier, R. Zemel, and B. Marlin, “Active collaborative filtering,” in Proc. of SIGIR, 2003.
[2] J. S. Breese, D. Heckerman, and C. M. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proc. of UAI, G. F. Cooper and S. Moral, Eds. Morgan Kaufmann, 1998, pp. 43–52.
[3] A. Das, M. Datar, A. Garg, and S. Rajaram, “Google news personalization: scalable online collaborative filtering,” in Proc. of WWW, 2007.
[4] T. George and S. Merugu, “A scalable collaborative filtering framework based on co-clustering,” in Proc. of ICDM, 2005.
[5] A. Harpale and Y. Yang, “Personalized active learning for collaborative filtering,” in Proc. of SIGIR, 2008, pp. 91–98.
[6] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, “Evaluating collaborative filtering recommender systems,” ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 5–53, 2004.
[7] T. Hofmann, “Latent semantic models for collaborative filtering,” ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 89–115, 2004.
[8] Z. Huang, “Selectively acquiring ratings for product recommendations,” in Proc. of ICEC, 2007, pp. 379–388.
[9] R. Jin and L. Si, “A Bayesian approach toward active learning for collaborative filtering,” in Proc. of UAI, 2004.
[10] R. Jin, L. Si, and C. Zhai, “A study of mixture models for collaborative filtering,” Information Retrieval, vol. 9, no. 3, pp. 357–382, 2006.
[11] P. Melville, M. Saar-Tsechansky, F. Provost, and R. Mooney, “Active feature-value acquisition for classifier induction,” in Proc. of ICDM, 2004, pp. 483–486.
[12] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl, “Getting to know you: learning new user preferences in recommender systems,” in Proc. of IUI, 2002, pp. 127–134.
[13] I. Rish and G. Tesauro, “Active collaborative prediction with maximum margin matrix factorization,” in Information Theory and Applications Workshop, 2007.
[14] I. Sampaio, G. Ramalho, V. Corruble, and R. Prudencio, “Acquiring the preferences of new users in recommender systems: the role of item controversy,” in ECAI 2006 Workshop on Recommender Systems, 2006.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Incremental SVD-based algorithms for highly scalable recommender systems,” in Proc. of the 5th Intl. Conf. on Computer and Information Technology, 2002.
[16] B. Settles, “Active learning literature survey,” 2009. http://pages.cs.wisc.edu/~bsettles/active-learning/