
TDT4501 Computer Science, Specialization

Project

Autumn 2015

A Context Aware Group Recommendation System

for movies

Tan Quach Le

Supervisor: John Krogstie

Abstract

A group of people wants to watch a movie. They have their own tastes and do not know which movie would fit the group best. This project introduces algorithms and methods to address this problem. A prototype has been developed to test those methods in practice. The prototype uses collaborative filtering and average aggregation to predict ratings and recommend movies. The results show the challenges of group recommendation, and more research is needed to find a better solution.


Contents

Abstract

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Report structure

2 Background
  2.1 Recommendation systems
  2.2 Techniques
    2.2.1 Content based filtering
    2.2.2 Collaborative filtering
  2.3 Context-aware Recommendation Systems
  2.4 Group recommendation
    2.4.1 Average Aggregation
    2.4.2 Least misery Aggregation
    2.4.3 Group disagreement
    2.4.4 Consensus Function

3 Research method
  3.1 Design Science Research
    3.1.1 Design as an Artifact
    3.1.2 Problem relevance
    3.1.3 Design Evaluation
    3.1.4 Research Contribution
    3.1.5 Research Rigor
    3.1.6 Design as a Search Process
    3.1.7 Communication of Research
  3.2 Evaluation tools
    3.2.1 Mean absolute error
    3.2.2 Python time module

4 Implementation
  4.1 Requirements
    4.1.1 Scenario
    4.1.2 Requirements
  4.2 Data set
  4.3 Implementation approach
    4.3.1 Set up
    4.3.2 Defining context
    4.3.3 Individual recommendation
    4.3.4 Group recommendation
    4.3.5 Short summary
  4.4 Technology

5 Evaluation
  5.1 Performance
    5.1.1 Data inserting
    5.1.2 Recommendation
  5.2 Accuracy
    5.2.1 Individual recommendations
    5.2.2 Context
    5.2.3 Group recommendation

6 Discussion
  6.1 Data set
  6.2 Individual recommendation
  6.3 Context
  6.4 Group recommendation

7 Conclusion and future work

Appendices

A Code

B Proof for rewritten Pearson correlation

Bibliography

List of Figures

3.1 Design-Science Research Guidelines, figure taken from [4].
4.1 RE model of the database
4.2 Django admin interface
4.3 Screenshot of how a user can add genres to his "tired of genre" list.
4.4 Dictionary of the ratings
5.1 Number of seconds to insert the data
5.2 Running time, without data in memory
5.3 Running time, with data in memory


List of Tables

5.1 Person A and Person B ratings. X means not rated
5.2 Person A and Person B ratings. Similarity threshold: 0.0
5.3 Person A and Person B ratings. Similarity threshold: 0.2
5.4 Person A and Person B ratings. Similarity threshold: 0.5


Chapter 1

Introduction

Have you ever wondered what movie to see, or what book to read? A recommender system is a system that recommends items to its users. Recommender systems provide advice to users about items they might wish to purchase or examine. Recommendations made by such systems can help users navigate through large information spaces of product descriptions, news articles or other items [9]. The importance of context and contextualized user data for accurate recommendations has been widely recognized. However, the vast majority of existing recommendation techniques focus on recommending the most relevant items to users and do not take the context into consideration [8]. A context-aware group recommender system adds context to the recommendation and recommends items to a group of people instead of just one person, which makes the task more difficult. In order to recommend something to a group, the system needs to take into account that people have different tastes, and it has to find the item that would make the group as happy as possible. This means that it has to define what group happiness means. Does it mean finding a product that gives the highest average happiness in the group? Or a product that no one dislikes? This project discusses this matter and compares different methods for recommending items to groups.

This chapter presents the motivation for the research, the problems this project addresses and finally the structure of the report.

1.1 Motivation

Recommendation systems have been an important research area since the mid-1990s [1]. This is not surprising considering how important they are for many companies: Amazon, Netflix, eBay and Last.fm are just a few of the companies that use recommendation systems. Netflix even held an open competition called the "Netflix Prize" for the best collaborative filtering algorithm to predict user ratings for films. However, all of them recommend items for a single user. There are few, if any, known systems that recommend items to groups of people.

1.2 Contribution

This section describes the main contribution from this project.

• State of the art: A review of existing methods to recommend items.

• Prototype: A prototype to test recommendation algorithms.

• Evaluation: Evaluation of the performance and the accuracy of the recommendations.

1.3 Report structure

This report begins with a review of the state of the art in the recommendation systems literature in Chapter 2, covering individual recommendation systems, group recommendation systems and context awareness. Chapter 3 describes the methodology used for research and evaluation. Chapter 4 describes how the prototype is implemented. Chapter 5 evaluates the prototype. Chapter 6 discusses the evaluation and the prototype, before the conclusion is drawn in Chapter 7.

Chapter 2

Background

This chapter goes through the research related to this project: the algorithms, challenges and concepts related to recommendation systems.

2.1 Recommendation systems

"The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them" [6]. This benefits both the users and the providers of the items. The user gets what he wants, and the providers get to sell something that the users might not have known about without the recommendation.

A rating connects two things: a user and an item. This means that there are three categories within a recommendation system: items, users and ratings.

2.2 Techniques

2.2.1 Content based filtering

One approach to designing recommendation systems is content-based filtering. Content-based filtering compares an item's description, such as its genre, with a user's preferences [7]. The system checks which items the user has already liked, builds a model from them and then tries to find matching items. For example, if a user has liked an action movie with Tom Cruise, the system would recommend another action movie with Tom Cruise. This technique requires three steps:

• Analyzing the content: This step represents the content of the items by extracting the relevant information, which is then used as input for the next step.

• Learning the profile: This step takes the data from the previous step and creates a user profile based on it.

• Filtering: This step finds the items that match the profile and recommends them.

Advantages

• Unlike collaborative filtering (see next section), a recommendation to a user does not depend on the other users. The system only looks at the user's own rating history, not at what other users have rated.

• It is easier to understand why a certain item has been recommended, because the system can show which features match the user's liking. In collaborative filtering, an item is recommended because "some guy" who mathematically has the same taste as the user liked that item.

• When a new item arrives, content-based filtering has no problem recommending it, as long as its features fit the user's profile. Collaborative filtering, however, has a problem: since no users have rated the item, it will not be recommended to anyone. Someone has to like it first before it can be recommended.

Drawbacks

• For content-based filtering to be effective, the content needs to carry enough information for the system to discriminate between the items a user likes and those he does not like.

• Since it only recommends items similar to those a user has liked, the recommendation will never be a surprise; it is very predictable. If a user likes an action movie, only action movies will be recommended. He might also like comedy movies, but they will never be recommended unless he sees one and likes it.

• Since the recommendation is based on a user's history, making a recommendation for a new user without any history is problematic.

2.2.2 Collaborative filtering

Instead of looking only at the user's own rating history, the system can look at the rating history of all users to make a recommendation. This is called collaborative filtering. This section explains different methods of collaborative filtering [5, p. 13-23].

Memory based

Collaborative filtering is divided into two categories: memory based and model based. Memory-based approaches use the past ratings directly to predict new ratings. Here are two examples of memory-based methods:

User-based The first approach is called user-based nearest neighbor recommendation. The idea is to find the K users that are most similar to the target user, look at how they have rated a particular item, and predict the target user's rating from that. The challenge is to find a good value for K. In other words, how many similar users should the system take into account when predicting a rating for an item? If K is too high, too many users with limited similarity are considered, which might add "noise" to the prediction. If K is too low, the system has too few users to compare with, and the quality of the prediction might suffer. An alternative is to choose a minimum similarity threshold and ignore users whose similarity value is lower than the threshold. However, a similar problem remains: finding a good value for the threshold.

There are two common ways to calculate the similarity of two users: cosine-based and Pearson correlation-based. The cosine-based measure regards two users as two vectors and calculates the similarity between users u and v as the cosine of the angle between the vectors:

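$$
\mathrm{simU}_{u,v} = \cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}|_2 \times |\vec{v}|_2} = \frac{\sum_{i=1}^{k} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i=1}^{k} r_{u,i}^2}\, \sqrt{\sum_{i=1}^{k} r_{v,i}^2}} \tag{2.1}
$$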
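As a minimal sketch of formula 2.1, assume each user's ratings are stored as a dictionary from item to rating and that unrated items count as 0. The function name `cosine_similarity` and the sample ratings are illustrative and not taken from the prototype.

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Cosine of the angle between two rating vectors (formula 2.1).

    Unrated items are treated as 0 (an assumption of this sketch), so only
    co-rated items contribute to the dot product.
    """
    dot = sum(r * ratings_v[item] for item, r in ratings_u.items() if item in ratings_v)
    norm_u = math.sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = math.sqrt(sum(r * r for r in ratings_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for an empty vector; 0 is an assumed fallback
    return dot / (norm_u * norm_v)

# Hypothetical ratings for two users.
alice = {"Titanic": 3, "Die Hard": 4}
bob = {"Titanic": 3, "Die Hard": 4}
print(cosine_similarity(alice, bob))
```

Two users with identical rating vectors get the maximum similarity, while users with no co-rated items get 0.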

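The Pearson correlation-based measure finds similarities between users by calculating the Pearson correlation, which is defined by:

$$
\mathrm{simU}_{u,v} = \mathrm{corr}_{u,v} = \frac{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)^2}\, \sqrt{\sum_{i \in I_{u,v}} (r_{v,i} - \bar{r}_v)^2}} \tag{2.2}
$$

where $I_{u,v}$ is the set of items rated by both u and v, and $\bar{r}_a$ is the average of all the ratings that user a has given.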

Finally, the prediction for the rating of item i by user u can be calculated with this formula:

$$
\mathrm{pred}(u, i) = \bar{r}_u + \frac{\sum_{v \in N} \mathrm{sim}(u, v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N} \mathrm{sim}(u, v)} \tag{2.3}
$$

where N is the set of nearest neighbors of u.
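The prediction formula 2.3 can be sketched as follows. The layout of `neighbours` (a list of tuples) and the function name are hypothetical choices for this example, and the sketch assumes that only neighbours who have rated the item contribute.

```python
def predict_user_based(target_mean, neighbours, item):
    """Predict a rating with formula 2.3.

    neighbours: list of (similarity, neighbour_mean, neighbour_ratings)
    tuples, a hypothetical layout chosen for this sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for similarity, mean, ratings in neighbours:
        if item in ratings:  # assumption: skip neighbours without a rating
            numerator += similarity * (ratings[item] - mean)
            denominator += similarity
    if denominator == 0:
        return target_mean  # no usable neighbour: fall back to the user's average
    return target_mean + numerator / denominator

# Two hypothetical neighbours who both rated "X".
neighbours = [(0.5, 3.0, {"X": 5}), (0.5, 4.0, {"X": 4})]
print(predict_user_based(3.0, neighbours, "X"))
```

The first neighbour rated the item two points above his own average and the second rated it exactly at his average, so the prediction lands one point above the target user's average.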

Item-based Instead of looking at similar users, the system can look at similar items. One can use the Pearson correlation to calculate the similarity sim(i, j) between items i and j; however, it has been reported that the cosine similarity measure consistently outperforms the Pearson correlation metric [5]. The basic cosine measure (formula 2.1) does not take the differences in the average rating behavior of the users into account. This can be solved with the adjusted cosine measure, which subtracts each user's average rating:

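$$
\mathrm{simU}_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_u)^2}\, \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_u)^2}} \tag{2.4}
$$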

Finally, we can predict the rating for user u for an item p as follows:

$$
\mathrm{pred}(u, p) = \frac{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i, p) \cdot r_{u,i}}{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i, p)} \tag{2.5}
$$
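Formula 2.5 is a similarity-weighted average of the user's own ratings. The sketch below assumes the item similarity is supplied as a function; `predict_item_based`, `lookup` and the similarity values are made up for illustration.

```python
def predict_item_based(user_ratings, item_similarity, target):
    """Weighted average of the user's own ratings (formula 2.5).

    item_similarity(i, p) can be any item-item measure, for example the
    adjusted cosine; here it is passed in as a function.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        s = item_similarity(item, target)
        numerator += s * rating
        denominator += s
    if denominator == 0:
        return None  # no similar rated item: no prediction possible
    return numerator / denominator

# Hypothetical precomputed similarities to the target item "P".
similarities = {("A", "P"): 0.8, ("B", "P"): 0.2}

def lookup(i, p):
    return similarities.get((i, p), 0.0)

print(predict_item_based({"A": 5, "B": 3}, lookup, "P"))
```

Because item "A" is much more similar to "P" than item "B", the prediction is pulled towards the rating given to "A".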

Model based

The model-based approach uses training data to create a model. Once the model is built, the system no longer uses the ratings themselves for its calculations, only the model. However, creating this model can take a lot of time if the database contains tens of millions of users and tens of millions of items. Here are some examples of model-based approaches:

Matrix factorization/latent factor models

In short, the idea of matrix factorization is to find latent (hidden) factors in the rating patterns that characterize items and users. This section covers SVD (singular value decomposition), which is one method of matrix factorization.

SVD

In linear algebra, singular value decomposition is a factorization of a real or complex matrix. It can be used to predict users' ratings. The theorem states: suppose M is an m × n matrix; then there exists a factorization of M of the form

$$
M = U \Sigma V^T \tag{2.6}
$$

where Σ is an m × n diagonal matrix with non-negative real numbers on the diagonal, U is an m × m matrix and V^T is an n × n matrix. However, we can approximate the full matrix by keeping only the most important features, that is, by letting Σ contain only the k largest singular values of M. Matrix V corresponds to the users and matrix U to the items. By projecting these matrices into a two-dimensional space, one can see which items are similar and which users are similar. To predict user A's rating, one must first find where A would be in the two-dimensional space. This is done by multiplying A's rating vector by the first k columns of U and the inverse of the k × k singular value matrix:

$$
A_{2D} = A \times U_k \times \Sigma_k^{-1} \tag{2.7}
$$


After finding A's data point, one can use different methods to predict A's rating. One can for instance use the cosine similarity described earlier with the users that are closest to A.

2.3 Context-aware Recommendation Systems

Context is important for many recommendation systems, and it is a factor this project takes into account. First, however, we need to define what context is. This project defines it as "situation parameters that can be known by the system and may have an impact on the selection and ranking of recommendation results" [5]. In the previous sections, we have only looked at predicting ratings based on the users and items:

$$
R : \mathrm{User} \times \mathrm{Items} \Rightarrow \mathrm{Rating} \tag{2.8}
$$

Hence, we have only looked at two-dimensional (2D) systems, since they consider only the User and the Item dimensions in the recommendation process. This project, however, wants to take context into consideration as well:

$$
R : \mathrm{User} \times \mathrm{Items} \times \mathrm{Context} \Rightarrow \mathrm{Rating} \tag{2.9}
$$

For example, say Bob wants to go to a concert during "UKA", which is the largest cultural festival in Norway. Not only do we ask "How much is this user going to like the different concerts?", we also want to ask "When? Where? Is it raining?" and so on.

2.4 Group recommendation

In the previous sections, we have discussed recommendation for a single user. However, there are situations where recommendations are needed for a group. Recommendation for a group has two major challenges compared with individual user recommendation. The first challenge is how to define the semantics of group recommendation [2]. Friends or not, people in a group have different tastes, and this disagreement must be resolved. This is done by calculating a consensus function, which has two components: relevance and disagreement. The second challenge is how to efficiently compute group recommendations given a consensus function. Given the sorted relevance lists of two users, the algorithm cannot predict their disagreement over unseen items. This section discusses this matter.

There are two approaches to generating group recommendations: preference aggregation, which creates a "virtual user" that represents the whole group and then generates recommendations for that virtual user; and score aggregation, which first generates each member's individual recommendations and then "merges" them to produce a single list. This section focuses on the latter approach. There are several methods to calculate the score; we focus on two of them: average and least misery.


2.4.1 Average Aggregation

Average aggregation calculates the average of the individual ratings for an item. The preference of a group G for an item i, denoted by rel(G, i), is therefore:

$$
\mathrm{rel}(G, i) = \frac{\sum_{u \in G} \mathrm{relevance}(u, i)}{|G|} \tag{2.10}
$$

However, this method might give high scores to movies that some members really like while others really hate them.

2.4.2 Least misery Aggregation

This method uses the minimum of the individual ratings as the group's rating. What it tries to achieve is to find the item that no one hates. The preference of a group G for an item i, denoted by rel(G, i), is therefore:

$$
\mathrm{rel}(G, i) = \min_{u \in G} \mathrm{relevance}(u, i) \tag{2.11}
$$

The consequence of this method is that the system might miss a movie that is expected to be disliked by one person but loved by the rest of the group. As one can see, neither of these methods really resolves the disagreement.
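The two aggregation strategies (formulas 2.10 and 2.11) can be compared with a small sketch. The function names and the predicted ratings for "Movie X" and "Movie Y" are invented for illustration.

```python
def average_aggregation(relevances):
    """Group relevance as the mean of the members' relevances (formula 2.10)."""
    return sum(relevances) / float(len(relevances))

def least_misery_aggregation(relevances):
    """Group relevance as the lowest member relevance (formula 2.11)."""
    return min(relevances)

# Hypothetical predicted ratings of three group members for two movies.
scores = {"Movie X": [5, 5, 1], "Movie Y": [3, 4, 3]}

best_by_average = max(scores, key=lambda m: average_aggregation(scores[m]))
best_by_least_misery = max(scores, key=lambda m: least_misery_aggregation(scores[m]))
print(best_by_average, best_by_least_misery)
```

With these numbers average aggregation prefers "Movie X" even though one member strongly dislikes it, while least misery prefers "Movie Y", which illustrates the trade-off described above.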

2.4.3 Group disagreement

We have now given two ways to find the relevance of an item (how much a group likes the item). Next, we describe two methods that compute the disagreement dis(G, i) among group G related to item i:

• Average pair-wise disagreement:

$$
\mathrm{dis}(G, i) = \frac{2}{|G|(|G| - 1)} \sum_{u,v \in G} |\mathrm{relevance}(u, i) - \mathrm{relevance}(v, i)| \tag{2.12}
$$

where v ≠ u.

• Disagreement variance:

$$
\mathrm{dis}(G, i) = \frac{1}{|G|} \sum_{u \in G} (\mathrm{relevance}(u, i) - \mathrm{mean})^2 \tag{2.13}
$$

where mean is the mean of all the individual relevances for the item.

The difference between the two methods is that the former goes through all pairs of members, calculates the difference in relevance for each pair and takes the average of these differences. The latter calculates the variance of the relevance for an item.
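Both disagreement measures can be sketched in a few lines. The sketch reads the sum in formula 2.12 as running over unordered pairs of members, which makes the result the average pairwise difference; that reading and the function names are assumptions of this example.

```python
from itertools import combinations

def average_pairwise_disagreement(relevances):
    """Formula 2.12, summing over unordered pairs of group members."""
    n = len(relevances)
    if n < 2:
        return 0.0  # a single member cannot disagree with anyone
    total = sum(abs(a - b) for a, b in combinations(relevances, 2))
    return 2.0 * total / (n * (n - 1))

def disagreement_variance(relevances):
    """Formula 2.13: variance of the members' relevances."""
    mean = sum(relevances) / float(len(relevances))
    return sum((r - mean) ** 2 for r in relevances) / len(relevances)

print(average_pairwise_disagreement([1, 3]))
print(disagreement_variance([1, 3]))
```

For a two-person group with relevances 1 and 3, the pairwise measure is the single difference and the variance is the squared distance of each rating from the mean.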


2.4.4 Consensus Function

Now that the group relevance and disagreement have been introduced, we can finally compute the consensus function F(G, i):

$$
F(G, i) = w_1 \times \mathrm{rel}(G, i) + w_2 \times (1 - \mathrm{dis}(G, i)) \tag{2.14}
$$

where $w_1 + w_2 = 1$.
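A minimal sketch of formula 2.14 follows. It assumes that relevance and disagreement have already been scaled to the range 0 to 1, so that the term 1 − dis is meaningful; the report does not state the weights used, so `w1` is left as a parameter.

```python
def consensus(relevance, disagreement, w1):
    """Consensus score F(G, i) from formula 2.14.

    relevance and disagreement are assumed to be normalised to [0, 1];
    w2 is derived from w1 so that the weights sum to 1.
    """
    w2 = 1.0 - w1
    return w1 * relevance + w2 * (1.0 - disagreement)

# Equal weights: high relevance and low disagreement give a high score.
print(consensus(0.8, 0.2, w1=0.5))
```

Setting w1 close to 1 makes the score behave like plain relevance, while a lower w1 penalises movies the group disagrees on.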

Chapter 3

Research method

3.1 Design Science Research

Design science research is a set of analytical techniques and perspectives for performing research in Information Systems. In short, it involves creating an artifact and evaluating it. Hevner et al. [4] give seven guidelines (figure 3.1), which this project follows.

Figure 3.1: Design-Science Research Guidelines, figure taken from [4].

3.1.1 Design as an Artifact

In this project an artifact will be made. A first prototype of a context-awaregroup recommendation system will be created. However, the GUI will not beprioritized.

3.1.2 Problem relevance

A group of people want to find something to watch. They might even have some ideas, but they cannot agree on what to watch because they have different opinions. A group recommendation system can help them with this problem.

3.1.3 Design Evaluation

The system will be evaluated on its recommendations: how precise they are and how fast they are made. Section 3.2 describes this in more detail.

3.1.4 Research Contribution

All of the algorithms are explained, both the algorithms themselves and howthey are implemented.

3.1.5 Research Rigor

This project explains how the algorithms are implemented, so it should be possible to reproduce the results. The code is also submitted along with this report.

3.1.6 Design as a Search Process

Design as a search process motivates us to use an iterative method. In this project, the speed of the recommendation has been tested multiple times, and the implementation has been improved after each test to make it faster.

3.1.7 Communication of Research

This report describes in detail how the algorithms are implemented and explains more generally how the prototype works, for instance through drawings and pictures.

3.2 Evaluation tools

3.2.1 Mean absolute error

To calculate the accuracy of the predictions, the mean absolute error (MAE) has been used, since it is the most popular measure [5, p. 179]. MAE calculates the average deviation between the computed prediction and the actual rating for all evaluated users and all items in their test sets.

$$
\mathrm{MAE} = \frac{\sum_{u \in G} \sum_{i \in \mathrm{testset}_u} |\mathrm{pred}(u, i) - r_{u,i}|}{\sum_{u \in G} |\mathrm{testset}_u|} \tag{3.1}
$$
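Formula 3.1 can be sketched as below, assuming predictions and actual ratings are stored in dictionaries keyed by (user, item) pairs. This layout and the sample numbers are illustrative only.

```python
def mean_absolute_error(predictions, actual):
    """MAE over all (user, item) pairs in the test set (formula 3.1).

    predictions and actual both map (user, item) tuples to ratings;
    every key in actual is assumed to have a prediction.
    """
    errors = [abs(predictions[key] - actual[key]) for key in actual]
    return sum(errors) / float(len(errors))

# Hypothetical test set with two ratings by one user.
actual = {("A", "x"): 4, ("A", "y"): 2}
predictions = {("A", "x"): 3.5, ("A", "y"): 3.0}
print(mean_absolute_error(predictions, actual))
```

Here the predictions are off by 0.5 and 1.0, so the MAE is their mean.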

3.2.2 Python time module

To measure the running time of certain algorithms and pieces of the code, the Python "time" module¹ has been used.
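One way such a measurement could look is sketched below. The report does not say which function of the module was used; this example assumes `time.time()`, and the helper `timed` is a hypothetical name.

```python
import time

def timed(function, *args):
    """Run function(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = function(*args)
    elapsed = time.time() - start
    return result, elapsed

# Time a simple computation as a stand-in for a recommendation call.
result, seconds = timed(sum, range(1000000))
print("sum took %.4f seconds" % seconds)
```

Wrapping the call in a helper keeps the timing code out of the algorithm being measured.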

¹ https://docs.python.org/2/library/time.html

Chapter 4

Implementation

This chapter describes how the system was implemented. For now, it only needs to work locally, and since the project focuses on functionality and algorithms, the GUI has been given less priority.

4.1 Requirements

4.1.1 Scenario:

A group of people wants to watch a movie. They have different tastes, so they wonder which movie they would enjoy the most as a group. A person might also have seen one genre many times that week, which affects his rating of that type of movie at that time. For example, if a person had a horror movie marathon the day before, he probably wants to watch something else that day.

4.1.2 Requirements:

• Recommendation: The system should recommend movies based on the users' rating history.

• Context: The system should take into account the users' mood and whether or not they are tired of a certain genre.

4.2 Data set

The data sets used in this project can be downloaded from http://grouplens.org/datasets/movielens/100k/. One data set contains the ratings but does not include the movie names. Another data set contains the movie name, the corresponding movie id and its genres. We have used both to connect each rating to the correct movie before adding them to the database.


4.3 Implementation approach

The way the implementation has been done consists of four stages:

• Set up

• Defining context

• Find recommendation for each group member

• Aggregate recommendations into one recommendation list

4.3.1 Set up

Before the web application can make any recommendation, it needs some data (rating history). When the "getData" page is opened, the code that uploads the data from the text file into the database runs if the database is empty. The genres have been added manually through the "django admin" interface before the code connects each movie to its genres.

Figure 4.1: RE model of the database

Figure 4.2: Django admin interface

4.3.2 Defining context

The context parameters in this project are the genres that the user is tired of at that moment. Note that this differs from user preference: the user might prefer the genre but want to see something else at that moment because he, for instance, saw it too many times last week. Other parameters that could have been relevant are the length of the movie and whether or not a user has enough time to watch it; however, the data set used in this project does not contain information about movie length.

Figure 4.3: Screenshot of how a user can add genres to his ”tired of genre” list.

On the page shown above, a user can register genres that he is tired of by choosing a genre from the drop-down menu and entering the username of the relevant user. If the user is not already a key in the "usermisery" dictionary¹, he is added to it. The genre is then added to that user's list.
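The bookkeeping described above can be sketched as follows. Only the dictionary name "usermisery" comes from the report; the function name `add_tired_genre` and the duplicate check are assumptions of this example.

```python
# Maps a username to the list of genres that user is currently tired of.
usermisery = {}

def add_tired_genre(username, genre):
    """Register that a user is tired of a genre."""
    if username not in usermisery:
        usermisery[username] = []  # first genre for this user
    if genre not in usermisery[username]:  # assumed: avoid duplicate entries
        usermisery[username].append(genre)

add_tired_genre("bob", "Horror")
add_tired_genre("bob", "Comedy")
add_tired_genre("bob", "Horror")  # already registered, ignored
print(usermisery)
```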

4.3.3 Individual recommendation

Before any calculation can be made, the system needs to access the database to get the data. However, accessing the database takes a great amount of time, so it was decided to fetch all the data at once and place it in a Python "dictionary". The next time data is needed, the system only has to look it up in the dictionary, which takes constant time (O(1)) on average per lookup and is much faster than finding the value in the database. Every time a user rates a movie, the rating is added to both the database and the dictionary, so the system does not need to reload the data from the database for every rating. This means that the system only needs to fetch the data from the database once per login. The dictionary is a nested dictionary where "userId" is the key in the first level and "movieName" is the key in the second level.

¹ https://docs.python.org/2/tutorial/datastructures.html
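A minimal sketch of this cache is shown below. The list of tuples stands in for the result of a database query, and the names `ratings`, `load_ratings` and `add_rating` are hypothetical; the write to the database itself is omitted.

```python
# Nested dictionary: ratings[userId][movieName] -> rating.
ratings = {}

def load_ratings(rows):
    """Fill the cache from (userId, movieName, rating) rows.

    rows stands in for the result of a single database query.
    """
    for user_id, movie_name, rating in rows:
        ratings.setdefault(user_id, {})[movie_name] = rating

def add_rating(user_id, movie_name, rating):
    """Store a new rating in the cache (the database write is omitted here)."""
    ratings.setdefault(user_id, {})[movie_name] = rating

load_ratings([(1, "Titanic (1997)", 3)])
add_rating(1, "Die Hard (1988)", 4)
print(ratings[1]["Die Hard (1988)"])
```

After the initial load, every lookup and update touches only the in-memory dictionary.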


Figure 4.4: Dictionary of the ratings

In this project, user-based collaborative filtering has been used to find recommendations for each group member. To calculate the similarity between two users, the Pearson correlation (formula 2.2) has been used. A sim(userU, userV) function was implemented to calculate the similarity. It returns a list of length three, where the first element is the similarity value, the second is the average rating of "userU" and the third is the average rating of "userV". To make the code simpler, the Pearson correlation has been rewritten as:

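$$
\mathrm{simU}_{u,v} = \mathrm{corr}_{u,v} = \frac{\sum_{i \in I_{u,v}} r_{u,i}\, r_{v,i} - \dfrac{\sum_{i \in I_{u,v}} r_{u,i} \sum_{i \in I_{u,v}} r_{v,i}}{a}}{\sqrt{\left(\sum_{i \in I_{u,v}} r_{u,i}^2 - \dfrac{\left(\sum_{i \in I_{u,v}} r_{u,i}\right)^2}{a}\right)\left(\sum_{i \in I_{u,v}} r_{v,i}^2 - \dfrac{\left(\sum_{i \in I_{u,v}} r_{v,i}\right)^2}{a}\right)}} \tag{4.1}
$$

where $a = |I_{u,v}|$ is the number of items rated by both users.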

Although it looks more complicated, it was easier to implement, because one can iterate through the ratings of "userU" and "userV" once, store the sums of the ratings in variables and use those variables in the calculation. The alternative is to loop through each user's ratings to find the average and then loop through them again to subtract the average from each rating. As one can see in the code below, instead of having each rating as a variable, the whole sum is a variable. The proof can be found in Appendix B.

    import math

    # "dictionary" is the nested ratings dictionary described above:
    # dictionary[userId][movieName] -> rating.
    def sim(userU, userV):
        # Collect the movies that both users have rated.
        bothRatings = {}
        for movie in dictionary[userU]:
            if movie in dictionary[userV]:
                bothRatings[movie] = 1

        # float() avoids integer division under Python 2.
        amountOfRatings = float(len(bothRatings))
        if amountOfRatings == 0:
            return [0, 0, 0]

        sumOfRatingsU = sum([dictionary[userU][movie] for movie in bothRatings])
        sumOfRatingsV = sum([dictionary[userV][movie] for movie in bothRatings])

        # Sums of the squared ratings.
        sumOfUserURatingSqrt = sum([pow(dictionary[userU][movie], 2) for movie in bothRatings])
        sumOfUserVRatingSqrt = sum([pow(dictionary[userV][movie], 2) for movie in bothRatings])

        productOfUsers = sum([dictionary[userU][movie] * dictionary[userV][movie] for movie in bothRatings])

        # teller = numerator, nevner = denominator of formula 4.1.
        teller = productOfUsers - (sumOfRatingsU * sumOfRatingsV / amountOfRatings)
        nevner = math.sqrt((sumOfUserURatingSqrt - pow(sumOfRatingsU, 2) / amountOfRatings) * (sumOfUserVRatingSqrt - pow(sumOfRatingsV, 2) / amountOfRatings))
        if nevner == 0:
            # No variation in the common ratings: similarity is set to 0.
            return [0, sumOfRatingsU / amountOfRatings, sumOfRatingsV / amountOfRatings]
        else:
            result = teller / nevner
            return [result, sumOfRatingsU / amountOfRatings, sumOfRatingsV / amountOfRatings]

This calculation has to be done for every user that has rated a movie (every userId in the dictionary). We only compare users that have seen at least one common movie, since a user who does not have a single movie in common with the target user cannot contribute to the prediction. Instead of choosing the K most similar users, we have defined a minimum similarity threshold.

After calculating the similarities, the web application calculates the prediction for each movie (1682 in total) and then picks the one with the highest predicted rating for each user in the group. Before calculating the prediction for a certain movie, the application checks whether the movie has a genre that the user is tired of at that moment (that is, whether the genre is in the user's list in the usermisery dictionary). If it does, the application skips the movie and goes on to the next one.
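The two filtering steps, the similarity threshold and the genre check, can be sketched as below. The function names, the data layout and the sample values are assumptions of this example, not the prototype's code.

```python
def neighbours_above_threshold(similarities, threshold):
    """Keep only the users whose similarity reaches the minimum threshold."""
    return {user: s for user, s in similarities.items() if s >= threshold}

def candidate_movies(movie_genres, tired_genres):
    """Return the movies that have no genre the user is tired of."""
    tired = set(tired_genres)
    return [movie for movie, genres in movie_genres.items()
            if not tired.intersection(genres)]

# Illustrative data: two movies and one user who is tired of horror.
movie_genres = {
    "Scream (1996)": ["Horror", "Thriller"],
    "Toy Story (1995)": ["Animation", "Comedy"],
}
print(candidate_movies(movie_genres, ["Horror"]))
print(neighbours_above_threshold({"a": 0.1, "b": 0.6}, 0.2))
```

Only the movies that survive the genre check need a predicted rating, and only the neighbours that pass the threshold take part in that prediction.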

4.3.4 Group recommendation

When each user's preferred movie has been calculated, another nested dictionary is created, where "userId" is the outer key, the preferred movies are the inner keys and the ratings are the values. The dictionary holds each group member's rating of his own preferred movie and of the other members' preferred movies. This information is used to find the movie that will benefit the group the most. To do that, average aggregation (formula 2.10) has been used. Finding the movie with the highest average score is not enough, however. How much the group disagrees on a certain movie rating also has to be taken into account. In this project, the average pairwise disagreement (formula 2.12) has been used. Finally, the consensus function is calculated using formula 2.14, and the movie with the highest consensus value is recommended.

4.3.5 Short summary

1. Find similar users.

2. Use them to predict a rating for each movie.

3. Pick the movie with the highest predicted rating.

4. Repeat for every user in the group.

5. Aggregate to find the best movie for the group.


4.4 Technology

The prototype has been implemented in Python with Django². Django is an open source web application framework. It supports four database back-ends: PostgreSQL, MySQL, SQLite and Oracle. Django makes it easier to create database-driven websites.

² https://www.djangoproject.com

Chapter 5

Evaluation

5.1 Performance

This section covers the running time of the implementation.

5.1.1 Data inserting

The time needed to insert 100 000 ratings and 1682 movies into the database is approximately 600 seconds, or 10 minutes. However, this is not a problem for the users, because it is only done once, before the users can try the web application. It is something for the developer to consider, though, since the insertion can take much longer if a much bigger data set is used.

Figure 5.1: Number of seconds to insert the data

5.1.2 Recommendation

The recommendation has two "states": one where the data is not in memory, and one where it is. When the data is not in memory, the application has to fetch it from the database, which takes a lot of time, since the database contains 100 000 ratings and each rating has to be put into the dictionary. In fact, testing showed that loading the data into the dictionary takes more time than the calculation part of the recommendation.


The running time when the data is not in the dictionary is approximately 8.4 seconds.

Figure 5.2: Running time, without data in memory

After the first recommendation, however, the data is stored in memory, and subsequent recommendations take only approximately 0.4 seconds.

Figure 5.3: Running time, with data in memory
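The difference between the two states comes from building the dictionary only once. A minimal sketch of such a cache is shown below. The names `get_ratings` and `fake_rows` are hypothetical, and `fake_rows` stands in for the slow database query of the real application.

```python
_ratings_cache = None  # module-level cache, filled by the first request

def get_ratings(load_rows):
    """Return the nested {userId: {movieId: rating}} dictionary, building it once."""
    global _ratings_cache
    if _ratings_cache is None:
        cache = {}
        for user_id, movie_id, rating in load_rows():
            cache.setdefault(user_id, {})[movie_id] = rating
        _ratings_cache = cache
    return _ratings_cache

calls = []

def fake_rows():
    """Stand-in for the database read; records how often it is called."""
    calls.append(1)
    return [(1, 10, 4), (1, 11, 3), (2, 10, 5)]

first = get_ratings(fake_rows)   # slow path: reads all rows
second = get_ratings(fake_rows)  # fast path: returns the cached dictionary
```

A cache like this lives in the memory of a single server process, so it is lost when the server restarts and is not shared between processes.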

5.2 Accuracy

This section covers how accurate the recommendations were. The accuracy is calculated using the MAE, as mentioned in section 3.2.

5.2.1 Individual recommendations

Two people were asked to rate some movies. Person A rated 14 movies, while Person B rated 6 movies.

Table 5.1: Person A and Person B ratings. X means not rated.

Movie                                           Person A   Person B
Lion King, The (1994)                           4          5
Usual Suspects, The (1995)                      3          X
Titanic (1997)                                  3          4
Die Hard (1988)                                 4          X
Seven (Se7en) (1995)                            4          X
Dirty Dancing (1987)                            2          X
Men in Black (1997)                             3          X
Home Alone (1990)                               3          4
Alien (1979)                                    4          X
GoldenEye (1995)                                4          X
Demolition Man (1993)                           3          X
William Shakespeare’s Romeo and Juliet (1996)   3          X
Toy Story (1995)                                5          4
Delicatessen (1991)                             1          X
Hunchback of Notre Dame, The (1996)             X          5
Aladdin (1992)                                  X          4


Since the system recommended movies that neither Person A nor Person B has seen, it is not possible to calculate the MAE based on the recommendation itself (they cannot be forced to watch the movie in order to give it an actual rating). Instead, we use the predictions for movies they actually have seen (but have not rated), and calculate the MAE based on those.

Below is a table of the predicted rating of each movie for each person, together with the rating the person actually gave the movie.

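Table 5.2: Person A and Person B ratings. Similarity threshold: 0.0

Movie          A rating   A predicted   B rating   B predicted
Pocahontas     3          2.4           5          3.5
Forrest Gump   4          3.6           5          4.7
Grease         3          2.9           4          4.0

Table 5.3: Person A and Person B ratings. Similarity threshold: 0.2

Movie          A rating   A predicted   B rating   B predicted
Grease         3          2.9           4          3.9

Table 5.4: Person A and Person B ratings. Similarity threshold: 0.5

Movie          A rating   A predicted   B rating   B predicted
Grease         3          2.9           4          3.8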

As one can see, there is not much difference between choosing similarity threshold = 0.0 and threshold = 0.2. Since the users only rated the movies with integers, we have to round the predicted ratings to the closest integer as well, because when a user gives a certain rating, for instance 4, the underlying opinion could be anything between 3.5 and 4.4.

Rounding all numbers to integers and using table 5.2 or 5.3 with the formula in section 3.2 gives us:

\[
\mathrm{MAE} = \frac{|3-2| + |4-4| + |3-3| + |5-4| + |5-5| + |4-4|}{6} = \frac{1}{3} \tag{5.1}
\]

While table 5.4 gives us:

\[
\mathrm{MAE} = \frac{|3-3| + |4-4| + |3-3| + |5-3| + |5-5| + |4-4|}{6} = \frac{1}{3} \tag{5.2}
\]

This means that all three thresholds give the same MAE. However, with the threshold 0.5 the distribution of the error among the users changed. With similarity threshold 0.0 or 0.2, both Person A and Person B had one movie whose prediction was off by 1. With similarity threshold 0.5, all movies for Person A were predicted correctly, while Person B, who rated 8 fewer movies than A, had one prediction that was off by 2 (5 − 3).
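The calculation for similarity threshold 0.0 can be reproduced with a few lines of Python. The sketch below uses the values from table 5.2. The helper names are hypothetical, and rounding halves upwards (so that 3.5 becomes 4) is an assumption that matches the rounded values used in equation 5.1.

```python
import math

def round_half_up(x):
    """Round to the closest integer, with .5 rounded upwards."""
    return int(math.floor(x + 0.5))

def mae(pairs):
    """Mean absolute error between actual ratings and rounded predictions."""
    return sum(abs(actual - round_half_up(pred)) for actual, pred in pairs) / len(pairs)

# (actual rating, predicted rating) for Pocahontas, Forrest Gump and Grease
person_a = [(3, 2.4), (4, 3.6), (3, 2.9)]
person_b = [(5, 3.5), (5, 4.7), (4, 4.0)]

overall = mae(person_a + person_b)
per_user = {"A": mae(person_a), "B": mae(person_b)}
```

Here `overall` is 2/6 = 1/3, and each person contributes one prediction that is off by 1, which is the error distribution described above for threshold 0.0.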

5.2.2 Context

The system removes movies with unwanted genres. To what extent this helps the recommendation is discussed further in chapter 6.

5.2.3 Group recommendation

For Person A and B, the system was only able to recommend a movie when the similarity threshold was low. When the threshold was 0.2 or higher, it failed to recommend a movie to the group. This is discussed further in chapter 6.

Chapter 6

Discussion

Evaluating recommender systems and their algorithms is not easy. Different algorithms may be better or worse on different data sets. Some algorithms work well when the data set contains many more users than items, but might not be appropriate in a domain where there are many more items than users. Data sets also differ in other properties, such as rating density and rating scale [3]. This chapter discusses the system, the results and their validity.

6.1 Data set

The data set that has been used is old. It only contains movies from before the year 2000. This is a problem because the system will only recommend old movies. It is also a problem for the users, because it is not easy to rate a movie that one has not seen in 20 years. When Person A and B were asked to rate some movies from this data set, they had trouble finding movies that they had seen, and when they did find one, they had trouble rating it, unless the movie was a clear "5" or a clear "1". This is also the reason why the test set was so small.

6.2 Individual recommendation

The predictions made by the system are not bad at all. However, we only tested 6 ratings (three movies for two users), which is a small number of tests. Increasing the similarity threshold increased the accuracy for Person A, but decreased the accuracy for Person B. Whether this is due to the low number of ratings from Person B or just a coincidence is hard to tell, because of the low number of items they were tested on. Increasing the number of test movies could have increased the error for Person A, and thus evened out the difference. Still, based on this evaluation, a higher number of ratings does seem to increase the accuracy of the predictions.


6.3 Context

When a user adds a genre that he is tired of, all movies that contain that genre are removed from the recommendation completely. This is not always appropriate. Sometimes a user might still want to watch a movie despite it containing a genre he is tired of, simply because it is that good. However, finding a good way to weigh this is not easy. An alternative is to let the user set a minimum threshold, so that movies with a genre the user is tired of are discarded only if their predicted rating is lower than this threshold.
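The alternative with a minimum threshold could look roughly like this. It is only a sketch of the idea and has not been implemented in the prototype. The function name `keep_movie` and the parameter `min_rating` are hypothetical.

```python
def keep_movie(genres, tired_genres, predicted, min_rating):
    """Discard a movie with a tired genre only if its prediction is below min_rating."""
    if not set(genres) & set(tired_genres):
        return True  # no tired genre involved: always keep the movie
    return predicted >= min_rating

# A horror movie survives the filter only if it is predicted to be very good.
kept_good = keep_movie(["Horror"], ["Horror"], predicted=4.8, min_rating=4.5)
kept_average = keep_movie(["Horror"], ["Horror"], predicted=3.9, min_rating=4.5)
kept_other = keep_movie(["Comedy"], ["Horror"], predicted=2.0, min_rating=4.5)
```

Setting `min_rating` above the highest possible rating gives back the current behaviour, where every movie with a tired genre is removed.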

6.4 Group recommendation

As mentioned in section 5.2.3, the system failed to give a group recommendation to Person A and B when the threshold was set to 0.2 or higher. This problem appears when there are no similar users (all similarity values are below the threshold) to predict a certain movie that is recommended to another user in the group. For instance, if "Batman Begins" is recommended to user A (the movie with the highest prediction), but no user similar to B has seen that movie, then the algorithm fails to calculate the predicted rating of that movie for user B. This is a problem because without the rating neither the average aggregation nor the group disagreement function (nor any other aggregation function) can be calculated, and hence the consensus function cannot be calculated either. Instead of applying the aggregation function only to the movies that have been recommended to the different users, one could apply it to the set of all movies that have been given a predicted rating. However, this method might discard a movie that everyone else likes: because no prediction can be made for one person, it is not recommended at all. This problem is inherent in collaborative filtering in general if the data set is too sparse. It is similar to the problem that arises when a new movie is added. Since no one has seen it, no one will rate it, and therefore it will never be recommended. It is also possible to set the predicted rating to one (the lowest rating value) if no prediction can be made. Although this value is probably lower than the actual value, it is still better than discarding the movie entirely.

Also, the values for w1 and w2 in formula 2.14 have been chosen arbitrarily as w1 = 0.75 and w2 = 0.25, without much research to back them up. The idea behind the consensus function is to give a movie with high disagreement a lower score than a movie on which everyone agrees. The challenge is to find good values for w1 and w2.
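The fallback mentioned above, where a missing prediction is replaced by the lowest rating value, can be sketched as follows. This has not been implemented in the prototype. The function name `fill_missing` and the predicted ratings are made up for the example, and a missing prediction is assumed to be represented by `None` or by an absent key.

```python
LOWEST_RATING = 1.0  # lowest value on the rating scale

def fill_missing(predictions, movies, fallback=LOWEST_RATING):
    """Return a copy in which every user has a value for every candidate movie."""
    filled = {}
    for user, preds in predictions.items():
        filled[user] = {}
        for movie in movies:
            value = preds.get(movie)
            filled[user][movie] = fallback if value is None else value
    return filled

# Hypothetical predictions: no user similar to B has seen "Batman Begins".
predictions = {
    "A": {"Batman Begins": 4.7, "Grease": 2.9},
    "B": {"Batman Begins": None, "Grease": 3.8},
}
filled = fill_missing(predictions, ["Batman Begins", "Grease"])
```

With the gap filled, the aggregation functions can be calculated for every candidate movie, at the cost of a pessimistic score for movies with missing predictions.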

Chapter 7

Conclusion and future work

In this project, a first prototype has been developed and its ability to predict ratings has been tested. User-based collaborative filtering with Pearson correlation has been used to predict individual ratings. A consensus function has then been used to score each movie for a given group. The results show that the individual predictions are fairly good, but this has to be taken with a pinch of salt because of the low number of tests. The results also show a weakness of collaborative filtering: when there were no similar users to predict a certain movie for a certain user, the system failed to recommend a movie to the group.

From here, there are many ways to go. A better (newer) data set has to be used, which will make it easier for users to rate movies and test the prototype. In addition, a better GUI has to be implemented. Then, one can either continue improving the algorithm that has been used here, which includes finding a solution to the problem of having no similar users to predict a rating, a good value for the threshold, and good values for w1 and w2, or try a totally different algorithm. It could be interesting to compare the implemented algorithm to another algorithm, for instance a model-based one. A model-based method can solve the sparsity problem and can have higher scalability. Adding more context, for instance the running time of the movie, could also be interesting.


Appendices


Appendix A

Code

The code has been sent along with this report. However, one can test the web application at http://1989398c.ngrok.io/login/. If the local (test) server is down, send a mail to [email protected].


Appendix B

Proof for rewritten Pearson correlation

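\[
sim^{U}_{u,v} = corr_{u,v} =
\frac{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
     {\sqrt{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{u,v}} (r_{v,i} - \bar{r}_v)^2}}
\]

\[
\iff \quad corr_{u,v} =
\frac{\sum_{i \in I_{u,v}} (r_{u,i} r_{v,i} - \bar{r}_u r_{v,i} - \bar{r}_v r_{u,i} + \bar{r}_v \bar{r}_u)}
     {\sqrt{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{u,v}} (r_{v,i} - \bar{r}_v)^2}}
\]

\[
\iff \quad corr_{u,v} =
\frac{\sum_{i \in I_{u,v}} (r_{u,i} r_{v,i} - \bar{r}_u r_{v,i} - \bar{r}_v r_{u,i} + \bar{r}_v \bar{r}_u)}
     {\sqrt{\sum_{i \in I_{u,v}} (r_{u,i}^2 - 2 \bar{r}_u r_{u,i} + \bar{r}_u^2)}\,\sqrt{\sum_{i \in I_{u,v}} (r_{v,i}^2 - 2 \bar{r}_v r_{v,i} + \bar{r}_v^2)}}
\tag{B.1}
\]

Note that

\[
\bar{r}_u = \frac{\sum_{i \in I_{u,v}} r_{u,i}}{a}, \qquad
\bar{r}_v = \frac{\sum_{i \in I_{u,v}} r_{v,i}}{a}
\tag{B.2}
\]

where a = |I_{u,v}| is the number of items rated by both users. Substituting B.2 into B.1 gives us:

\[
sim^{U}_{u,v} = corr_{u,v} =
\frac{\sum_{i \in I_{u,v}} r_{u,i} r_{v,i} - \dfrac{\sum_{i \in I_{u,v}} r_{u,i} \sum_{i \in I_{u,v}} r_{v,i}}{a}}
     {\sqrt{\left(\sum_{i \in I_{u,v}} r_{u,i}^2 - \dfrac{\left(\sum_{i \in I_{u,v}} r_{u,i}\right)^2}{a}\right)
            \left(\sum_{i \in I_{u,v}} r_{v,i}^2 - \dfrac{\left(\sum_{i \in I_{u,v}} r_{v,i}\right)^2}{a}\right)}}
\tag{B.3}
\]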
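As a numerical sanity check of the rewriting, the centred form (B.1) and the sum form (B.3) can be evaluated on the same ratings. The function names and the two rating vectors are made up for this sketch. Both lists are assumed to hold the ratings of two users for the same co-rated items, in the same order.

```python
from math import sqrt

def pearson_centred(x, y):
    """Pearson correlation in the centred form (B.1)."""
    a = len(x)
    mean_x, mean_y = sum(x) / a, sum(y) / a
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mean_x) ** 2 for xi in x)) * sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return num / den

def pearson_sums(x, y):
    """Pearson correlation in the rewritten sum form (B.3)."""
    a = len(x)
    sum_x, sum_y = sum(x), sum(y)
    num = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / a
    den = sqrt((sum(xi * xi for xi in x) - sum_x ** 2 / a)
               * (sum(yi * yi for yi in y) - sum_y ** 2 / a))
    return num / den

ratings_u = [5, 3, 4, 1]  # ratings of user u for the co-rated items
ratings_v = [4, 2, 5, 2]  # ratings of user v for the same items

difference = abs(pearson_centred(ratings_u, ratings_v) - pearson_sums(ratings_u, ratings_v))
```

The two forms agree up to floating point rounding. The sum form only needs running totals, so it can be computed in a single pass over the common items.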


Glossary

GUI Graphical user interface.

MAE Mean absolute error.


Bibliography

[1] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.

[2] Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, and Cong Yu. Group recommendation: Semantics and efficiency. Proceedings of the VLDB Endowment, 2(1):754–765, 2009.

[3] Jonathan L Herlocker, Joseph A Konstan, Loren G Terveen, and John T Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, 2004.

[4] Alan R Hevner. Design science in information systems research. MIS Quarterly, 28(1):75–105, 2004.

[5] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[6] Prem Melville and Vikas Sindhwani. Recommender systems. In Encyclopedia of machine learning, pages 829–838. Springer, 2010.

[7] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to recommender systems handbook. pages 73–94. Springer, 2011.

[8] Alan Said, Shlomo Berkovsky, and Ernesto W De Luca. Putting things in context: Challenge on context-aware movie recommendation. In Proceedings of the Workshop on Context-Aware Movie Recommendation, pages 2–6. ACM, 2010.

[9] Shari Trewin. Knowledge-based recommender systems. Encyclopedia of Library and Information Science: Volume 69-Supplement 32, page 180, 2000.
