

INDIAN INSTITUTE OF TECHNOLOGY BOMBAY

EKLAVYA SUMMER INTERNSHIP 2018

COLLABORATIVE COMMUNITIES

Recommender System for Collaborative Communities

Authors: Pranav Vyas, Aruna Vasam, Ajay Damera, Harika Thatipelli

Guide: Prof. Deepak B. Phatak

Project-in-Charge: Nagesh Karmali

Mentor: Urmi Saha

July 3, 2018


Summer Internship Certificate

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY BOMBAY

Certificate

The project entitled “Recommender System for Collaborative Communities” submitted by Mr. Pranav Vyas, Miss. Aruna Vasam, Mr. Ajay Damera, and Miss. Harika Thatipelli for the 2018 Summer Internship from 16th May 2018 to 6th July 2018 was satisfactorily done and submitted at the Department of Computer Science and Engineering, IIT Bombay.

Prof. Deepak B. Phatak
Dept of CSE, IITB
Guide

Mr. Nagesh Karmali
Dept of CSE, IITB
Project-in-Charge

Place: IIT Bombay
Date: July 3, 2018


Acknowledgements

We would like to express our profound gratitude towards our guide Prof. Deepak B. Phatak, Project In-charge Mr. Nagesh Karmali, Project Management Officer Ms. Firuza Aibara, and Mr. Abhijit Bonik for their exemplary guidance, valuable information and constant encouragement throughout the project.

We would also like to extend our thanks to our mentor Ms. Urmi Saha for her cordial support, cooperation and guidance.


Declaration

We declare that this written submission represents our ideas in our own words and, where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea / data / fact / source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Mr. Pranav Vyas

Miss. Aruna Vasam, RGUKT-Basar

Mr. Ajay Damera, RGUKT-Basar

Miss. Harika Thatipelli, RGUKT-Basar


Abstract

In recent years, the amount of content on the Web has increased rapidly. This has made it harder for users to find content that matches their interests, and has led to the development of recommender systems, which try to help users make an informed choice about what content to consume. Recommender systems are extensively used on e-commerce, video on demand, and music streaming platforms. We implement a recommender system for articles published on Collaborative Communities, a project by IIT Bombay. In this paper, we discuss the design, implementation, testing and evaluation of a WALS recommender and a TimeSVD++ recommender.


Contents

1 Introduction
1.1 Problem Definition
1.2 Method and Approach
1.2.1 Neighbourhood Based Methods
1.2.2 Latent Factor Models
2 Requirements
2.1 Functional Requirements
2.2 Non Functional Requirements
3 Design
3.1 System Overview
3.1.1 System Architecture
3.1.2 Tools
3.1.3 Languages
3.2 Installation
3.2.1 Installation in a Virtual Environment
3.2.2 Installation using Docker
3.3 Method and its Importance
3.3.1 Matrix Factorization
3.3.2 (Weighted) Alternating Least Squares
3.3.3 Temporal Filtering
3.4 API
3.4.1 For Getting Predictions
3.4.2 For Training the Model
3.5 Front End Design
4 Experiments
5 Conclusion and Future Work
6 Selenium Test Cases and Test Results for OER

List of Figures

1 Training the model
2 Fetching recommendations
3 Recommendations for anonymous users
4 Recommendations for authenticated users
5 Comparison of WALS and TimeSVD++


1 Introduction

On most websites, users are spoiled for choice. There are more movies on Netflix and songs on iTunes than anybody could realistically watch or hear in their lifetime. On Collaborative Communities, if all goes well, we expect there to be many more articles than users can read. So, naturally, users will have to triage articles, picking out those they think they will enjoy.

1.1 Problem Definition

Our recommender system has been designed to help Collaborative Communities users make an informed decision about which articles they want to read. Users can browse our recommendations and, depending on their preferences and the title of the article, accept or reject a recommendation. If they accept it, they may upvote or downvote the article, or even share it on social media. A well-designed system should log these interactions and use them to generate better recommendations [10].

1.2 Method and Approach

In real life, we often ask our friends and acquaintances for their input before making decisions. For example, we might ask our peers for book recommendations, or check a review written by our favourite critic before deciding which movie to watch. A natural approach, therefore, is to generate recommendations for a user based on items liked by users with similar tastes. We call this approach to filtering items Collaborative Filtering [10].

This, however, raises the question: how does one decide which users have similar tastes? There are two major approaches. One is based on neighbourhoods of items or users; the other explains ratings by projecting users and items onto a latent factor space. We shall discuss both approaches.

1.2.1 Neighbourhood Based Methods

Neighbourhood based models try to cluster similar users or items into the same neighbourhood. When clustering on items, the neighbourhood based approach tries to predict a user's rating for an item by looking at other items in its neighbourhood. For example, the neighbourhood of a movie like Saving Private Ryan might contain other war movies, movies directed by Steven Spielberg, or movies starring Tom Hanks. By looking at how the user has rated these movies, we can build up a very good guess of how the user would rate Saving Private Ryan [7].

Neighbourhood based models that cluster on users behave similarly: when generating recommendations for a particular user, they look at which items were rated highly by users in the same cluster.

While this approach has intuitive appeal, sophisticated neighbourhood based models that rely on support vector machines or neural networks may be too high dimensional to be computationally tractable. On the other hand, naive neighbourhood models such as kNN, while computationally efficient, suffer from poor accuracy [10].
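To make the item-based variant concrete, here is a minimal NumPy sketch. It is not part of our system (we use latent factor models instead); it assumes a small dense user × item matrix in which 0 means "not rated", uses cosine similarity between item columns as one common similarity choice, and the function names are hypothetical.

```python
import numpy as np


def cosine_item_similarity(R):
    """Cosine similarity between the item columns of a user x item matrix R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for unrated items
    Rn = R / norms                   # scale each column to unit length
    return Rn.T @ Rn                 # n_items x n_items similarity matrix


def predict_item_knn(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted average of the user's
    ratings on the k rated items most similar to `item` (0 means unrated)."""
    S = cosine_item_similarity(R)
    rated = np.flatnonzero(R[user])
    rated = rated[rated != item]
    if rated.size == 0:
        return 0.0                   # nothing to go on for this user
    # The k items this user has rated that are most similar to the target item.
    neighbours = rated[np.argsort(S[item, rated])[::-1][:k]]
    weights = S[item, neighbours]
    if weights.sum() <= 0:
        return float(R[user, neighbours].mean())
    return float(weights @ R[user, neighbours] / weights.sum())


R = np.array([[5, 5, 0, 1],
              [4, 5, 4, 1],
              [1, 1, 2, 5]], dtype=float)
print(predict_item_knn(R, user=0, item=2))   # user 0 has not rated item 2
```

Recomputing the full similarity matrix on every call keeps the sketch short; a real system would precompute it, which is where the cost of neighbourhood methods on large catalogues shows up.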

1.2.2 Latent Factor Models

Latent factor models take a slightly different approach to the same problem: they try to explain ratings by characterising users and items on a number of factors inferred from the pattern of ratings. The number of factors is arbitrary, though usually small (e.g., 10).

The reader may be familiar with Pandora's Music Genome Project. For those who are not, musicologists employed by Pandora listen to songs and score them on over 450 different attributes [9, 7]. The factors inferred by a latent factor model can be thought of as an alternative to such extensive human categorization. This makes the latent factor model preferable for a system like Collaborative Communities: since we impose no restrictions on the type of articles submitted, save that they be educational in nature, comprehensively scoring articles on their attributes would be a Herculean task.

The computer-learned factors for items may sometimes be human-interpretable, such as the genre of a movie, but they may also be based on completely uninterpretable dimensions [7]. In fact, interpreting the results of a latent factor model is a question of some research interest [5]. For users, each factor measures how much the user likes movies that score high on the corresponding movie factor [7].


2 Requirements

2.1 Functional Requirements

1. Recommendations should be displayed next to every article viewed by the user.

2. For anonymous users, recommendations will be based on IP addresses.

3. For authenticated users, recommendations will be based on user IDs.

4. The model will incorporate the number of times the article was shared, the number of upvotes and downvotes, the number of views, and the time of these events.

5. The model must be able to predict even while being trained. That is, training must be non-blocking.
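One way to satisfy requirement 5 is to train in a background thread and swap in the new model only once it is finished, so requests keep being served from the previous model in the meantime. The sketch below uses Python's standard threading module; ModelHolder, retrain_async and train_fn are hypothetical names, and the mechanism our system actually uses may differ (see Section 3.1.1 for how trained models are stored).

```python
import threading


class ModelHolder:
    """Serves predictions from the current model while a new one trains.

    Training runs in a background thread; the new model replaces the old one
    only after training has finished, so readers never wait on training.
    """

    def __init__(self, model):
        self._model = model                 # any callable: user -> recommendations
        self._lock = threading.Lock()       # guards the reference swap only

    def predict(self, user):
        with self._lock:
            model = self._model             # copy the reference, then release
        return model(user)

    def retrain_async(self, train_fn):
        """Run train_fn() in a background thread and swap in its result."""
        def worker():
            new_model = train_fn()          # slow part, runs without the lock
            with self._lock:
                self._model = new_model
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread
```

Readers hold the lock only long enough to copy the model reference, so a long training run never delays a prediction.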

2.2 Non Functional Requirements

1. Predictions should be generated quickly.

2. The model should be scalable, i.e., it should be able to handle many articles and users with many ratings between them.

3. We should respect users' privacy. While more information about users will help the model generate more accurate recommendations, if we are too intrusive, users may feel uncomfortable, as if they are being watched.

4. The recommendation system must be easily extensible.


Figure 1: Process for training the system.

3 Design

3.1 System Overview

Our recommender system runs separately from the main Collaborative Communities system. All interaction between the main system and the recommender system happens through GET and POST requests.

3.1.1 System Architecture

Before our system can generate any recommendations, it must be trained. To train the model, a POST request needs to be made to the server; see the API section for details. After training, the trained model is stored in Redis. Since Redis keeps its data in memory, retrieval is fast, and since Redis can automatically back up to disk, we do not have to implement persistence ourselves. Training should be run frequently enough to keep the model current.

To fetch recommendations, make a GET request to the server; the exact syntax is covered in the API section. After receiving a GET request, the server loads the trained model from Redis and uses it. Prediction does not alter the model in any way, so the recommendations returned for a given request stay the same until the model is retrained. Hence, the Collaborative Communities system could even cache the response to a GET request and reuse it for a reasonable period of time without fear.

Figure 2: Process for getting recommendations from the system.
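A minimal sketch of such caching on the calling side, assuming responses are keyed by the (user, nrecs) pair and expire after a fixed time-to-live that should be shorter than the interval between retraining runs. TTLCache and fetch are hypothetical names; in the Django-based main system, Django's built-in cache framework would be a natural alternative.

```python
import time


class TTLCache:
    """Caches recommendation responses for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock              # injectable so tests can fake time
        self._store = {}                # key -> (expiry_time, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() if missing or stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]             # still fresh: skip the API call
        value = fetch()                 # stale or missing: ask the recommender
        self._store[key] = (now + self.ttl, value)
        return value
```

Keying on (user, nrecs) means that a request for 5 and a request for 10 recommendations are cached separately.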

3.1.2 Tools

1. Django: The main Collaborative Communities system is written in Django. See the Collaborative Communities manual for details.

2. Flask: The recommendation API is written in Flask. We did not use Django because the API itself is relatively simple, so we don't need all the features offered by Django.

3. Matplotlib: This is only used to visualise the results of the model. It is not necessary for the functioning of our system.

4. NumPy: Our code relies heavily on NumPy. All our dense matrices are NumPy matrices, and we also rely on NumPy for the majority of our mathematical functions.

5. Pandas: Pandas is used for reading and creating CSV and JSON files.

6. Redis: We use Redis as an in-memory store for the output of our model. Since it is in memory, it offers low-latency storage, and Redis can be configured to automatically back up to disk.

7. Scikit-Learn: We use scikit-learn to split the data we get from the Event Logging subsystem into a training and a testing set.

8. SciPy: We rely on SciPy sparse matrices for our inputs, as we expect rating data to be relatively sparse.

9. Sh: This is optional and only needed when training on Google Cloud. It is not necessary for the functioning of our system.

10. TensorFlow: We use one of the matrix factorization models built into TensorFlow for recommendation. This gives us parallelism and GPU acceleration with little extra effort.

3.1.3 Languages

The majority of our system is written in Python, along with some minor JavaScript for the front end.

3.2 Installation

This system has been tested on Ubuntu 16.04 LTS. Two installation methods are currently supported: virtual environments, which are recommended for day-to-day development, and Docker, which is recommended for final deployment. Both methods require that Collaborative Communities and the Event Logging subsystem be installed beforehand. Please consult the Event Logging and Collaborative Communities manuals for details.

3.2.1 Installation in a Virtual Environment

1. Install Redis.

$ sudo apt-get install redis-server

2. Install Python 3.

$ sudo apt-get install python3-pip python3-dev python-virtualenv

3. Create a virtual environment. You may name it anything you like; this guide assumes it is named rec_api.

$ virtualenv --system-site-packages -p python3 rec_api

4. Activate the virtual environment.

$ source rec_api/bin/activate

5. Clone the Community-Recommendation repository.

$ git clone https://github.com/fresearchgroup/Community-Recommendation.git

6. Change into the Community-Recommendation directory.

$ cd Community-Recommendation

7. Install the dependencies.

$ pip3 install -r requirements.txt

8. Generate the authentication token (consult the Event Logging guide for a how-to) and export it.

$ export LOG_AUTH_TOKEN=YOUR_TOKEN_HERE

9. Set the environment variables for Flask.

$ export FLASK_APP=flask_api.py

and, optionally,

$ export FLASK_ENV=development

10. Finally, start the Flask app. You can use any port; this guide assumes you are running on 3445. To integrate with Collaborative Communities, the host needs to be set to 0.0.0.0.

$ flask run --host 0.0.0.0 --port 3445

3.2.2 Installation using Docker

1. Install Docker and Docker Compose.

2. Clone the Community-Recommendation repository.

$ git clone https://github.com/fresearchgroup/Community-Recommendation.git

3. Change into the Community-Recommendation directory.

$ cd Community-Recommendation

4. Generate the authentication token (consult the Event Logging guide for a how-to) and add it to the .env file.

$ echo "Your Token Here" >> .env

5. Build the system using Docker Compose.

$ sudo docker-compose build

6. Run the system using Docker Compose.

$ sudo docker-compose up

3.3 Method and its Importance

We already covered why we are using latent factor models in the introduction. Here we aim to give a more detailed explanation of how our model was chosen and why. We won't touch on the mathematical theory behind the models; the reader is advised to consult the references for that.

3.3.1 Matrix Factorization

In October 2006, Netflix released a dataset containing 100 million anonymous movie ratings and challenged computer scientists around the world to develop a recommender system that could beat its own in-house algorithm, Cinematch [1]. The winning model relied on multiple probabilistic matrix factorization (PMF) models [7].

In a PMF model, we take as input a training matrix M of size m × n, where m is the number of users and n the number of items. The entries of this matrix represent ratings.¹ We then factorize this matrix into two matrices: U, which is m × d, and V, which is d × n, where d is a hyperparameter that indicates how many factors are to be learned. A higher value of d can increase accuracy, at the cost of computational complexity.

Given U and V, we predict a rating as $\hat{r}_{ij} = u_i^\top v_j$, where $u_i$ is the $i$th row of U and $v_j$ is the $j$th column of V. For the theory behind this factorization, see [7, 11].

¹ In our case, $M_{ij}$ represents whether or not user i has viewed article j. This is because of certain technical issues we encountered; see System Overview and Future Work for more details.
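The prediction rule above amounts to one dot product per (user, item) pair, or a single matrix product for all pairs at once. In this NumPy sketch the factor matrices are random stand-ins for learned ones, the sizes are arbitrary, and top_n, which skips items the user has already seen, is a hypothetical helper rather than part of our API.

```python
import numpy as np

m, n, d = 4, 5, 2                  # users, items, latent factors
rng = np.random.default_rng(0)
U = rng.normal(size=(m, d))        # row i is u_i, the factors of user i
V = rng.normal(size=(d, n))        # column j is v_j, the factors of item j


def predict(U, V, i, j):
    """Predicted rating r_hat_ij = u_i^T v_j."""
    return float(U[i] @ V[:, j])


R_hat = U @ V                      # every prediction at once: an m x n matrix


def top_n(R_hat, i, n_recs, seen=()):
    """Indices of the n_recs highest-scoring items for user i, skipping `seen`."""
    scores = R_hat[i].copy()
    scores[list(seen)] = -np.inf   # never recommend what the user already viewed
    return [int(j) for j in np.argsort(scores)[::-1][:n_recs]]


print(predict(U, V, 1, 3), top_n(R_hat, 0, 2, seen=[1]))
```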


3.3.2 (Weighted) Alternating Least Squares

One approach to learning the above factorization is to simply perform gradient descent to minimize the squared error over the known ratings,

$$\sum_{(i,j) \in \mathcal{K}} \left(r_{ij} - u_i^\top v_j\right)^2,$$

where $\mathcal{K}$ is the set of (user, item) pairs whose rating is known. This approach was one of the earliest to be used on the Netflix dataset, and it gave decent results [2].

Another approach is to fix one of U and V, solve for the other, and alternate. This method is known as Alternating Least Squares, or ALS. Gradient descent is easier to implement and converges faster than ALS; however, ALS is much more easily parallelized [5, 12].

In weighted alternating least squares, we use a weight vector to normalize row and/or column frequencies. Such normalization is helpful in systems dealing with implicit feedback, such as ours [4]. However, we have not implemented this normalization yet. See Future Work for more details.
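The alternating idea can be sketched in a few lines of NumPy. This is a small dense illustration, not the TensorFlow WALS implementation we use: each step fixes one factor matrix and solves a weighted, regularized least-squares problem for every row of the other. Observed cells get weight 1 and unobserved cells (treated as 0) get unobserved_weight; with unobserved_weight=0 it reduces to plain ALS on the known ratings. Following the notation above, U is m × d and V is d × n. The parameter names echo the API options in Section 3.4.2, but the exact scaling conventions used by TensorFlow may differ.

```python
import numpy as np


def wals(R, observed, d=10, unobserved_weight=0.0, reg=0.05, niter=20, seed=0):
    """Weighted alternating least squares on a dense m x n matrix R.

    observed is a boolean mask of the same shape as R. Returns U (m x d) and
    V (d x n) such that U @ V approximates R on the heavily weighted cells.
    """
    m, n = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, d))
    V = rng.normal(scale=0.1, size=(d, n))
    W = np.where(observed, 1.0, unobserved_weight)
    R = np.where(observed, R, 0.0)          # unobserved cells are treated as 0
    ridge = reg * np.eye(d)
    for _ in range(niter):
        # Fix V: for each user row, solve (V W_i V^T + reg I) u_i = V W_i r_i.
        for i in range(m):
            VW = V * W[i]                   # scales column j of V by W[i, j]
            U[i] = np.linalg.solve(VW @ V.T + ridge, VW @ R[i])
        # Fix U: for each item column, solve (U^T W_j U + reg I) v_j = U^T W_j r_j.
        for j in range(n):
            UW = U.T * W[:, j]              # d x m, column i scaled by W[i, j]
            V[:, j] = np.linalg.solve(UW @ U + ridge, UW @ R[:, j])
    return U, V
```

Each inner solve is an independent d × d system, which is why ALS parallelizes well: all rows of U (and then all columns of V) can be solved at the same time.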

3.3.3 Temporal Filtering

The naive matrix factorization model suffers from two major deficiencies.

1. It does not account for biases in ratings. Some users may be overly critical and give low ratings to almost every article. At the same time, some articles will be better (worse) than the average article and so will get higher (lower) than average ratings.

2. It does not account for change in user preferences over time. Forexample, during the World Cup we would expect to see a spike inthe number of people reading articles on football. These sorts ofinteractions cannot be captured by the simple WALS model.

To correct these deficiencies, we have implemented a temporal filtering model based on the TimeSVD++ algorithm given in [6]. This algorithm first adjusts every rating given by the user to account for the average rating of the user and of the article. This takes care of the static bias. To attack the temporal bias, we take the time period of our dataset and divide it into equally sized "bins". This allows us to capture gradual changes in bias. To soak up sudden spikes in bias, we associate a separate variable with each combination of user and distinct unit of time (a day, in our case). This variable then captures any transient changes in bias that occur within that time unit.
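The bias terms described above can be sketched as follows. Following the time-dependent baseline in [6], the baseline prediction is $b_{ui}(t) = \mu + b_u + b_{u,t} + b_i + b_{i,\mathrm{Bin}(t)}$, where $\mu$ is the global mean, $b_u$ and $b_i$ are the static user and article biases, $b_{u,t}$ is the per-user, per-day term, and $b_{i,\mathrm{Bin}(t)}$ is the binned term. Two points are assumptions: the bins are attached to the article bias, as in Koren's model, since the text above does not say which bias is binned; and the gradual user-drift term of the full model is omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def time_bin(t, t_min, t_max, nbins=6):
    """Index (0 .. nbins-1) of the equal-width bin of [t_min, t_max] containing t."""
    width = (t_max - t_min) / nbins
    return min(int((t - t_min) // width), nbins - 1)   # t_max falls in the last bin


def baseline(mu, b_user, b_item, b_item_bin, b_user_day, u, i, t, day, t_min, t_max):
    """Time-aware baseline mu + b_u + b_{u,day} + b_i + b_{i,Bin(t)}.

    b_user, b_item: 1-D arrays of static biases.
    b_item_bin: (n_items x nbins) array of per-bin article biases.
    b_user_day: dict mapping (user, day) -> transient bias; missing pairs count as 0.
    """
    nbins = b_item_bin.shape[1]
    return (mu + b_user[u] + b_user_day.get((u, day), 0.0)
            + b_item[i] + b_item_bin[i, time_bin(t, t_min, t_max, nbins)])
```

Storing the per-day terms in a dictionary reflects that most (user, day) pairs never occur, so a dense array would be mostly zeros.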

3.4 API

The Recommender System's API is highly configurable and supports the following options:

3.4.1 For Getting Predictions

Format: <ip>:<port>/rec?user=<userid>&nrecs=<n>

• user: This specifies the user ID or IP address for which recommendations are needed.

• nrecs: This specifies the number of recommendations that are needed. It is an optional parameter which defaults to 5. The default can also be configured via the environment variable REC_NRECS.

• model: This specifies which model to use. Valid model names are wals and timesvd. Default: wals. The associated environment variable is DEFAULT_MODEL.

Example: localhost:3445/rec?user=2&nrecs=10
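A minimal Python client for this endpoint, using only the standard library. It assumes plain HTTP and leaves optional parameters out of the URL so that the server-side defaults apply; because this report does not specify the response format, fetch_recommendations returns the raw body. Both function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def rec_url(host, port, user, nrecs=None, model=None):
    """Build a /rec URL; omitted parameters fall back to the server defaults."""
    params = {"user": user}             # a user ID or, for anonymous users, an IP
    if nrecs is not None:
        params["nrecs"] = nrecs
    if model is not None:
        params["model"] = model         # "wals" or "timesvd"
    return f"http://{host}:{port}/rec?{urlencode(params)}"


def fetch_recommendations(host, port, user, **kwargs):
    """Return the raw response body as text (needs a running API)."""
    with urlopen(rec_url(host, port, user, **kwargs), timeout=5) as resp:
        return resp.read().decode("utf-8")


print(rec_url("localhost", 3445, 2, nrecs=10))
```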


3.4.2 For Training the Model

The model is trained when a POST request is made to the API. The options given below are passed as a JSON dictionary in the body of the POST request. Alternatively, options that have an associated environment variable may be set through that variable. A sample POST request:

curl -i -X POST -H 'Content-Type: application/json' -d '{"article-view": "http://localhost:8000/logapi/event/article/view"}' http://localhost:3445/train

The following options are supported.

• article-view: This must be a URL pointing to the Event Logging API's article view logs. Please consult the Event Logging manual for details.

• k_cores: If a user has viewed very few articles, or an article has been viewed only a couple of times, any recommendations made will likely not be accurate. This setting controls the minimum number of views an article must have, and the minimum number of articles a user must view, before being considered. Default: 0. Environment variable: REC_K_CORES

• train_size: This specifies the fraction of the data used as the training dataset. Default: 0.99. Environment variable: REC_TRAIN_SIZE

• debug: If this is true, some debugging information will be printed to the console. Default: False. Environment variable: REC_DEBUG

• niter: This specifies the maximum number of training iterations. Default: 20. Environment variable: REC_NITER

• ncomponents: The number of factors that should be used. Default: 10. Environment variable: REC_NCOMPONENTS

• unobserved_weight: The weight to be given to unobserved items. Default: 0. Environment variable: REC_UNOBSERVED_WEIGHT


• regularization: The regularization term for WALS. Higher values reduce overfitting, but values that are too high increase error. Default: 0.05. Environment variable: REC_REGULARIZATION

• beta: This is the decay term for TimeSVD++. Default: 0.015. Environment variable: REC_BETA

• nbins: The number of bins the time period should be split into for TimeSVD++. Default: 6. Environment variable: REC_NBINS

• reg_bias: The regularization term for the bias vectors in TimeSVD++. Default: 0.01. Environment variable: REC_REG_BIAS

• reg_item: The regularization term for the item factor vectors in TimeSVD++. Default: 0.01. Environment variable: REC_REG_ITEM

• reg_user: The regularization term for the user factor vectors in TimeSVD++. Default: 0.01. Environment variable: REC_REG_USER

• learn_rate: The initial learning rate to use for TimeSVD++. Default: 0.01. Environment variable: REC_LEARN_RATE

• max_learn_rate: The maximum allowed value of the learning rate. Set to 0 for an unbounded learning rate. Default: 1000.0. Environment variable: REC_MAX_LEARN_RATE

• bold: Should TimeSVD++ use the "bold driver" heuristic? That is, take smaller steps when close to the optimum and larger steps when far from it. Default: False. Environment variable: REC_BOLD

• tol: TimeSVD++ converges when the rate of change of error drops below tol. Default: 1e-5. Environment variable: REC_TOL

• view_weight: The weight that should be given to articles that have been viewed. Default: 1. Environment variable: REC_VIEW_WEIGHT


• model: The model to be used. Valid options are wals and timesvd. Default: wals. Environment variable: DEFAULT_MODEL
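The k_cores filter can be pictured with the following sketch. The description above does not say whether the filter is applied once or repeated; this version repeats it until every remaining user and article meets the threshold, which is the usual meaning of a k-core (and the sense in which the Amazon dataset used in Section 4 is "5-core"). The input format, a list of (user, article) view pairs, and the function name are assumptions.

```python
from collections import Counter


def k_core(views, k):
    """Keep only (user, article) pairs in which the user has at least k views
    and the article has at least k views, repeating until nothing is removed."""
    pairs = list(views)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        article_counts = Counter(article for _, article in pairs)
        kept = [(user, article) for user, article in pairs
                if user_counts[user] >= k and article_counts[article] >= k]
        if len(kept) == len(pairs):   # stable: every remaining pair qualifies
            return kept
        pairs = kept                  # removals may push others below k; go again
```

Repetition matters because dropping a sparse user can leave an article below the threshold, and vice versa.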

The following options are also supported, but beware, changing thesecan cause the API to stop working!

• format: Whether a CSV or JSON file should be used for intermediate processing. Changing this will break the code! Default: json. Environment variable: REC_FORMAT

• dtype: Which NumPy format to use for storing ratings. Default: np.float32. Environment variable: REC_DTYPE

• save_map: Whether the system should remember the external IDs (i.e., the Collaborative Communities IDs) of users and items. You can set this to False, but then the output of the API cannot be interpreted :) Default: True. Environment variable: REC_SAVE_MAP

• col_order: The names of the columns to be used in the internal representation. If you change these around, the code should still work, but there is little point in doing so. Default: [userID, articleID, ratings]. Environment variables: REC_COL_USER, REC_COL_ITEM, REC_COL_RATING

• kwargs: These are passed as is to the Pandas backend. Default: {}
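The sample curl request can also be issued from Python with the standard library. This sketch builds and sends the request; the option names are the ones listed above, while train_request, start_training and the 600-second timeout are hypothetical choices, since this report does not say how long training takes or whether the server waits for it to finish before replying.

```python
import json
from urllib.request import Request, urlopen


def train_request(host, port, options):
    """Build a POST /train request whose JSON body carries the training options."""
    body = json.dumps(options).encode("utf-8")
    return Request(f"http://{host}:{port}/train", data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def start_training(host, port, options, timeout=600):
    """Send the request and return the HTTP status code (needs a running API)."""
    with urlopen(train_request(host, port, options), timeout=timeout) as resp:
        return resp.status


# Example options: the article-view URL from the curl request plus two model options.
options = {
    "article-view": "http://localhost:8000/logapi/event/article/view",
    "model": "timesvd",
    "nbins": 6,
}
request = train_request("localhost", 3445, options)
```

Any option left out of the dictionary falls back to its environment variable or default, so the dictionary only needs the values being changed.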

3.5 Front End Design

Recommendations are displayed in a sidebar next to the article being viewed by the user. We display recommendations to both authenticated and anonymous users. Recommendations for anonymous users are based on their IP addresses, and recommendations for authenticated users are based on their user IDs. The user interface is the same for both authenticated and anonymous users.

Figure 3: Article page for anonymous users. Recommendations are displayed in the sidebar.

Figure 4: Article page for users who have logged in. Recommendations are displayed in the sidebar.

4 Experiments

Since Collaborative Communities is still being developed, we did not have access to actual article data. To benchmark our models, we had to resort to external datasets. We tested our models against the Netflix Prize dataset [1], the Amazon reviews dataset (specifically the 5-core digital music dataset) [8, 3], and a dataset consisting of book reviews [13]. Our scores do not match the best results in the literature, as we did not spend much time on hyperparameter tuning; these tests are intended merely as proofs of concept. Figure 5 shows the results that we obtained. We do not plot the results of TimeSVD++ against the books dataset, as that dataset does not record the time of the rating.

Figure 5: Comparison of WALS and TimeSVD++


5 Conclusion and Future Work

In conclusion, our system accomplished most of the goals that we laid out at the start. We are confident that we have built a robust and scalable recommender system that will be able to recommend articles of interest to Collaborative Communities users. However, there are some glaring exceptions: some features could not be implemented due to lack of time, technical issues, or other reasons. We list them below.

1. The system does not incorporate upvotes, downvotes, and sharing yet. The Reputation System project had not yet been integrated with Collaborative Communities, so upvotes and downvotes were not available. The Event Logging system also did not log article shares.

2. The TimeSVD++ model has not been written in TensorFlow. It is written in vanilla NumPy and SciPy, which makes it very slow for large datasets. Rewriting it in TensorFlow could result in large performance gains.

3. We store some dictionaries in Redis as JSON dumps of Python dictionaries. While convenient, this method of storage quickly becomes unwieldy. Instead of storing the dictionary directly, a more sophisticated approach would be to set one Redis key per dictionary entry. That is, instead of SET mydict '{"One": 1}' we should do SET mydict:One 1. This simple change could drastically reduce response time.

4. We have not implemented weighting for WALS, as we were under the impression that most of our user data would be explicit. However, since we only have views at this time, experimentation with different types of weight vectors could prove fruitful.
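Item 3 above can be illustrated with a sketch of both storage layouts. The functions accept any client with set and get methods, such as redis.Redis from the redis-py package; a tiny in-memory stand-in is included so the example runs without a Redis server. All names here are hypothetical. Note that with redis-py, get returns bytes unless the client is created with decode_responses=True.

```python
import json


def store_as_blob(client, name, mapping):
    """Current approach: the whole dictionary is one JSON string under one key."""
    client.set(name, json.dumps(mapping))


def lookup_in_blob(client, name, key):
    """Fetches and decodes the entire dictionary just to read one entry."""
    return json.loads(client.get(name))[key]


def store_per_key(client, name, mapping):
    """Suggested approach: one Redis key per entry, e.g. mydict:One -> 1."""
    for key, value in mapping.items():
        client.set(f"{name}:{key}", value)


def lookup_per_key(client, name, key):
    """Fetches a single small value (a real Redis client returns it as bytes/str)."""
    return client.get(f"{name}:{key}")


class DictClient:
    """Minimal in-memory stand-in for redis.Redis (set/get only)."""

    def __init__(self):
        self.data = {}

    def set(self, name, value):
        self.data[name] = value

    def get(self, name):
        return self.data.get(name)
```

The saving grows with the dictionary: a blob lookup transfers and parses every entry, while a per-key lookup touches only the one value requested.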


6 Selenium Test Cases and Test Results for OER

| Test ID | Action | Expected Result | Actual Result | Test Status |
|---|---|---|---|---|
| T01 | Click on create community button | Success message appears | Success message appears | Pass |
| T02 | Click on create community resources button | Success message appears | Success message appears | Pass |
| T03 | Click on create group button | Success message appears | Success message appears | Pass |
| T04 | Click on create group resources button | Success message appears | Success message appears | Pass |
| T05 | Click on home button | Home page | Home page | Pass |

References

[1] James Bennett and Stan Lanning. The Netflix Prize. In KDD Cup and Workshop in conjunction with KDD, 2007.

[2] Simon Funk. Netflix Update: Try This at Home. http://sifter.org/simon/journal/20061211.html, 2006. [Online; accessed 3-July-2018].

[3] Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 507–517, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.

[4] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, pages 263–272, Washington, DC, USA, 2008. IEEE Computer Society.

[5] Yehuda Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 426–434, New York, NY, USA, 2008. ACM.

[6] Yehuda Koren. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 447–456, New York, NY, USA, 2009. ACM.

[7] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, August 2009.

[8] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, pages 43–52, New York, NY, USA, 2015. ACM.

[9] Pandora. About the music genome project. https://www.pandora.com/about/mgp, 2018. [Online; accessed 3-July-2018].

[10] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. Recommender Systems Handbook. Springer-Verlag, Berlin, Heidelberg, 1st edition, 2010.

[11] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS'07, pages 1257–1264, USA, 2007. Curran Associates Inc.

[12] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-scale parallel collaborative filtering for the Netflix prize. In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management, AAIM '08, pages 337–348, Berlin, Heidelberg, 2008. Springer-Verlag.

[13] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, WWW '05, pages 22–32, New York, NY, USA, 2005. ACM.
