AWS, HADOOP AND MAHOUT – VIDEO GAME RECOMMENDER BEN GOODING UNIVERSITY OF ARKANSAS – DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRESENTED - APRIL 30, 2015



MAHOUT

• Pronounced like Trout

• Open Source Machine Learning platform from Apache

• Used Mahout 0.9

RECOMMENDER TYPES

• Item-Item Based Recommenders

• Based on how similar items are to other items

• User Based Recommenders

• Based on the notion of some similarity between users

NEIGHBORHOODS

• Two types of Neighborhoods

• N-Nearest Neighbor

• Nearest Neighbor Threshold
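The two neighborhood types can be illustrated with a minimal Python sketch. It assumes the similarities between a target user and every other user are already computed and held in a dictionary; the function names, user IDs and scores are hypothetical, and treating the threshold as inclusive is an assumption rather than something stated on the slide.

```python
def nearest_n(similarities, n):
    """Return the n user IDs most similar to the target user."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:n]]


def threshold(similarities, t):
    """Return every user ID whose similarity is at least t (inclusive by assumption)."""
    return [user for user, sim in similarities.items() if sim >= t]


# Hypothetical similarities between one target user and four other users.
sims = {"u1": 0.95, "u2": 0.72, "u3": 0.91, "u4": 0.40}

print(nearest_n(sims, 2))    # fixed neighborhood size, whatever the scores are
print(threshold(sims, 0.9))  # neighborhood size depends on the scores
```

A nearest-N neighborhood always has up to n members, while a threshold neighborhood can be empty when no user is similar enough.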

SIMILARITIES

• Euclidean Distance Similarity

• 1/(1+d) where d is the distance between two users

• Co-occurrence Similarity

• Explained by previous presentations

• Tanimoto Coefficient

• Ignores user preference numbers, only cares that a user has a preference

• Log-likelihood Similarity

• Based on the number of items two users have in common, but expressed as how unlikely it is that the overlap is due to chance

• Pearson Correlation Similarity

• A number between -1 and 1 that measures the tendency of two series of paired numbers to move together

• When the correlation is high, the similarity is close to 1; when the series move in opposite directions, it is close to -1
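Three of the similarities above can be sketched in plain Python. This is an illustration of the definitions on the slide, not Mahout's implementation, which may differ in detail (for example in how it normalizes the Euclidean distance). Each user is assumed to be a dictionary mapping an item ID to a rating; the users and ratings are made up.

```python
import math

NAN = float("nan")


def euclidean_similarity(a, b):
    """1 / (1 + d), where d is the Euclidean distance over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return NAN
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)


def tanimoto_similarity(a, b):
    """Size of the item overlap divided by the size of the union; ratings are ignored."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    if not union:
        return NAN
    return len(items_a & items_b) / len(union)


def pearson_similarity(a, b):
    """Pearson correlation of the two users' ratings over co-rated items."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return NAN
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return NAN  # undefined when one user gave identical ratings
    return cov / math.sqrt(var_x * var_y)


# Hypothetical users: item ID -> rating.
alice = {"g1": 5.0, "g2": 3.0, "g3": 1.0}
bob = {"g1": 4.0, "g2": 3.0, "g4": 2.0}
carol = {"g1": 1.0, "g2": 5.0}

print(euclidean_similarity(alice, bob))
print(tanimoto_similarity(alice, bob))
print(pearson_similarity(alice, bob), pearson_similarity(alice, carol))
```

With only a handful of co-rated items a similarity can be undefined, which the sketch reports as NaN.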

THE DATASET

• 228,570 Users

• 21,025 Games

• 463,669 Reviews

• Dataset contained excess information.

• Stanford provided a Python script to parse the data, but it did not parse enough.

• Modified Python script to parse out everything except User ID, Product ID, and Review Score

• Eliminated unknown user names

• Used gedit to remove some other excess information

• Wrote a C++ program to convert the User and Product IDs into numerical values
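The ID conversion step was done in C++; the same idea can be sketched in Python for illustration. Mahout's file-based data model expects numeric user and item IDs, so each distinct string ID is assigned the next unused integer. The sample IDs, the function names and the use of the literal string "unknown" to mark anonymous reviewers are assumptions made for this sketch.

```python
def build_id_mapper():
    """Return a function that maps each new string ID to the next integer."""
    mapping = {}

    def to_numeric(raw_id):
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping) + 1
        return mapping[raw_id]

    return to_numeric, mapping


def convert(rows, unknown_marker="unknown"):
    """Turn (user, product, score) string triples into numeric triples.

    Rows whose user equals unknown_marker (an assumed marker) are dropped.
    """
    user_id, user_map = build_id_mapper()
    item_id, item_map = build_id_mapper()
    converted = []
    for user, product, score in rows:
        if user == unknown_marker:
            continue
        converted.append((user_id(user), item_id(product), float(score)))
    return converted, user_map, item_map


# Hypothetical parsed reviews: (User ID, Product ID, Review Score).
rows = [
    ("A1X", "B000A", "5.0"),
    ("unknown", "B000A", "1.0"),
    ("A2Y", "B000B", "3.0"),
    ("A1X", "B000B", "4.0"),
]

converted, user_map, item_map = convert(rows)
for user, item, score in converted:
    print(f"{user},{item},{score}")  # one userID,itemID,score line per review
```

Keeping `user_map` and `item_map` makes it possible to translate numeric recommendations back to the original IDs later.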

USER BASED NEAREST-N RECOMMENDER EVALUATION

Similarity       n=1    n=2    n=4    n=8    n=16   n=32   n=64   n=128
Euclidean        NaN    0.205  0.284  0.361  0.498  0.542  0.604  0.646
Pearson          NaN    0.799  0.868  0.886  0.878  0.904  0.960  0.989
Log-likelihood   NaN    0.526  0.771  0.769  0.766  0.808  0.784  0.718
Tanimoto         NaN    0.723  0.955  0.826  0.792  0.807  0.822  0.755

USER BASED NEIGHBOR THRESHOLD RECOMMENDER EVALUATION

Similarity       t=0.95  t=0.9   t=0.85  t=0.8   t=0.75  t=0.7
Euclidean        0.503   0.503   0.503   0.503   0.503   0.504
Pearson          0.689   0.689   0.665   0.639   0.629   0.703
Log-likelihood   0.801   0.779   0.791   0.796   0.790   0.796
Tanimoto         NaN     NaN     NaN     NaN     NaN     NaN

ITEM BASED RECOMMENDER EVALUATION

Similarity       Score
Euclidean        0.786
Pearson          0.944
Log-likelihood   0.789
Tanimoto         0.783
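The slides do not name the metric behind these scores. One common choice for rating predictors is the average absolute difference between predicted and held-out ratings, where lower is better. The sketch below assumes that metric purely for illustration; the function name and sample values are hypothetical. It also shows one way a NaN can arise: when the recommender cannot estimate any held-out rating, there is nothing to average.

```python
import math


def average_absolute_difference(pairs):
    """Mean of |predicted - actual| over the pairs that have a usable prediction.

    pairs: iterable of (predicted, actual); predicted is None or NaN when the
    recommender could not produce an estimate.
    """
    errors = [
        abs(predicted - actual)
        for predicted, actual in pairs
        if predicted is not None and not math.isnan(predicted)
    ]
    if not errors:
        return float("nan")  # no estimate could be made at all
    return sum(errors) / len(errors)


# Hypothetical held-out ratings with the recommender's estimates.
print(average_absolute_difference([(4.0, 5.0), (3.0, 3.0), (None, 2.0)]))
print(average_absolute_difference([(None, 4.0), (None, 1.0)]))
```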

HADOOP

• Distributed File System

• Difficult to set up without an easy-to-understand tutorial

• Got it working on my virtual machine

• Couldn’t get Mahout to work with Hadoop as a single node cluster

• Java Class Not Found Exception

AMAZON WEB SERVICES

• Provides Elastic MapReduce (EMR) clusters

• Pre-installed with Mahout and Hadoop

• Used 1 Master Node and 3 Slaves

• Utilized the AWS Command Line Interface

AWS RECOMMENDER

• Took roughly 10-20 minutes to produce all of the recommendations.

• Used the item based recommender

• There is no distributed version of the Generic User Based Recommender

• Generated recommendations for the users

• Utilized a Python based web server to display recommendations

• Takes a user ID as input and returns that user's recommendations
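A Python web server of this kind might look like the sketch below, which is not the presenter's code. It assumes the recommendations were saved one user per line as `userID<TAB>[itemID:score,itemID:score,...]`, the layout Mahout's distributed item-based job normally writes. The file path, port, query parameter and function names are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_recommendations(lines):
    """Parse 'userID<TAB>[item:score,...]' lines into {user: [(item, score), ...]}."""
    recs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, _, items = line.partition("\t")
        inner = items.strip("[]")
        pairs = inner.split(",") if inner else []
        recs[int(user)] = [
            (int(item), float(score))
            for item, score in (pair.split(":") for pair in pairs)
        ]
    return recs


def make_handler(recs):
    """Build a request handler that answers GET /?user=<id> from recs."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            try:
                user = int(query["user"][0])
            except (KeyError, ValueError):
                self.send_error(400, "expected ?user=<numeric id>")
                return
            rows = [f"{item}\t{score}" for item, score in recs.get(user, [])]
            data = ("\n".join(rows) or "no recommendations").encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(path, port=8000):
    """Load the recommendation file at path and serve it until interrupted."""
    with open(path) as handle:
        recs = parse_recommendations(handle)
    HTTPServer(("localhost", port), make_handler(recs)).serve_forever()
```

Calling `serve("recommendations.txt")` with a hypothetical output file and then requesting `http://localhost:8000/?user=3` would list the items recommended for user 3.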

FUTURE WORK

• Attempt to use Parallel ALS recommendations.

• Should provide more accurate results than the item based recommender

• Code available upon request, along with AWS Command Line commands