apache mahout tutorial - recommendation - 2013/2014

117
Apache Mahout – Tutorial (2014) Cataldo Musto, Ph.D. Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2013/2014 08/01/2014

Upload: cataldo-musto

Post on 08-Sep-2014

8.349 views

Category:

Technology


2 download

DESCRIPTION

SWAP Research Group - Mahout Tutorial on Recommendation. Based on Apache Mahout 0.8

TRANSCRIPT

Page 1: Apache Mahout Tutorial - Recommendation - 2013/2014

Apache Mahout – Tutorial (2014) Cataldo Musto, Ph.D.

Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale

Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2013/2014

08/01/2014

Page 2: Apache Mahout Tutorial - Recommendation - 2013/2014

Outline

• What is Mahout ?

– Overview

• How to use Mahout ?

– Hands-on session

Page 3: Apache Mahout Tutorial - Recommendation - 2013/2014

Part 1

What is Mahout?

Page 4: Apache Mahout Tutorial - Recommendation - 2013/2014

Goal

• Mahout is a Java library

– Implementing Machine Learning techniques

Page 5: Apache Mahout Tutorial - Recommendation - 2013/2014

Goal

• Mahout is a Java library

– Implementing Machine Learning techniques

• Clustering

• Classification

• Recommendation

• Frequent ItemSet

Page 6: Apache Mahout Tutorial - Recommendation - 2013/2014

What can we do?

• Currently Mahout supports mainly four use cases: – Recommendation - takes users' behavior and tries to find

items users might like.

– Clustering - takes e.g. text documents and groups them into groups of topically related documents.

– Classification - learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.

– Frequent itemset mining - takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.

Page 7: Apache Mahout Tutorial - Recommendation - 2013/2014

Why Mahout?

• Mahout is not the only ML framework

– Weka

– R (http://www.r-project.org/)

• Why do we prefer Mahout?

(http://www.cs.waikato.ac.nz/ml/weka/)

Page 8: Apache Mahout Tutorial - Recommendation - 2013/2014

Why Mahout?

• Why do we prefer Mahout?

– Apache License

– Good Community

– Good Documentation

Page 9: Apache Mahout Tutorial - Recommendation - 2013/2014

Why Mahout?

• Why do we prefer Mahout?

– Apache License

– Good Community

– Good Documentation

–Scalable

Page 10: Apache Mahout Tutorial - Recommendation - 2013/2014

Why Mahout?

• Why do we prefer Mahout?

– Apache License

– Good Community

– Good Documentation

–Scalable • Based on Hadoop (not mandatory!)

Page 11: Apache Mahout Tutorial - Recommendation - 2013/2014

Why do we need a scalable framework?

Big Data!

Page 12: Apache Mahout Tutorial - Recommendation - 2013/2014

Use Cases

Page 13: Apache Mahout Tutorial - Recommendation - 2013/2014

Use Cases

Recommendation Engine on Foursquare

Page 14: Apache Mahout Tutorial - Recommendation - 2013/2014

Use Cases

User Interest Modeling on Twitter

Page 15: Apache Mahout Tutorial - Recommendation - 2013/2014

Use Cases

Pattern Mining on Yahoo! (as anti-spam)

Page 16: Apache Mahout Tutorial - Recommendation - 2013/2014

Algorithms

• Recommendation

– User-based Collaborative Filtering

– Item-based Collaborative Filtering

– SlopeOne Recommenders

– Singular Value Decomposition-based CF

Page 17: Apache Mahout Tutorial - Recommendation - 2013/2014

Algorithms

• Clustering – Canopy

– K-Means

– Fuzzy K-Means

– Latent Dirichlet Allocation (LDA)

– MinHash Clustering

– Draft algorithms • Hierarchical Clustering

• (and much more…)

Page 18: Apache Mahout Tutorial - Recommendation - 2013/2014

Algorithms

• Classification – Logistic Regression – Bayes – Random Forests – Hidden Markov Models – Draft algorithms

• Support Vector Machines • Perceptrons • Neural Networks • Restricted Boltzmann Machines • (and much more…)

Page 19: Apache Mahout Tutorial - Recommendation - 2013/2014

Algorithms

• Other

– Evolutionary Algorithms

• Genetic Algorithms

– Dimensionality Reduction techniques

• Principal Component Analysis (PCA)

• Singular Value Decomposition (SVD)

– Frequent ItemSet Pattern Mining

– (and much more.)

Page 20: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout in the Apache Software Foundation

Page 21: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout in the Apache Software Foundation

Original Mahout Project

Page 22: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout in the Apache Software Foundation

Taste: collaborative filtering framework

Page 23: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout in the Apache Software Foundation

Lucene: information retrieval software library

Page 24: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout in the Apache Software Foundation

Hadoop: framework for distributed storage and programming based on MapReduce

Page 25: Apache Mahout Tutorial - Recommendation - 2013/2014

General Architecture

Three-tiers architecture (Application, Algorithms and Shared Libraries)

Page 26: Apache Mahout Tutorial - Recommendation - 2013/2014

General Architecture

Data Storage and Shared Libraries

Page 27: Apache Mahout Tutorial - Recommendation - 2013/2014

General Architecture

Business Logic

Page 28: Apache Mahout Tutorial - Recommendation - 2013/2014

General Architecture

External Applications invoking Mahout APIs

Page 29: Apache Mahout Tutorial - Recommendation - 2013/2014

In this tutorial we will focus on

Recommendation

Page 30: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation

• Mahout implements a Collaborative Filtering framework – Popularized by Amazon and others

– Uses historical data (ratings, clicks, and purchases) to provide recommendations • User-based: Recommend items by finding similar users. This is

often harder to scale because of the dynamic nature of users.

• Item-based: Calculate similarity between items and make recommendations. Items usually don't change much, so this often can be computed offline.

• Slope-One: A very fast and simple item-based recommendation approach applicable when users have given ratings (and not just boolean preferences).

Page 31: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

• Brief Recap

• Sample Rating database (from Wikipedia)

Page 32: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

• Brief Recap

• Sample Rating database (from Wikipedia)

• Is a variant of item-based CF

• Compute predictions by taking into account the average differences between item’s ratings

Page 33: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

• Step One: simplification – Just two items

– Goal: to compute Lucy’s rating for item A

– Algorithm: compute the average difference between ratings fro Item A and B

– difference: +0.5 for Item A

– Rating(Lucy,ItemA) = Rating(ItemB) + difference = 2.5

X

Page 34: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

X • Step One: simplification

– Just two items

– Goal: to compute Lucy’s rating for item A

– Algorithm: compute the average difference between ratings fro Item A and C

– difference: +3 for Item A

– Rating(Lucy,ItemA) = Rating(ItemC) + difference = 8

Page 35: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

• Step One: combining partial computations

– Lucy’s rating for item A is the weighted sum of the estimation based on item B and the estimation based on item C

Page 36: Apache Mahout Tutorial - Recommendation - 2013/2014

Slope-One Recommender

• Step One: combining partial computations

– Weight: #users that voted both items

• John and Mark voted both Item A and Item B

• Just John voted both Item A and Item C

Page 37: Apache Mahout Tutorial - Recommendation - 2013/2014

(comin’ back to Mahout)

Page 38: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

Page 39: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

Inceptive Idea:

A Java/J2EE application invokes a Mahout

Recommender whose DataModel is based on a set of User Preferences that are

built on the ground of a physical DataStore

Page 40: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

Physical Storage (database, files, etc.)

Page 41: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

Data Model

Physical Storage (database, files, etc.)

Page 42: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

Data Model

Recommender

Physical Storage (database, files, etc.)

Page 43: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation - Architecture

External Application

Physical Storage (database, files, etc.)

Data Model

Recommender

Page 44: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation in Mahout

• Input: raw data (user preferences) • Output: preferences estimation

• Step 1

– Mapping raw data into a DataModel Mahout-compliant

• Step 2 – Tuning recommender components

• Similarity measure, neighborhood, etc.

• Step 3 – Computing rating estimations

• Step 4 – Evaluating recommendation

Page 45: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation Components

• Mahout key abstractions are implemented through Java interfaces : – DataModel interface

• Methods for mapping raw data to a Mahout-compliant form

– UserSimilarity interface • Methods to calculate the degree of correlation between two users

– ItemSimilarity interface • Methods to calculate the degree of correlation between two items

– UserNeighborhood interface • Methods to define the concept of ‘neighborhood’

– Recommender interface • Methods to implement the recommendation step itself

Page 46: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation Components

• Mahout key abstractions are implemented through Java interfaces :

– example: DataModel interface

• Each methods for mapping raw data to a Mahout-compliant form is an implementation of the generic interface

• e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database

• (and so on)

Page 47: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: DataModel

• A DataModel is the interface to draw information about user preferences.

• Which sources is it possible to draw? – Database

• MySQLJDBCDataModel, PostgreSQLDataModel • NoSQL databases supported: MongoDBDataModel, CassandraDataModel

– External Files • FileDataModel

– Generic (preferences directly feed through Java code) • GenericDataModel

(They are all implementations of the DataModel interface)

Page 48: Apache Mahout Tutorial - Recommendation - 2013/2014

• GenericDataModel – Feed through Java calls

• FileDataModel – CSV (Comma Separated Values)

• JDBCDataModel – JDBC Driver – Standard database structure

Components: DataModel

Page 49: Apache Mahout Tutorial - Recommendation - 2013/2014

FileDataModel – CSV input

Page 50: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: DataModel

• Regardless the source, they all share a common implementation.

• Basic object: Preference

– Preference is a triple (user,item,score)

– Stored in UserPreferenceArray

Page 51: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: DataModel

• Basic object: Preference

– Preference is a triple (user,item,score)

– Stored in UserPreferenceArray

• Two implementations

– GenericUserPreferenceArray

• It stores numerical preference, as well.

– BooleanUserPreferenceArray

• It skips numerical preference values.

Page 52: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: UserSimilarity

• UserSimilarity defines a notion of similarity between two Users. – (respectively) ItemSimilarity defines a notion of similarity

between two Items.

• Which definition of similarity are available?

– Pearson Correlation – Spearman Correlation – Euclidean Distance – Tanimoto Coefficient – LogLikelihood Similarity

– Already implemented!

Page 53: Apache Mahout Tutorial - Recommendation - 2013/2014

Example: TanimotoDistance

Page 54: Apache Mahout Tutorial - Recommendation - 2013/2014

Example: CosineSimilarity

Page 55: Apache Mahout Tutorial - Recommendation - 2013/2014

Different Similarity definitions

influence neighborhood formation

Page 56: Apache Mahout Tutorial - Recommendation - 2013/2014

Pearson’s vs. Euclidean distance

Page 57: Apache Mahout Tutorial - Recommendation - 2013/2014

Pearson’s vs. Euclidean distance

Page 58: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: UserNeighborhood

• Which definition of neighborhood are available? – Nearest N users

• The first N users with the highest similarity are labeled as ‘neighbors’

– Thresholds • Users whose similarity is above a threshold are labeled as ‘neighbors’

– Already implemented!

Page 59: Apache Mahout Tutorial - Recommendation - 2013/2014

Components: Recommender

• Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item

• Which recommendation algorithms are implemented? – User-based CF – Item-based CF – SVD-based CF – SlopeOne CF

(and much more…)

Page 60: Apache Mahout Tutorial - Recommendation - 2013/2014

Recommendation Engines

Page 61: Apache Mahout Tutorial - Recommendation - 2013/2014

Recap

• Many implementations of a CF-based recommender! – 6 different recommendation algorithms – 2 different neighborhood definitions – 5 different similarity definitions

• Evaluation fo the different implementations is actually very time-consuming – The strength of Mahout lies in that it is possible to save

time in the evaluation of the different combinations of the parameters!

– Standard interface for the evaluation of a Recommender System

Page 62: Apache Mahout Tutorial - Recommendation - 2013/2014

Evaluation

• Mahout provides classes for the evaluation of a recommender system

– Prediction-based measures

• Mean Average Error

• RMSE (Root Mean Square Error)

– IR-based measures

• Precision, Recall, F1-measure, F1@n

• NDCG (ranking measure)

Page 63: Apache Mahout Tutorial - Recommendation - 2013/2014

Evaluation

• Prediction-based Measures

– Class: AverageAbsoluteDifferenceEvaluator

– Method: evaluate()

– Parameters:

• Recommender implementation

• DataModel implementation

• TrainingSet size (e.g. 70%)

• % of the data to use in the evaluation (smaller % for fast prototyping)

Page 64: Apache Mahout Tutorial - Recommendation - 2013/2014

Evaluation

• IR-based Measures

– Class: GenericRecommenderIRStatsEvaluator

– Method: evaluate()

– Parameters:

• Recommender implementation

• DataModel implementation

• Relevance Threshold (mean+standard deviation)

• % of the data to use in the evaluation (smaller % for fast prototyping)

Page 65: Apache Mahout Tutorial - Recommendation - 2013/2014

Part 2

How to use Mahout? Hands-on

Page 66: Apache Mahout Tutorial - Recommendation - 2013/2014

Download Mahout

• Download – The latest Mahout release is 0.8

– Available at: http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip

- Extract all the libraries and include them in a new NetBeans (Eclipse) project

• Requirement: Java 1.6.x or greater.

• Hadoop is not mandatory!

Page 68: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 1

• Create a Preference object

• Set preferences through some simple Java call

• Print some statistics about preferences (how many preferences, on which items the user has expressed ratings, etc.)

Page 69: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 1

• Create a Preference object

• Set preferences through some simple Java call

• Print some statistics about preferences (how many preferences, on which items the user has expressed ratings, etc.)

• Hints about objects to be used: – Preference

– GenericUserPreferenceArray

Page 70: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;

import org.apache.mahout.cf.taste.model.Preference;

import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreatePreferenceArray {

private CreatePreferenceArray() {

}

public static void main(String[] args) {

PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);

user1Prefs.setUserID(0, 1L);

user1Prefs.setItemID(0, 101L);

user1Prefs.setValue(0, 2.0f);

user1Prefs.setItemID(1, 102L);

user1Prefs.setValue(1, 3.0f);

Preference pref = user1Prefs.get(1);

System.out.println(pref);

}

}

Page 71: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;

import org.apache.mahout.cf.taste.model.Preference;

import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreatePreferenceArray {

private CreatePreferenceArray() {

}

public static void main(String[] args) {

PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);

user1Prefs.setUserID(0, 1L);

user1Prefs.setItemID(0, 101L);

user1Prefs.setValue(0, 2.0f);

user1Prefs.setItemID(1, 102L);

user1Prefs.setValue(1, 3.0f);

Preference pref = user1Prefs.get(1);

System.out.println(pref);

}

}

Score 2 for Item 101

Page 72: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 2

• Create a DataModel

• Feed the DataModel through some simple Java calls

• Print some statistics about data (how many users, how many items, maximum ratings, etc.)

Page 73: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 2

• Create a DataModel

• Feed the DataModel through some simple Java calls

• Print some statistics about data (how many users, how many items, maximum ratings, etc.)

• Hints about objects to be used: – FastByIdMap

– Model

Page 74: Apache Mahout Tutorial - Recommendation - 2013/2014

• PreferenceArray stores the preferences of a single user

• Where do the preferences of all the users are stored? – An HashMap? No.

– Mahout introduces data structures optimized for recommendation tasks • HashMap are replaced by FastByIDMap

Exercise 2: data model

Page 75: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 2: data model import org.apache.mahout.cf.taste.impl.common.FastByIDMap;

import org.apache.mahout.cf.taste.impl.model.GenericDataModel;

import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;

import org.apache.mahout.cf.taste.model.DataModel;

import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreateGenericDataModel {

private CreateGenericDataModel() {

}

public static void main(String[] args) {

FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();

PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(10);

prefsForUser1.setUserID(0, 1L);

prefsForUser1.setItemID(0, 101L);

prefsForUser1.setValue(0, 3.0f);

prefsForUser1.setItemID(1, 102L);

prefsForUser1.setValue(1, 4.5f);

preferences.put(1L, prefsForUser1);

DataModel model = new GenericDataModel(preferences);

System.out.println(model);

}

}

Page 76: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3

• Create a DataModel

• Feed the DataModel through a CSV file

• Calculate similarities between users

– CSV file should contain enough data!

Page 77: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3

• Create a DataModel

• Feed the DataModel through a CSV file

• Calculate similarities between users – CSV file should contain enough data!

• Hints about objects to be used: – FileDataModel

– PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.

Page 78: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.impl.model.*;

import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;

class Example3_Similarity {

public static void main(String[] args) throws Exception {

// Istanzia il DataModel e crea alcune statistiche

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity pearson = new PearsonCorrelationSimilarity(model);

UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);

System.out.println("Pearson:"+pearson.userSimilarity(1, 2));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));

System.out.println("Pearson:"+pearson.userSimilarity(1, 3));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));

}

}

Page 79: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.impl.model.*;

import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;

class Example3_Similarity {

public static void main(String[] args) throws Exception {

// Istanzia il DataModel e crea alcune statistiche

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity pearson = new PearsonCorrelationSimilarity(model);

UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);

System.out.println("Pearson:"+pearson.userSimilarity(1, 2));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));

System.out.println("Pearson:"+pearson.userSimilarity(1, 3));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));

}

} FileDataModel

Page 80: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.impl.model.*;

import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;

class Example3_Similarity {

public static void main(String[] args) throws Exception {

// Istanzia il DataModel e crea alcune statistiche

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity pearson = new PearsonCorrelationSimilarity(model);

UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);

System.out.println("Pearson:"+pearson.userSimilarity(1, 2));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));

System.out.println("Pearson:"+pearson.userSimilarity(1, 3));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));

}

} Similarity Definitions

Page 81: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.impl.model.*;

import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;

class Example3_Similarity {

public static void main(String[] args) throws Exception {

// Istanzia il DataModel e crea alcune statistiche

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity pearson = new PearsonCorrelationSimilarity(model);

UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);

System.out.println("Pearson:"+pearson.userSimilarity(1, 2));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));

System.out.println("Pearson:"+pearson.userSimilarity(1, 3));

System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));

}

} Output

Page 82: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4

• Create a DataModel

• Feed the DataModel through a CSV file

• Calculate similarities between users

– CSV file should contain enough data!

• Generate neighboorhood

• Generate recommendations

Page 83: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4

• Create a DataModel

• Feed the DataModel through a CSV file

• Calculate similarities between users

– CSV file should contain enough data!

• Generate neighboorhood

• Generate recommendations

– Compare different combinations of parameters!

Page 84: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4

• Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users

– CSV file should contain enough data!

• Generate neighboorhood • Generate recommendations

– Compare different combinations of parameters!

• Hints about objects to be used: – NearestNUserNeighborhood – GenericUserBasedRecommender

• Parameters: data model, neighborhood, similarity measure

Page 85: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 1);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

Page 86: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 1);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

FileDataModel

Page 87: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 1);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

2 neighbours

Page 88: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("intro.csv"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 1);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

Top-1 Recommendation for User 1

Page 89: Apache Mahout Tutorial - Recommendation - 2013/2014

• Download the GroupLens dataset (100k) – Its format is already Mahout compliant

– http://files.grouplens.org/datasets/movielens/ml-100k.zip

• Preparatory Exercise: repeat exercise 3 (similarity calculations) with a bigger dataset

• Next: now we can run the recommendation framework against a state-of-the-art dataset

Exercise 5: MovieLens Recommender

Page 90: Apache Mahout Tutorial - Recommendation - 2013/2014

import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("ua.base"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(1, 20);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

Exercise 5: MovieLens Recommender

Page 91: Apache Mahout Tutorial - Recommendation - 2013/2014

import org.apache.mahout.cf.taste.impl.model.file.*;

import org.apache.mahout.cf.taste.impl.neighborhood.*;

import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.impl.similarity.*;

import org.apache.mahout.cf.taste.model.*;

import org.apache.mahout.cf.taste.neighborhood.*;

import org.apache.mahout.cf.taste.recommender.*;

import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

private RecommenderIntro() {

}

public static void main(String[] args) throws Exception {

DataModel model = new FileDataModel(new File("ua.base"));

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

Recommender recommender = new GenericUserBasedRecommender(

model, neighborhood, similarity);

List<RecommendedItem> recommendations = recommender.recommend(10, 50);

for (RecommendedItem recommendation : recommendations) {

System.out.println(recommendation);

}

}

}

Exercise 5: MovieLens Recommender

We can play with parameters!

Page 92: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 5: MovieLens Recommender

• Analyze Recommender behavior with different combination of parameters

– Do the recommendations change with a different similarity measure?

– Do the recommendations change with different neighborhood sizes?

– Which one is the best one?

• …. Let’s go to the next exercise!

Page 93: Apache Mahout Tutorial - Recommendation - 2013/2014

• Evaluate different CF recommender configurations on MovieLens data

• Metrics: RMSE, MAE, Precision

Exercise 6: Recommender Evaluation

Page 94: Apache Mahout Tutorial - Recommendation - 2013/2014

• Evaluate different CF recommender configurations on MovieLens data

• Metrics: RMSE, MAE

• Hints: useful classes – Implementations of RecommenderEvaluator

interface • AverageAbsoluteDifferenceRecommenderEvaluator

• RMSRecommenderEvaluator

Exercise 6: Recommender Evaluation

Page 95: Apache Mahout Tutorial - Recommendation - 2013/2014

• Further Hints: – Use RandomUtils.useTestSeed()to ensure

the consistency among different evaluation runs

– Invoke the evaluate() method

• Parameters

– RecommenderBuilder: recommender istance (as in previous exercises.

– DataModelBuilder: specific criterion for training

– Split Training-Test: double value (e.g. 0.7 for 70%)

– Amount of data to use in the evaluation: double value (e.g 1.0 for 100%)

Exercise 6: Recommender Evaluation

Page 96: Apache Mahout Tutorial - Recommendation - 2013/2014

Example 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

Ensures the consistency between different evaluation runs.

Page 97: Apache Mahout Tutorial - Recommendation - 2013/2014

Example 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

Page 98: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

70%training (evaluation on the whole

dataset)

Page 99: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

Recommendation Engine

Page 100: Apache Mahout Tutorial - Recommendation - 2013/2014

Example 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

RecommenderEvaluator rmse = new RMSEEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

} We can add more measures

Page 101: Apache Mahout Tutorial - Recommendation - 2013/2014

Example 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

RecommenderEvaluator rmse = new RMSEEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

System.out.println(rmse);

}

}

Page 102: Apache Mahout Tutorial - Recommendation - 2013/2014

Mahout Strengths

• Fast-prototyping and evaluation

– To evaluate a different configuration of the same algorithm we just need to update a parameter and run again.

– Example

• Different Neighborhood Size

Page 103: Apache Mahout Tutorial - Recommendation - 2013/2014

5 minutes to look for the best configuration

Page 104: Apache Mahout Tutorial - Recommendation - 2013/2014

• Evaluate different CF recommender configurations on MovieLens data

• Metrics: Precision, Recall

Exercise 7: Recommender Evaluation

Page 105: Apache Mahout Tutorial - Recommendation - 2013/2014

• Evaluate different CF recommender configurations on MovieLens data

• Metrics: Precision, Recall

• Hints: useful classes

– GenericRecommenderIRStatsEvaluator

– Evaluate() method

• Same parameters of exercise 6

Exercise 7: Recommender Evaluation

Page 106: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

}

Exercise 7: IR-based evaluation

Precision@5 , Recall@5, etc.

Page 107: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

}

Exercise 7: IR-based evaluation

Precision@5 , Recall@5, etc.

Page 108: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

}

Exercise 7: IR-based evaluation

Set Neighborhood to 500

Page 109: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new EuclideanDistanceSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

}

Exercise 7: IR-based evaluation

Set Euclidean Distance

Page 110: Apache Mahout Tutorial - Recommendation - 2013/2014

Exercise 8: item-based recommender

• Mahout provides Java classes for building an item-based recommender system

– Amazon-like

– Recommendations are based on similarities among items (generally pre-computed offline)

– Evaluate it with the MovieLens dataset!

Page 111: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);

return new GenericItemBasedRecommender(model, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

}

Example 8: item-based recommender

Page 112: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);

return new GenericItemBasedRecommender(model, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

} ItemSimilarity

Example 8: item-based recommender

Page 113: Apache Mahout Tutorial - Recommendation - 2013/2014

class IREvaluatorIntro {

private IREvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);

return new GenericItemBasedRecommender(model, similarity);

}

};

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,

GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);

System.out.println(stats.getPrecision());

System.out.println(stats.getRecall());

System.out.println(stats.getF1());

}

} No Neighborhood definition for item-

based recommenders

Example 8: item-based recommender

Page 114: Apache Mahout Tutorial - Recommendation - 2013/2014

• Write a class that automatically runs evaluation with different parameters

– e.g. fixed neighborhood sizes from an Array of values

– Print the best scores and the configuration

Exercise 9: Recommender Evaluation

Page 115: Apache Mahout Tutorial - Recommendation - 2013/2014

• Find the best configuration for several datasets

– Download datasets from http://mahout.apache.org/users/basics/collections.html

–Write classes to transform input data in a Mahout-compliant form

– Extend exercise 9!

Exercise 10: more datasets!

Page 116: Apache Mahout Tutorial - Recommendation - 2013/2014

End. Do you want more?

Page 117: Apache Mahout Tutorial - Recommendation - 2013/2014

Do you want more?

• Recommendation

– Deploy of a Mahout-based Web Recommender

– Integration with Hadoop

– Integration of content-based information

– Custom similarities, Custom recommenders, Re-scoring functions

• Classification, Clustering and Pattern Mining