apache mahout feb 13, 2012 shannon quinn cloud computing cs 15-319

Download Apache Mahout Feb 13, 2012 Shannon Quinn Cloud Computing CS 15-319

If you can't read please download the document

Upload: herbert-snow

Post on 26-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

  • Slide 1
  • Apache Mahout Feb 13, 2012 Shannon Quinn Cloud Computing CS 15-319
  • Slide 2
  • MapReduce Review C0C1C2C3 M0M1M2M3 IO0IO1IO2IO3 R0R1 FO0FO1 chunks mappers Reducers Map Phase Reduce Phase Shuffling Data Scalable programming model Map phase Shuffle Reduce phase MapReduce Implementations Google Hadoop Figure from lecture 6: MapReduce
  • Slide 3
  • MapReduce Review C0C1C2C3 M0M1M2M3 IO0IO1IO2IO3 R0R1 FO0FO1 chunks mappers Reducers Map Phase Reduce Phase Shuffling Data Scalable programming model Map phase Shuffle Reduce phase MapReduce Implementations Google Hadoop Figure from lecture 6: MapReduce This is our focus!
  • Slide 4
  • Apache Mahout A scalable machine learning library
  • Slide 5
  • Apache Mahout A scalable machine learning library Built on Hadoop
  • Slide 6
  • Apache Mahout A scalable machine learning library Built on Hadoop Philosophy of Mahout (and Hadoop by proxy)
  • Slide 7
  • What does Mahout do?
  • Slide 8
  • Recommendation
  • Slide 9
  • Classification
  • Slide 10
  • Clustering
  • Slide 11
  • Other Mahout Algorithms Dimensionality Reduction Regression Evolutionary Algorithms
  • Slide 12
  • Mahout 1.Recommendation 2.Classification 3.Clustering
  • Slide 13
  • Recommendation Overview Help users find items they might like based on historical preferences
  • Slide 14
  • Recommendation Overview Mathematically
  • Slide 15
  • Recommendation Overview Alice Bob Peter 514 ? 25 432*based on example by Sebastian Schelter
  • Slide 16
  • Recommendation Overview 514 - 25 432*based on example by Sebastian Schelter
  • Slide 17
  • Recommendation Overview 514 ? 25 432 Bob *based on example by Sebastian Schelter
  • Slide 18
  • Recommendation Overview 514 1.5 25 432 Bob *based on example by Sebastian Schelter
  • Slide 19
  • Recommendation in Mahout 1 st Map phase: process input *based on example by Sebastian Schelter
  • Slide 20
  • Recommendation in Mahout 1 st Map phase: process input 1 st Reduce phase: list by user *based on example by Sebastian Schelter
  • Slide 21
  • Recommendation in Mahout 2 nd Map phase: Emit co-occurred ratings *based on example by Sebastian Schelter
  • Slide 22
  • Recommendation in Mahout 2 nd Map phase: Emit co-occurred ratings 2 nd Reduce phase: Compute similarities *based on example by Sebastian Schelter
  • Slide 23
  • Mahout 1.Recommendation 2.Classification 3.Clustering
  • Slide 24
  • Classification Overview Assigning data to discrete categories
  • Slide 25
  • Classification Overview Assigning data to discrete categories Train a model on labeled data Spam Not spam
  • Slide 26
  • Classification Overview Assigning data to discrete categories Train a model on labeled data Run the model on new, unlabeled data Spam Not spam ?
  • Slide 27
  • Nave Bayes Example
  • Slide 28
  • Prob (token | label) =
  • Slide 29
  • Nave Bayes Example Not spam
  • Slide 30
  • Nave Bayes Example President Obamas Nobel Prize Speech Not spam
  • Slide 31
  • Nave Bayes Example Spam
  • Slide 32
  • Nave Bayes Example Spam Spam email content
  • Slide 33
  • Nave Bayes Example
  • Slide 34
  • Order a trial Adobe chicken daily EAB-List new summer savings, welcome!
  • Slide 35
  • Nave Bayes in Mahout Complex!
  • Slide 36
  • Nave Bayes in Mahout Complex! Training 1.Read the features
  • Slide 37
  • Nave Bayes in Mahout Complex! Training 1.Read the features 2.Calculate per-document statistics
  • Slide 38
  • Nave Bayes in Mahout Complex! Training 1.Read the features 2.Calculate per-document statistics 3.Normalize across categories
  • Slide 39
  • Nave Bayes in Mahout Complex! Training 1.Read the features 2.Calculate per-document statistics 3.Normalize across categories 4.Calculate normalizing factor of each label
  • Slide 40
  • Nave Bayes in Mahout Complex! Training 1.Read the features 2.Calculate per-document statistics 3.Normalize across categories 4.Calculate normalizing factor of each label Testing Classification
  • Slide 41
  • Other Classification Algorithms Stochastic Gradient Descent
  • Slide 42
  • Other Classification Algorithms Stochastic Gradient Descent Support Vector Machines
  • Slide 43
  • Other Classification Algorithms Stochastic Gradient Descent Support Vector Machines Random Forests
  • Slide 44
  • Mahout 1.Recommendation 2.Classification 3.Clustering
  • Slide 45
  • Clustering Overview Grouping unstructured data
  • Slide 46
  • Clustering Overview Grouping unstructured data Small intra-cluster distance
  • Slide 47
  • Clustering Overview Grouping unstructured data Small intra-cluster distance Large inter-cluster distance
  • Slide 48
  • K-Means Clustering Example
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Cats Dogs
  • Slide 58
  • K-Means Clustering in Mahout + C0C1C2C3 M0M1M2M3 IO0IO1IO2IO3 R0R1 FO0FO1 chunks mappers Reducers Map Phase Reduce Phase Shuffling Data Figure from lecture 6: MapReduce
  • Slide 59
  • K-Means Clustering in Mahout Assume: # clusters