Apache Mahout
SD Forum, Thursday, November 4, 2010

TRANSCRIPT

Page 1: SD Forum 11 04-2010

Apache Mahout

Thursday, November 4, 2010

Page 2:

Apache Mahout

Now with extra whitening and classification powers!

Page 3:

• Mahout intro

• Scalability in general

• Supervised learning recap

• The new SGD classifiers

Page 4:

Mahout?

• Hebrew for “essence”

• Hindi for a guy who drives an elephant

Page 7:

Mahout!

• Scalable data-mining and recommendations

• Not all data-mining

• Not the fanciest data-mining

• Just some of the scalable stuff

• Not a competitor for R or Weka

Page 8:

General Areas

• Recommendations

• lots of support, lots of flexibility, production ready

• Unsupervised learning (clustering)

• lots of options, lots of flexibility, production ready (ish)

Page 9:

General Areas

• Supervised learning (classification)

• multiple architectures, fair number of options, somewhat interoperable

• production ready (for the right definition of production and ready)

• Large scale SVD

• larger scale coming, beware sharp edges

Page 10:

Scalable?

• Scalable means

• Time is proportional to problem size divided by resource size

• Does not imply Hadoop or parallel

t ∝ |P| / |R|

Page 11:

[Figure: wall-clock time vs. number of training examples. The non-scalable algorithm's curve climbs steeply: traditional data mining works at small scale, but at large scale scalable solutions are required, and the scalable algorithm wins (Mahout wins!).]

Page 12:

Scalable means ...

• One unit of work requires about a unit of time

• Not like the company store (bit.ly/22XVa4)

t ∝ |P| / |R|

|P| = O(1) ⟹ t = O(1)

Page 13:

[Figure: wall-clock time vs. number of training examples. Below a crossover point the sequential algorithm is preferred; beyond it the parallel algorithm is preferred.]

Page 14:

Toy Example

Page 15:

Training Data Sample

  Filled?   x coordinate   y coordinate   shape
  no        0.92           0.01           circle
  yes       0.30           0.41           square

Filled? is the target variable; x coordinate, y coordinate, and shape are the predictor variables.

Page 16:

What matters most?


Page 17:

SGD Classification

• Supervised learning of logistic regression

• Sequential gradient descent, not parallel

• Highly optimized for high dimensional sparse data, possibly with interactions

• Scalable, real dang fast to train
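The core SGD step is simple enough to sketch. Below is a minimal, self-contained logistic regression learner trained by stochastic gradient descent, using plain double[] features, a fixed learning rate, and L2 regularization. Class and parameter names are illustrative only; Mahout's OnlineLogisticRegression additionally uses sparse vectors, learning-rate annealing, and lazy regularization to get its speed.

```java
import java.util.Random;

// Minimal SGD logistic regression: a sketch of the technique, not
// Mahout's OnlineLogisticRegression.
class SgdLogistic {
    private final double[] beta;   // model coefficients; beta[0] pairs with a bias input
    private final double rate;     // fixed learning rate (Mahout anneals this)
    private final double lambda;   // L2 regularization strength

    SgdLogistic(int numFeatures, double rate, double lambda) {
        this.beta = new double[numFeatures];
        this.rate = rate;
        this.lambda = lambda;
    }

    // p(target = 1 | x) under the current model
    double classify(double[] x) {
        double dot = 0;
        for (int i = 0; i < beta.length; i++) {
            dot += beta[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    // One gradient step on one example: beta += rate * ((y - p) x - lambda beta)
    void train(int actual, double[] x) {
        double error = actual - classify(x);
        for (int i = 0; i < beta.length; i++) {
            beta[i] += rate * (error * x[i] - lambda * beta[i]);
        }
    }

    public static void main(String[] args) {
        // Learn the toy "filled?" problem: filled whenever x > 0.5.
        SgdLogistic model = new SgdLogistic(3, 0.5, 0.001);
        Random rnd = new Random(42);
        for (int i = 0; i < 10000; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            model.train(x > 0.5 ? 1 : 0, new double[] {1.0, x, y});  // 1.0 is the bias input
        }
        System.out.printf("p(filled | x = 0.9) = %.2f%n", model.classify(new double[] {1.0, 0.9, 0.5}));
        System.out.printf("p(filled | x = 0.1) = %.2f%n", model.classify(new double[] {1.0, 0.1, 0.5}));
    }
}
```

One pass over the examples is one update per example, which is why a single sequential learner can be this fast; Mahout's real trick is doing the same over hashed, high-dimensional sparse vectors.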

Page 20:

Supervised Learning

[Diagram: labeled training examples (target T, features x1 ... xn) feed the learning algorithm, which produces a model; the model then assigns targets to unlabeled examples (?, x1 ... xn). Annotations: the learning algorithm is sequential but fast; the model is stateless and runs in parallel.]

Page 21:

Small example

• On 20 newsgroups

• converges in < 10,000 training examples (less than one pass through the data)

• accuracy comparable to SVM, Naive Bayes, Complementary Naive Bayes

• learning rate, regularization set automagically on held-out data

Page 22:

System Structure

[Class diagram]

AdaptiveLogisticRegression
  EvolutionaryProcess ep
  void train(target, features)
  (1 AdaptiveLogisticRegression manages 20 CrossFoldLearners)

CrossFoldLearner
  OnlineLogisticRegression folds
  void train(target, tracking, features)
  double auc()
  (1 CrossFoldLearner holds 5 OnlineLogisticRegressions)

OnlineLogisticRegression
  Matrix beta
  void train(target, features)
  double classifyScalar(features)

Page 23:

Training API

public interface OnlineLearner {
  void train(int actual, Vector instance);
  void train(long trackingKey, int actual, Vector instance);
  void train(long trackingKey, String groupKey, int actual, Vector instance);
  void close();
}
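A toy sketch of how this contract is meant to behave, with double[] standing in for Mahout's Vector so it stands alone. The trivial learner below only tracks class frequencies, but it shows the intended shape: the less specific train() overloads funnel into the most specific one (trackingKey is meant to identify an example, groupKey to keep related examples together), and close() ends training. Names other than the interface's are illustrative.

```java
// double[] stands in for Mahout's Vector so this sketch is self-contained.
interface OnlineLearner {
    void train(int actual, double[] instance);
    void train(long trackingKey, int actual, double[] instance);
    void train(long trackingKey, String groupKey, int actual, double[] instance);
    void close();
}

// Trivial learner: tracks class frequencies and predicts the prior.
class PriorLearner implements OnlineLearner {
    private final int[] counts;   // examples seen per category
    private boolean closed;

    PriorLearner(int numCategories) {
        this.counts = new int[numCategories];
    }

    @Override public void train(int actual, double[] instance) {
        train(0L, actual, instance);                 // no tracking key known
    }

    @Override public void train(long trackingKey, int actual, double[] instance) {
        train(trackingKey, null, actual, instance);  // no group key known
    }

    @Override public void train(long trackingKey, String groupKey, int actual, double[] instance) {
        if (closed) throw new IllegalStateException("learner is closed");
        counts[actual]++;                            // a real learner updates its model here
    }

    @Override public void close() {
        closed = true;
    }

    // Most frequently seen category so far.
    int mostLikely() {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }
}
```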

Page 24:

Classification API

public class AdaptiveLogisticRegression implements OnlineLearner {
  public AdaptiveLogisticRegression(int numCategories, int numFeatures, PriorFunction prior);
  public void train(int actual, Vector instance);
  public void train(long trackingKey, int actual, Vector instance);
  public void train(long trackingKey, String groupKey, int actual, Vector instance);
  public void close();
  public double auc();
  public State<Wrapper> getBest();
}

CrossFoldLearner model = learningAlgorithm.getBest().getPayload().getLearner();
double averageCorrect = model.percentCorrect();
double averageLL = model.logLikelihood();
double p = model.classifyScalar(features);

Page 25:

Speed?

• Encoding API for hashed feature vectors

• String, byte[] or double interfaces

• String allows simple parsing

• byte[] and double allow speed

• Abstract interactions supported
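The idea behind the hashed encoding can be sketched in a few lines: feature names hash directly into slots of a fixed-size vector, so no dictionary is built and encoding cost is just hashing. The names below are illustrative, not Mahout's encoder classes (which also provide the byte[] and double fast paths and use several probes to soften collisions).

```java
// Minimal sketch of hashed feature encoding.
class HashedEncoder {
    private final int numFeatures;   // size of the output vector

    HashedEncoder(int numFeatures) {
        this.numFeatures = numFeatures;
    }

    // Add a named feature with a weight by hashing its name to one slot.
    void addToVector(String name, double weight, double[] vector) {
        int slot = Math.floorMod(name.hashCode(), numFeatures);
        vector[slot] += weight;
    }

    // An interaction between two features is just a hash of the combined names.
    void addInteraction(String a, String b, double weight, double[] vector) {
        addToVector(a + "&" + b, weight, vector);
    }

    public static void main(String[] args) {
        HashedEncoder enc = new HashedEncoder(1000);
        double[] v = new double[1000];
        enc.addToVector("shape=circle", 1.0, v);         // categorical feature
        enc.addToVector("x", 0.92, v);                   // continuous feature
        enc.addInteraction("shape=circle", "x", 0.92, v);
        double sum = 0;
        for (double d : v) sum += d;
        System.out.println("total mass = " + sum);
    }
}
```

Because the vector size is fixed up front, adding interaction features costs only extra hashing, never a bigger dictionary.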

Page 26:

Speed!

• Parsing and encoding dominate single learner

• Moderate optimization allows 1 million training examples with 200 features to be encoded in 14 seconds on a single core

• 20 million mixed text and categorical features with many interactions learned in ~1 hour

Page 27:

More Speed!

• Evolutionary optimization of learning parameters allows simple operation

• 20x threading allows high machine use

• 20 newsgroup test completes in less time on single node with SGD than on Hadoop with Complementary Naive Bayes

Page 28:

Summary

• Mahout provides early production-quality scalable data-mining

• New classification systems allow industrial scale classification

Page 29:

Contact Info

Ted [email protected]
