PredictionIO – A Machine Learning Server in Scala – SF Scala


Post on 21-Apr-2017


Building and Deploying ML Applications in Production in a Fraction of the Time

A Machine Learning Server in Scala

Available Tools

Processing Framework • e.g. Apache Spark, Apache Hadoop

Algorithm Libraries • e.g. MLlib, Mahout

Data Storage • e.g. HBase, Cassandra

What is Missing?

Integrate everything together nicely and move from prototyping to production.

A Classic Recommender Example…

You have a mobile app.

You need a Recommendation Engine: predict products that a customer will like, and show them.

[Diagram: App -> Predict products]

A Classic Recommender Example prototyping…

Algorithm - You don't need to write your own: Spark MLlib - ALS algorithm

Predictive model - based on users’ previous behaviors

def pseudocode() {
  // Read training data
  val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })

  // Build a predictive model with an algorithm
  // (arguments after the data: rank = 10, iterations = 20, lambda = 0.01)
  val model = ALS.train(trainingData, 10, 20, 0.01)

  // Make prediction: top 5 products for each user
  allUsers.foreach { user => model.recommendProducts(user, 5) }
}
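The "Read training data" step in the pseudocode leaves the parsing elided. As a rough sketch only, assuming each line has the hypothetical form `user,product,rating`, the parsing could look like the following in plain Scala (no Spark). The nested `Rating` class is a stand-in with the same fields as MLlib's `Rating`; the object and function names are made up for illustration.

```scala
object TrainingDataSketch {
  // Stand-in for MLlib's Rating(user: Int, product: Int, rating: Double).
  final case class Rating(user: Int, product: Int, rating: Double)

  // Parse one "user,product,rating" line (assumed format).
  // Returns None for lines with the wrong field count or non-numeric fields.
  def parseLine(line: String): Option[Rating] =
    line.split(',').map(_.trim) match {
      case Array(user, product, rating) =>
        try Some(Rating(user.toInt, product.toInt, rating.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  // Keep only the lines that parse cleanly.
  def parseAll(lines: Seq[String]): Seq[Rating] = lines.flatMap(parseLine)
}
```

With Spark, the same `parseLine` could be applied with `flatMap` on the RDD returned by `sc.textFile`, so malformed lines are dropped rather than failing the job.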

Beyond Prototyping

• How do you deploy a scalable service that responds to dynamic prediction queries?

• How do you persist the predictive model in a distributed environment?

• How do you make HBase, Spark and the algorithms talk to each other?

• How should I prepare, or transform, the data for model training?

• How do you update the model with new data without downtime?

• Where should I add business logic?

• How do you make the code configurable, reusable and maintainable?

• How do I build all of this with separation of concerns (SoC)?

A Classic Recommender Example in production…

[Diagram]
Mobile App -> Event Server (data storage): Data: User Actions
Mobile App -> Engine: Query via REST: User ID
Engine -> Mobile App: Predicted Result: A list of Product IDs

PredictionIO

• PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.

• Built on Apache Spark, MLlib and HBase.


Event Server: Collecting Data

• $ pio eventserver

• Event-based

client.create_event(
    event="rate",
    entity_type="user",
    entity_id="user_123",
    target_entity_type="item",
    target_entity_id="item_100",
    properties={"rating": 5.0}
)
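For readers who prefer to see what such a "rate" event looks like on the wire, here is a minimal Scala sketch that builds the JSON body by hand. The camelCase field names (`entityType`, `targetEntityId`, and so on) are an assumption about the Event Server's JSON format, mirrored from the SDK arguments above; `EventJsonSketch` and `RateEvent` are hypothetical names, and the escaping is deliberately minimal.

```scala
object EventJsonSketch {
  // Hypothetical container for one "user rates item" action.
  final case class RateEvent(userId: String, itemId: String, rating: Double)

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def field(key: String, jsonValue: String): String =
    quote(key) + ":" + jsonValue

  // Render the event as a JSON object (field names are assumed, see lead-in).
  def toJson(e: RateEvent): String =
    "{" + Seq(
      field("event", quote("rate")),
      field("entityType", quote("user")),
      field("entityId", quote(e.userId)),
      field("targetEntityType", quote("item")),
      field("targetEntityId", quote(e.itemId)),
      field("properties", "{" + field("rating", e.rating.toString) + "}")
    ).mkString(",") + "}"
}
```

In a real client a JSON library would replace the hand-rolled quoting; the point here is only the shape of the event: who did what to which target, plus free-form properties.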


Engine: Building an Engine with Separation of Concerns (SoC)

• DASE - the “MVC” for Machine Learning
  • Data: Data Source and Data Preparator
  • Algorithm(s)
  • Serving
  • Evaluator

Engine: Functions of an Engine

A. Train deployable predictive model(s)
B. Respond to dynamic query
C. Evaluation
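To make the DASE split concrete, here is a small, framework-free sketch of how the components could fit together. These traits are illustrative stand-ins, not PredictionIO's actual `PDataSource`, `PPreparator`, `PAlgorithm` or `LServing` classes; the Evaluator is left out, and the toy "popularity" engine at the end is invented for the example.

```scala
object DaseSketch {
  // D: read raw training data, then prepare it for the algorithms.
  trait DataSource[TD] { def readTraining(): TD }
  trait Preparator[TD, PD] { def prepare(trainingData: TD): PD }
  // A: train a model and answer queries with it.
  trait Algorithm[PD, M, Q, P] {
    def train(preparedData: PD): M
    def predict(model: M, query: Q): P
  }
  // S: combine the predictions of all algorithms into one result.
  trait Serving[Q, P] { def serve(query: Q, predictions: Seq[P]): P }

  final class Engine[TD, PD, M, Q, P](
      dataSource: DataSource[TD],
      preparator: Preparator[TD, PD],
      algorithms: Seq[Algorithm[PD, M, Q, P]],
      serving: Serving[Q, P]) {

    // Function A: train one model per algorithm.
    def train(): Seq[M] = {
      val prepared = preparator.prepare(dataSource.readTraining())
      algorithms.map(_.train(prepared))
    }

    // Function B: respond to a query using the trained models.
    def query(models: Seq[M], q: Q): P = {
      val predictions = algorithms.zip(models).map { case (a, m) => a.predict(m, q) }
      serving.serve(q, predictions)
    }
  }

  // Toy engine: recommend the most-viewed items (query = how many items).
  type Views = Seq[(String, String)] // (user, item)
  val popularityEngine =
    new Engine[Views, Seq[String], Map[String, Int], Int, Seq[String]](
      new DataSource[Views] {
        def readTraining(): Views = Seq("u1" -> "a", "u2" -> "a", "u2" -> "b")
      },
      new Preparator[Views, Seq[String]] {
        def prepare(trainingData: Views): Seq[String] = trainingData.map(_._2)
      },
      Seq(new Algorithm[Seq[String], Map[String, Int], Int, Seq[String]] {
        def train(preparedData: Seq[String]): Map[String, Int] =
          preparedData.groupBy(identity).map { case (item, hits) => item -> hits.size }
        def predict(model: Map[String, Int], num: Int): Seq[String] =
          model.toSeq.sortBy { case (item, count) => (-count, item) }.map(_._1).take(num)
      }),
      new Serving[Int, Seq[String]] {
        def serve(query: Int, predictions: Seq[Seq[String]]): Seq[String] = predictions.head
      }
    )
}
```

Each piece can be swapped independently: a different data source, an extra algorithm, or different serving logic, without touching the others. That is the separation of concerns the slide refers to.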

Engine: A. Train predictive model(s)

class DataSource(…) extends PDataSource
  def readTraining(sc: SparkContext) ==> trainingData

class Preparator(…) extends PPreparator
  def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData

class Algorithm1(…) extends PAlgorithm
  def train(preparedData: PreparedData) ==> Model

$ pio train

Engine: A. Train predictive model(s)

class DataSource(…) extends PDataSource

  override def readTraining(sc: SparkContext): TrainingData = {
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)

    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      val rating = try {
        val ratingValue: Double = event.event match {….}
        Rating(event.entityId, event.targetEntityId.get, ratingValue)
      } catch {…}
      rating
    }
    new TrainingData(ratingsRDD)
  }
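The event-to-rating step is elided in the slide code. One possible shape for it, as a plain-Scala sketch: the `Event` and `Rating` classes below are simplified stand-ins, and the rule that a "buy" event counts as a rating of 4.0 is purely an assumption for illustration. The sketch uses `Option` instead of `try`/`catch` so that unusable events are skipped rather than thrown.

```scala
object EventToRatingSketch {
  // Simplified stand-ins for the Event and Rating types used in the slide.
  final case class Event(
      event: String,
      entityId: String,
      targetEntityId: Option[String],
      properties: Map[String, Double])
  final case class Rating(user: String, item: String, rating: Double)

  // Map one event to a rating, or None if it carries no usable signal.
  def toRating(e: Event): Option[Rating] =
    for {
      item <- e.targetEntityId
      value <- e.event match {
        case "rate" => e.properties.get("rating") // explicit rating
        case "buy"  => Some(4.0)                  // assumed implicit rating
        case _      => None                       // ignore other events
      }
    } yield Rating(e.entityId, item, value)
}
```

On an RDD the same function could be used with `flatMap`, which drops the `None` cases in one pass.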

Engine: A. Train predictive model(s)

class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm

  def train(preparedData: PreparedData): Model1 = {
    val mllibRatings = data….
    val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
    new Model1(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures
    )
  }

Engine: A. Train predictive model(s)

[Diagram] Event Server -> Data Source -> TrainingData -> Data Preparator -> PreparedData -> Algorithm 1, Algorithm 2, Algorithm 3 -> Model 1, Model 2, Model 3

Engine: B. Respond to dynamic query

• Query (Input):
$ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

case class Query(
  user: String,
  num: Int
) extends Serializable

Engine: B. Respond to dynamic query

• Predicted Result (Output):
{"itemScores":[{"item":"22","score":4.072304374729956},{"item":"62","score":4.058482414005789},{"item":"75","score":4.046063009943821}]}

case class PredictedResult(
  itemScores: Array[ItemScore]
) extends Serializable

case class ItemScore(
  item: String,
  score: Double
) extends Serializable

Engine: B. Respond to dynamic query

Query via REST

class Algorithm1(…) extends PAlgorithm
  def predict(model: ALSModel, query: Query) ==> predictedResult

class Serving extends LServing
  def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
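Since `serve` receives one `PredictedResult` per algorithm, the Serving layer is where results get combined. The slides do not say how; the sketch below shows one possible policy, averaging each item's score across the algorithms that returned it and keeping the top `num`. The case classes mirror the slide's, except that `Seq` replaces `Array` so the results compare by value.

```scala
object ServingSketch {
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Assumed policy: average scores per item, then return the best query.num items.
  def serve(query: Query, predictedResults: Seq[PredictedResult]): PredictedResult = {
    val averaged = predictedResults
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum / scores.size) }
      .toSeq
      // Highest score first; ties broken by item id for a stable order.
      .sortWith((a, b) => a.score > b.score || (a.score == b.score && a.item < b.item))
      .take(query.num)
    PredictedResult(averaged)
  }
}
```

With a single algorithm this reduces to passing its result through (truncated to `num`). Business rules such as filtering out-of-stock items would also fit naturally at this point.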

Engine: B. Respond to dynamic query

class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm

  def predict(model: ALSModel, query: Query): PredictedResult = {
    model….{ userInt =>
      val itemScores = model.recommendProducts(…).map(….)
      new PredictedResult(itemScores)
    }.getOrElse{….}
  }
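What happens inside a matrix-factorization `predict` can be sketched without Spark: a user's score for a product is the dot product of their feature vectors, and the recommendation is the top `num` products by that score. The code below is an illustrative stand-in working on in-memory maps, not MLlib's implementation; names such as `FactorScoringSketch` are hypothetical. An unknown user yields `None`, which corresponds to the `getOrElse` branch in the slide.

```scala
object FactorScoringSketch {
  // Dot product of two feature vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same rank")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Top `num` (product, score) pairs for `user`, or None if the user is unknown.
  def recommend(
      userFeatures: Map[Int, Array[Double]],
      productFeatures: Map[Int, Array[Double]],
      user: Int,
      num: Int): Option[Seq[(Int, Double)]] =
    userFeatures.get(user).map { u =>
      productFeatures.toSeq
        .map { case (product, p) => product -> dot(u, p) }
        // Highest score first; ties broken by product id.
        .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
        .take(num)
    }
}
```

Scoring every product for each query is fine for a sketch; at scale the candidate set would usually be narrowed first.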

Engine: B. Respond to dynamic query

[Diagram] Mobile App -> Query (input) -> Algorithm 1 (Model 1), Algorithm 2 (Model 2), Algorithm 3 (Model 3) -> Predicted Results -> Serving -> Predicted Result (output) -> Mobile App

Engine: DASE Factory

object RecEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("algo1" -> classOf[Algorithm1]),
      classOf[Serving])
  }
}

Running in Production

• Install PredictionIO
  $ bash -c "$(curl -s http://install.prediction.io/install.sh)"

• Start the Event Server
  $ pio eventserver

• Deploy an Engine
  $ pio build; pio train; pio deploy

• Update Engine Model with New Data
  $ pio train; pio deploy

Deploy in Production

[Diagram] Website, Mobile App and Email Campaign send Data to the Event Server (data storage), which feeds Engine 1, Engine 2, Engine 3 and Engine 4; the clients Query via REST and receive a Predicted Result.

The Next Step

• Quickstart with an Engine Template!

• Follow on GitHub: github.com/predictionio/

• Learn PredictionIO: prediction.io/

• Learn Scala! Scala for the Impatient

• Contribute!

Thanks.

Simon [email protected]

@PredictionIO

prediction.io (Newsletters)

github.com/predictionio