Building and Deploying ML Applications on production
in a fraction of the time.
A Machine Learning Server in Scala
Available Tools
Processing Framework • e.g. Apache Spark, Apache Hadoop
Algorithm Libraries • e.g. MLlib, Mahout
Data Storage • e.g. HBase, Cassandra
Integrate everything together nicely and move from prototyping to production.
What is Missing?
You have a mobile app
A Classic Recommender Example…
AppPredict
products
You need a Recommendation Engine Predict products that a customer will like – and show it.
Predictive model
Algorithm - You don't need to write your own: Spark MLlib - ALS algorithm
Predictive model - based on users’ previous behaviors
def pseudocode () {// Read training data val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
// Build a predictive model with an algorithmval model = ALS.train(trainingData, 10, 20, 0.01)
// Make prediction allUsers.foreach { user => model.recommendProducts(user, 5)}
}
A Classic Recommender Example prototyping…
• How to deploy a scalable service that respond to dynamic prediction query?
• How do you persist the predictive model, in a distributed environment?
• How to make HBase, Spark and algorithms talking to each other?
• How should I prepare, or transform, the data for model training?
• How to update the model with new data without downtime?
• Where should I add some business logics?
• How to make the code configurable, re-usable and maintainable?
• How do I build all these with a separate of concerns (SoC)?
Beyond Prototyping
Engine
Event Server(data storage)
Data: User Actions
Query via REST: User ID
Predicted Result:A list of Product IDs
A Classic Recommender Example on production…
Mobile App
• PredictionIO is a machine learning server for building and deploying predictive engines on productionin a fraction of the time.
• Built on Apache Spark, MLlib and HBase.
PredictionIO
Data: User Actions
Query via REST: User ID
Predicted Result:A list of Product IDs Engine
Event Server(data storage)
Mobile App
Event Server
• $ pio eventserver
• Event-based client.create_event( event="rate", entity_type="user", entity_id=“user_123”, target_entity_type="item", target_entity_id=“item_100”, properties= { "rating" : 5.0 } )
Event Server Collecting Date
Query via REST: User ID
Predicted Result:A list of Product IDs Engine
Data: User Actions Event Server(data storage)
Mobile App
Engine
• DASE - the “MVC” for Machine Learning • Data: Data Source and Data Preparator • Algorithm(s) • Serving • Evaluator
Engine Building an Engine with Separation of Concerns (SoC)
A. Train deployable predictive model(s) B. Respond to dynamic query C. Evaluation
Engine Functions of an Engine
Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource def readTraining(sc: SparkContext) ==> trainingData
class Preparator(…) extends PPreparator def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData
class Algorithm1(…) extends PAlgorithm def train(prepareData: PreparedData) ==> Model
$ pio train
Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource
override def readTraining(sc: SparkContext): TrainingData = { val eventsDb = Storage.getPEvents() val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)
val ratingsRDD: RDD[Rating] = eventsRDD.map { event => val rating = try { val ratingValue: Double = event.event match {….} Rating(event.entityId, event.targetEntityId.get, ratingValue) } catch {…} rating } new TrainingData(ratingsRDD) }
Engine A. Train predictive model(s)
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
def train(preparedData: PreparedData): Model1 = { mllibRatings = data…. val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda) new Model1( rank = m.rank, userFeatures = m.userFeatures, productFeatures = m.productFeatures ) }
Engine A. Train predictive model(s)
Event Server
Algorithm 1 Algorithm 3Algorithm 2
PreparedDate
Engine
Data Preparator
Data Source
TrainingDate
Model 3Model 1Model 2
B. Respond to dynamic queryEngine
• Query (Input) :$ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json
case class Query(
val user: String, val num: Int ) extends Serializable
B. Respond to dynamic queryEngine
• Predicted Result (Output):{“itemScores”:[{"item":"22","score":4.072304374729956},{"item":"62","score":4.058482414005789}, {"item":"75","score":4.046063009943821}]}
case class PredictedResult( val itemScores: Array[ItemScore] ) extends Serializable
case class ItemScore( item: String, score: Double ) extends Serializable
class Algorithm1(…) extends PAlgorithm def predict(model: ALSModel, query: Query) ==> predictedResult
class Serving extends LServing def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
B. Respond to dynamic queryEngine
Query via REST
Engine B. Respond to dynamic query
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
def predict(model: ALSModel, query: Query): PredictedResult = { model….{ userInt => val itemScores = model.recommendProducts (…).map (….) new PredictedResult(itemScores) }.getOrElse{….} }
B. Respond to dynamic queryEngine
Algorithm 1Model 1
Serving
Mobile App
Algorithm 3Model 3
Algorithm 2Model 2
Predicted Results
Query (input)
Predicted Result (output)
Engine
Engine DASE Factory
object RecEngine extends IEngineFactory { def apply() = { new Engine( classOf[DataSource], classOf[Preparator], Map("algo1" -> classOf[Algorithm1]), classOf[Serving]) } }
Running on Production
• Install PredictionIO$ bash -c "$(curl -s http://install.prediction.io/install.sh)"
• Start the Event Server $ pio eventserver
• Deploy an Engine$ pio build; pio train; pio deploy
• Update Engine Model with New Data $ pio train; pio deploy
Deploy on Production
Website
Mobile App
EmailCampaign
Event Server(data storage)
Data
Query via REST
PredictedResult
Engine 1
Engine 3
Engine 2
Engine 4
The Next Step
• Quickstart with an Engine Template!
• Follow on Github: github.com/predictionio/
• Learn PredictionIO: prediction.io/
• Learn Scala! Scala for the Impatient
• Contribute!