predictionio - scalable machine learning architecture

Simon Chan, Data Science London, April 24, 2013 (Big Data Week)

Post on 27-Jan-2015

DESCRIPTION

PredictionIO's presentation slides for Data Science London on April 24, 2013 during the Big Data Week.

TRANSCRIPT

Page 1: PredictionIO - Scalable Machine Learning Architecture

Simon Chan

Data Science London - April 24, 2013

Big Data Week

Page 2: PredictionIO - Scalable Machine Learning Architecture

Machine Learning is...

computers learning to predict

from data

Page 3: PredictionIO - Scalable Machine Learning Architecture

putting

Machine Learning

into practice

Page 4: PredictionIO - Scalable Machine Learning Architecture

challenge #1

Scalability

Page 5: PredictionIO - Scalable Machine Learning Architecture

Big Data Bottlenecks

Machine Learning Processing

Page 6: PredictionIO - Scalable Machine Learning Architecture

PredictionIO has a horizontally scalable architecture

Page 7: PredictionIO - Scalable Machine Learning Architecture
Page 8: PredictionIO - Scalable Machine Learning Architecture

Async SDK

Client client = new Client(appkey);

// Adding user behaviors

req = client.getUserRateItemRequestBuilder(uid, iid, rate);

client.userRateItemAsFuture(req);
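The call above hands the request to the SDK and returns a future instead of waiting for the server. The following is a minimal Python sketch of that pattern using only the standard library. `send_rating` and `AsyncClient` are hypothetical names invented for illustration and are not part of any PredictionIO SDK.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def send_rating(uid: str, iid: str, rate: int) -> dict:
    """Hypothetical stand-in for the HTTP call an SDK would make."""
    time.sleep(0.05)  # simulate network latency
    return {"uid": uid, "iid": iid, "rate": rate, "status": "created"}


class AsyncClient:
    """Illustrative client: every request runs on a worker thread."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def user_rate_item_as_future(self, uid: str, iid: str, rate: int) -> Future:
        # submit() returns immediately, so the caller is not blocked.
        return self._pool.submit(send_rating, uid, iid, rate)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


client = AsyncClient()
future = client.user_rate_item_as_future("user123", "itemXYZ", 4)
# The application can keep serving its own users while the request is in flight.
result = future.result(timeout=5)  # block only when the outcome is needed
client.close()
```

Because the caller holds a future rather than a response, recording user behavior does not add latency to the page that triggered it.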

Page 9: PredictionIO - Scalable Machine Learning Architecture

Play Framework

‣ stateless - no server session

‣ non-blocking web request

Page 10: PredictionIO - Scalable Machine Learning Architecture

Play: A Non-blocking Example

def index = Action {
  val futureInt = scala.concurrent.Future { slowDataProcess() }
  Async {
    futureInt.map(i => Ok(views.html.result.render(i)))
  }
}
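The same non-blocking idea can be sketched in Python with `asyncio`: the slow work is moved off the request-handling loop, so other requests proceed in the meantime. This is an analogy under stated assumptions, not Play's implementation. `slow_data_process` and `index` are hypothetical stand-ins for the names on the slide.

```python
import asyncio
import time


def slow_data_process() -> int:
    """Hypothetical blocking computation, standing in for slowDataProcess()."""
    time.sleep(0.1)
    return 42


async def index() -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking call on a worker thread so the event loop stays free.
    value = await loop.run_in_executor(None, slow_data_process)
    return f"result: {value}"


async def main() -> list:
    # Three requests overlap instead of running back to back.
    return await asyncio.gather(index(), index(), index())


responses = asyncio.run(main())
```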

Page 11: PredictionIO - Scalable Machine Learning Architecture

MongoDB

‣ Read scaling: Replica Sets

‣ Write scaling: Sharding

‣ Indexes (e.g. geospatial)

{ geoSearch : "places", near : [33, 33], maxDistance : 6, search : { uid : "user1" } }
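To illustrate why sharding scales writes, here is a simplified Python sketch that routes each document to a shard by hashing its key. It is only a model of the idea: MongoDB's own hashed sharding assigns ranges of hash values to chunks and rebalances them, which this modulo routing does not attempt. The shard names and the choice of `uid` as the key are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def route_writes(docs: list) -> dict:
    """Group documents by the shard that would receive each write."""
    placement = defaultdict(list)
    for doc in docs:
        placement[shard_for(doc["uid"])].append(doc)
    return dict(placement)


docs = [{"uid": f"user{i}", "iid": "itemXYZ"} for i in range(100)]
placement = route_writes(docs)
```

Each shard receives only part of the write load, so adding shards raises total write capacity, while replica sets add copies that can serve reads.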

Page 12: PredictionIO - Scalable Machine Learning Architecture

Hadoop

Hadoop

Cascading (Java)

Scalding (Scala)

Page 13: PredictionIO - Scalable Machine Learning Architecture

MapReduce - Native Java

public class WordCount {

  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws ..... {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}

Page 14: PredictionIO - Scalable Machine Learning Architecture

MapReduce - Scalding

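import com.twitter.scalding._

class ScaldingTestJob(args: Args) extends Job(args) {
  Tsv(args(0), 'text)
    // Split each line on whitespace and emit one 'word per token
    .flatMap('text -> 'word) { text: String => text.split("\\s+") }
    // Count the occurrences of each word
    .groupBy('word) { _.size }
    .write(Tsv(args(1)))
}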
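Both word-count versions follow the same three steps: map each line to (word, 1) pairs, group the pairs by word, and reduce each group to a sum. The plain Python sketch below spells those steps out on a single machine. It only illustrates the data flow and does not use Hadoop. The function names are chosen for this example.

```python
import re
from collections import defaultdict


def map_phase(line: str):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in re.split(r"\s+", line.strip()):
        if word:
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["to be or not to be", "to predict from data"])
```

In the Scalding job, `flatMap` plays the role of `map_phase`, and `groupBy` with `_.size` covers both the grouping and the summing.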

Page 15: PredictionIO - Scalable Machine Learning Architecture

Sample Code

Page 16: PredictionIO - Scalable Machine Learning Architecture

### Sample PredictionIO Python SDK Code

import predictionio

client = predictionio.Client(appkey="<your app key>")

# Add Data
client.create_user(uid="user123")
client.create_item(iid="itemXYZ", itypes=(1,))
client.user_view_item(uid="user123", iid="itemXYZ")

# Get Prediction
rec = client.get_itemrec(engine="<engine name>", uid="user123", n=5)

Page 17: PredictionIO - Scalable Machine Learning Architecture

Getting Involved!

- @PredictionIO

- prediction.io - Newsletter

- github.com/predictionio

Page 18: PredictionIO - Scalable Machine Learning Architecture

Q&A

Q: Selecting the right features is a big problem. Can PredictionIO solve this problem?
A: Not at this moment. That’s why we currently focus on collaborative filtering algorithms, which don’t require features. We also believe that many specific problems need the involvement of data scientists. PredictionIO is positioned as a tool that makes their work easier, not as a replacement.
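To show why collaborative filtering needs no hand-picked features, here is a minimal Python sketch that recommends items purely from which users viewed which items. It uses simple item co-occurrence counts, chosen for brevity. The source does not say which algorithms PredictionIO runs, and the `views` data is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical view events: user id -> set of viewed item ids.
views = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}


def co_occurrence(views: dict) -> dict:
    """Count how often each pair of items was viewed by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in views.values():
        for x, y in combinations(sorted(items), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def recommend(uid: str, views: dict, n: int = 5) -> list:
    """Rank unseen items by their co-occurrence with the user's items."""
    counts = co_occurrence(views)
    seen = views[uid]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties are broken by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


recs = recommend("u2", views, n=2)
```

The only input is behavior data of the kind the SDK calls record. No item attributes or user profiles are involved.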

Q: How is PredictionIO different from Weka?
A: Weka, like Mahout, is an ML algorithm library. You can see PredictionIO as a layer on top of such libraries: it provides the complete infrastructure needed to run an algorithm in a production environment.

Q: How do you compare PredictionIO with RapidMiner?
A: RapidMiner is a great product for defining data engineering workflows visually. PredictionIO focuses on a different problem: deploying ML solutions into production environments.

Q: How do the algorithm evaluation metrics work in PredictionIO?
A: At this moment, you can evaluate algorithms with offline metrics, such as Mean Average Precision, computed on your existing data.
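As a rough illustration of an offline metric, the Python sketch below computes Mean Average Precision at k from held-out data. It follows one common definition, normalizing by min(number of relevant items, k), and assumes each recommendation list has no duplicates. The talk does not specify which variant PredictionIO uses, and the sample data is invented.

```python
def average_precision(recommended: list, relevant: set, k: int) -> float:
    """Average of precision@rank over the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def mean_average_precision(recs: dict, truth: dict, k: int) -> float:
    """Mean of the per-user average precision over all users in `truth`."""
    if not truth:
        return 0.0
    scores = [average_precision(recs.get(u, []), truth[u], k) for u in truth]
    return sum(scores) / len(scores)


# Hypothetical recommendations and held-out items actually consumed later.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
truth = {"u1": {"a", "c"}, "u2": {"y"}}
map_at_3 = mean_average_precision(recs, truth, k=3)
```

A typical offline evaluation hides part of each user's history, generates recommendations from the rest, and scores them against the hidden part in this way.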

Q: What’s the business model?
A: We focus on making PredictionIO a useful open source product at this moment.