velox at sf data mining meetup

145
VELOX: MODELS IN ACTION Dan Crankshaw UC Berkeley AMPLab [email protected] Marin Software 2015

Upload: dan-crankshaw

Post on 17-Jul-2015

531 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Velox at SF Data Mining Meetup

VELOX: MODELS IN ACTION

Dan Crankshaw UC Berkeley AMPLab

[email protected]

Marin Software 2015

Page 2: Velox at SF Data Mining Meetup

Algorithms, Machines, and People

Page 3: Velox at SF Data Mining Meetup

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 4: Velox at SF Data Mining Meetup

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 5: Velox at SF Data Mining Meetup

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 6: Velox at SF Data Mining Meetup

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 7: Velox at SF Data Mining Meetup

GraphX

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 8: Velox at SF Data Mining Meetup

GraphX

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 9: Velox at SF Data Mining Meetup

GraphX

Algorithms, Machines, and People

“deep questions over dirty and heterogenous data”

Page 10: Velox at SF Data Mining Meetup

BERKELEY DATA ANALYTICS STACK (BDAS)

Spark

SparkStreaming Spark SQL

BlinkDBGraphX

MLlib

MLBase

HDFS, S3, … Tachyon

Mesos Hadoop Yarn

Page 11: Velox at SF Data Mining Meetup

Catify: Music for Cats

Page 12: Velox at SF Data Mining Meetup

MODELING TASK

Rating

Songs

Page 13: Velox at SF Data Mining Meetup

MODELING TASK

Ratings

Songs

Prediction

Page 14: Velox at SF Data Mining Meetup

Catify: Music for Cats

Page 15: Velox at SF Data Mining Meetup

Catify: Music for CatsCatID Song Score

1 16 2.1

1 14 3.7

3 273 4.2

4 14 1.9

Page 16: Velox at SF Data Mining Meetup

Catify: Music for Cats

Pipeline

CatID Song Score

1 16 2.1

1 14 3.7

3 273 4.2

4 14 1.9

Page 17: Velox at SF Data Mining Meetup

Catify: Music for Cats

Tachyon + HDFS

Pipeline

CatID Song Score

1 16 2.1

1 14 3.7

3 273 4.2

4 14 1.9

Page 18: Velox at SF Data Mining Meetup

Catify: Music for Cats

Tachyon + HDFS

Pipeline

CatID Song Score

1 16 2.1

1 14 3.7

3 273 4.2

4 14 1.9

Page 19: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

Catify: Music for Cats

Page 20: Velox at SF Data Mining Meetup

Catify: Music for Cats

Songs

Users

Page 21: Velox at SF Data Mining Meetup

Songs

Users

O(users * songs)

Catify: Music for Cats

Page 22: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

Catify: Music for Cats

Page 23: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

PrecomputedRatings

Catify: Music for Cats

Page 24: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

PrecomputedRatings

Catify: Music for Cats

Black box

Page 25: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

Training Data

PrecomputedRatings

Catify: Music for Cats

Black box

Page 26: Velox at SF Data Mining Meetup

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

Training Data

PrecomputedRatings

Catify: Music for Cats

Black box

Page 27: Velox at SF Data Mining Meetup

What’s wrong?

Page 28: Velox at SF Data Mining Meetup

1. Serving system: low-latency but high staleness

What’s wrong?

Page 29: Velox at SF Data Mining Meetup

1. Serving system: low-latency but high staleness

2. Batch training: slow incremental maintenance, no serving

What’s wrong?

Page 30: Velox at SF Data Mining Meetup

1. Serving system: low-latency but high staleness

2. Batch training: slow incremental maintenance, no serving

3. Ad-hoc model management

What’s wrong?

Page 31: Velox at SF Data Mining Meetup

VELOX GOALS

Page 32: Velox at SF Data Mining Meetup

VELOX GOALS

1. Low latency and fresh predictions

Page 33: Velox at SF Data Mining Meetup

VELOX GOALS

1. Low latency and fresh predictions2. Break the abstraction: model-

specific optimizations

Page 34: Velox at SF Data Mining Meetup

VELOX GOALS

1. Low latency and fresh predictions2. Break the abstraction: model-

specific optimizations3. Unified system eases operation

Page 35: Velox at SF Data Mining Meetup

Spark

SparkStreaming Spark SQL

BlinkDBGraphX

MLlib

MLBase

HDFS, S3, … Tachyon

THE MISSING PIECE IN BDAS

Mesos

Page 36: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase

Training

THE MISSING PIECE IN BDAS

Spark

HDFS, S3, … Tachyon

Mesos

Page 37: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase

Training Management + Serving

THE MISSING PIECE IN BDAS

Spark

HDFS, S3, … Tachyon

Mesos

Page 38: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase VeloxTraining Management + Serving

THE MISSING PIECE IN BDAS

Spark

HDFS, S3, … Tachyon

Mesos

Page 39: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase VeloxTraining Management + Serving

Spark

HDFS, S3, … Tachyon

THE MISSING PIECE IN BDAS

Mesos

Page 40: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase VeloxTraining Management + Serving

Spark

HDFS, S3, … Tachyon

ModelManager

THE MISSING PIECE IN BDAS

Mesos

Page 41: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase VeloxTraining Management + Serving

Spark

HDFS, S3, … Tachyon

ModelManager

PredictionService

THE MISSING PIECE IN BDAS

Mesos

Page 42: Velox at SF Data Mining Meetup

VELOX ARCHITECTURE

Page 43: Velox at SF Data Mining Meetup

VELOX ARCHITECTUREStandalone Scala

Service

Page 44: Velox at SF Data Mining Meetup

VELOX ARCHITECTUREStandalone Scala

Service

Automatic Integration with Spark

Page 45: Velox at SF Data Mining Meetup

VELOX ARCHITECTUREStandalone Scala

Service

Personalized Predictions as a Service

Automatic Integration with Spark

Page 46: Velox at SF Data Mining Meetup

VELOX ARCHITECTUREStandalone Scala

Service

Shared-Nothing Serving Cluster

Personalized Predictions as a Service

Automatic Integration with Spark

Page 47: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 48: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 49: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 50: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

Predictions via RESTfrontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 51: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

Predictions via RESTfrontend.js

Returns score

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 52: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 53: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Feedback via REST

uuid: 4

Page 54: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Feedback via REST

Model updated

in realtime

uuid: 4

Page 55: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 56: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

master

workerworker

worker

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 57: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

master

workerworker

worker

Batch train RPC

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 58: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

master

workerworker

worker

Batch train RPC

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30Returns batch trained model

Page 59: Velox at SF Data Mining Meetup

Mesos Mesos

HDFS, S3, … Tachyon

Hadoop Yarn

Spark Straming Shark

SQL

Graph X ML

library

BlinkDB MLbase

Spark

VeloxTraining Management + Serving

PREDICTION SERVICE

ModelManager

PredictionService

Page 60: Velox at SF Data Mining Meetup

PREDICTION API

GET  /velox/catify/predict?userid=22&song=27632Simple point queries:

Page 61: Velox at SF Data Mining Meetup

PREDICTION API

GET  /velox/catify/predict_top_k?userid=22&k=100

GET  /velox/catify/predict?userid=22&song=27632Simple point queries:

More complex ordering queries:

Page 62: Velox at SF Data Mining Meetup

PREDICTION API

GET  /velox/catify/predict_top_k?userid=22&k=100

GET  /velox/catify/predict?userid=22&song=27632Simple point queries:

More complex ordering queries:

Low-latency andscalable partitioning

Personalized Predictions

Page 63: Velox at SF Data Mining Meetup

PREDICTION API

GET  /velox/catify/predict_top_k?userid=22&k=100

GET  /velox/catify/predict?userid=22&song=27632Simple point queries:

More complex ordering queries:

Low-latency andscalable partitioning

Personalized Predictions

Intelligent Caching

Sharing and re-use of model partial-state

Page 64: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

Predictions via RESTfrontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 65: Velox at SF Data Mining Meetup

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

uuid model

Page 66: Velox at SF Data Mining Meetup

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

uuid model

Look up user model

Read

Page 67: Velox at SF Data Mining Meetup

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

uuid model

Look up user model

Primary key lookup

Read

Page 68: Velox at SF Data Mining Meetup

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

uuid model

Look up user model

Primary key lookup

Partition queries by user : always local

Read

Page 69: Velox at SF Data Mining Meetup

Compute Features

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

user independent

}f( )

Page 70: Velox at SF Data Mining Meetup

Compute Features

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

Feature computation could be costly

user independent

}f( )

Page 71: Velox at SF Data Mining Meetup

Compute Features

PREDICTION EXECUTION

def  predict(  u:  UUID,  x:  Context  )

Feature computation could be costly

user independent

}Cache features forreuse across users

f( )

Page 72: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs

Page 73: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords

Page 74: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords CandidateSongs

Page 75: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords CandidateSongs

Score andrank allcandidates

Page 76: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords CandidateSongs

By exploiting split model design we can leverage:

Score andrank allcandidates

Page 77: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords CandidateSongs

By exploiting split model design we can leverage:

Score andrank allcandidates

A. Shrivastava, P. Li. “Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).” NIPS’14 Best Paper

Page 78: Velox at SF Data Mining Meetup

TOP-K QUERIESQuery predicate to pre-filter candidate set

All Songs Playlist Keywords CandidateSongs

By exploiting split model design we can leverage:

Score andrank allcandidates

A. Shrivastava, P. Li. “Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).” NIPS’14 Best Paper

Y. Low and A. X. Zheng. “Fast Top-K Similarity Queries Via Matrix Compression.” CIKM 2012

Page 79: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 80: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

Returns score

uuid: 01-10

uuid: 11-20

uuid: 20-30

uuid: 4

Page 81: Velox at SF Data Mining Meetup

Mesos Mesos

HDFS, S3, … Tachyon

Hadoop Yarn

Spark Straming Shark

SQL

Graph X ML

library

BlinkDB MLbase

Spark

VeloxTraining Management + Serving

ModelManager

PredictionService

MODEL MANAGER

Page 82: Velox at SF Data Mining Meetup

PERSONALIZED MODELING

Page 83: Velox at SF Data Mining Meetup

PERSONALIZED MODELING

Page 84: Velox at SF Data Mining Meetup

PERSONALIZED MODELING

A Separate Model for Each User?

Page 85: Velox at SF Data Mining Meetup

PERSONALIZED MODELING

Computationally Inefficient many complex models

A Separate Model for Each User?

Page 86: Velox at SF Data Mining Meetup

PERSONALIZED MODELING

Statistically Inefficient not enough data per user

Computationally Inefficient many complex models

A Separate Model for Each User?

Page 87: Velox at SF Data Mining Meetup

Input(Song) Rating

Page 88: Velox at SF Data Mining Meetup

Input(Song) Rating

Page 89: Velox at SF Data Mining Meetup

Input(Song) Rating

Split

Page 90: Velox at SF Data Mining Meetup

Rating

Split

Input(Song)

Page 91: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Page 92: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature Model

Page 93: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Page 94: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes Slowly

Page 95: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes SlowlyTrain in Batch!

Page 96: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes SlowlyTrain in Batch!

PersonalizedUser Model

Page 97: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes SlowlyTrain in Batch!

Small Data

PersonalizedUser Model

Page 98: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes SlowlyTrain in Batch!

Small DataChanges Quickly

PersonalizedUser Model

Page 99: Velox at SF Data Mining Meetup

PERSONALIZED SPLIT MODEL

Input(Song)

Shared Basis Feature ModelBig Data

Changes SlowlyTrain in Batch!

Small DataChanges Quickly

Train Online!

PersonalizedUser Model

Page 100: Velox at SF Data Mining Meetup

Input(Song)

PersonalizedUser Model

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 101: Velox at SF Data Mining Meetup

Input(Song)

PersonalizedUser Model

Input(Song)

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 102: Velox at SF Data Mining Meetup

Input(Song)

PersonalizedUser Model

Input(Song)

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 103: Velox at SF Data Mining Meetup

Input(Song)

PersonalizedUser Model

Meow

Input(Song)

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 104: Velox at SF Data Mining Meetup

PersonalizedUser Model

Meow

Input(Song)Input

(Song)

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 105: Velox at SF Data Mining Meetup

PersonalizedUser Model

Meow

Terrible

Input(Song)Input

(Song)

Shared Basis Feature Model

PERSONALIZED SPLIT MODEL

Page 106: Velox at SF Data Mining Meetup

MATHEMATICAL FORMULATION

Input(Song)

Page 107: Velox at SF Data Mining Meetup

MATHEMATICAL FORMULATION

Input(Song)

x

Page 108: Velox at SF Data Mining Meetup

Shared BasisFeature Models

Changes slowly

MATHEMATICAL FORMULATION

Input(Song)

x

Page 109: Velox at SF Data Mining Meetup

Shared BasisFeature Models

Changes slowly

MATHEMATICAL FORMULATION

Input(Song)

f(x; ✓)x

Page 110: Velox at SF Data Mining Meetup

Shared BasisFeature Models

PersonalizedUser Model

Changes slowly

Highly dynamic

MATHEMATICAL FORMULATION

Input(Song)

f(x; ✓)x

Page 111: Velox at SF Data Mining Meetup

Shared BasisFeature Models

PersonalizedUser Model

Changes slowly

Highly dynamic

MATHEMATICAL FORMULATION

Input(Song)

f(x; ✓) ·wu

x

Page 112: Velox at SF Data Mining Meetup

Shared BasisFeature Models

PersonalizedUser Model

Changes slowly

Highly dynamic

= Rating

MATHEMATICAL FORMULATION

Input(Song)

f(x; ✓) ·wu

x

Meow

Page 113: Velox at SF Data Mining Meetup

FEEDBACK API

POST  /velox/catify/observe?userid=22&song=27&score=3.7

Simple direct value feedback:

Page 114: Velox at SF Data Mining Meetup

FEEDBACK API

POST  /velox/catify/observe?userid=22&song=27&score=3.7

Simple direct value feedback:

Continuously update user models in Velox

Online Learning

Page 115: Velox at SF Data Mining Meetup

FEEDBACK API

POST  /velox/catify/observe?userid=22&song=27&score=3.7

Simple direct value feedback:

Continuously update user models in Velox

Online Learning Offline LearningLogged to Tachyon for

feature learning in Spark

Page 116: Velox at SF Data Mining Meetup

FEEDBACK API

POST  /velox/catify/observe?userid=22&song=27&score=3.7

Simple direct value feedback:

Continuously update user models in Velox

Online Learning Offline LearningLogged to Tachyon for

feature learning in Spark

EvaluationContinuously assessmodel performance

Page 117: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Feedback via REST

Model updated

in realtime

uuid: 4

Page 118: Velox at SF Data Mining Meetup

ONLINE LEARNING

velox.jar

user model

def  observe(u:  UUID,  x:  Context,  y:  Score)

Page 119: Velox at SF Data Mining Meetup

ONLINE LEARNING

velox.jar

user model

def  observe(u:  UUID,  x:  Context,  y:  Score)

Update user model with new

training data Write

Page 120: Velox at SF Data Mining Meetup

ONLINE LEARNING

velox.jar

user model

def  observe(u:  UUID,  x:  Context,  y:  Score)

Stochastic gradient descent

Update user model with new

training data Write

Page 121: Velox at SF Data Mining Meetup

ONLINE LEARNING

velox.jar

user model

def  observe(u:  UUID,  x:  Context,  y:  Score)

Stochastic gradient descent

Incremental linear algebra

Update user model with new

training data Write

Page 122: Velox at SF Data Mining Meetup

SYSTEM ARCHITECTURE

master

workerworker

worker

Batch train RPC

frontend.js

uuid: 01-10

uuid: 11-20

uuid: 20-30

Page 123: Velox at SF Data Mining Meetup

OFFLINE OR NEARLINE LEARNING

def  retrain(trainingData:  RDD)

Spark BasedTraining Algs.

wu · f(x; ✓)

Automated retraining policies

Efficient batch training using Spark

Incremental learning using Spark Streaming

Page 124: Velox at SF Data Mining Meetup

Data Model

Page 125: Velox at SF Data Mining Meetup

Data Model

Sample Bias: model affects the training data.

Page 126: Velox at SF Data Mining Meetup

ALWAYS SERVE THE BEST SONG?

Songs

PredictedRating

Page 127: Velox at SF Data Mining Meetup

ALWAYS SERVE THE BEST SONG?

Songs

PredictedRating

Page 128: Velox at SF Data Mining Meetup

VELOX SOLUTION

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted song

Page 129: Velox at SF Data Mining Meetup

VELOX SOLUTION

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted song

Page 130: Velox at SF Data Mining Meetup

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song

VELOX SOLUTION

Page 131: Velox at SF Data Mining Meetup

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song

VELOX SOLUTION

Page 132: Velox at SF Data Mining Meetup

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song

Epsilon Greedy

VELOX SOLUTION

Page 133: Velox at SF Data Mining Meetup

PredictedRating

Songs

With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song

Epsilon Greedy

Active Learning Opportunity to explore new systems for

this emerging analytics workload

VELOX SOLUTION

Page 134: Velox at SF Data Mining Meetup

BEYOND RECOMMENDER SYSTEMS

Page 135: Velox at SF Data Mining Meetup

1. Spam and anomaly detection

BEYOND RECOMMENDER SYSTEMS

Page 136: Velox at SF Data Mining Meetup

1. Spam and anomaly detection

2. Device/location specific modeling

BEYOND RECOMMENDER SYSTEMS

Page 137: Velox at SF Data Mining Meetup

1. Spam and anomaly detection

2. Device/location specific modeling

3. YOUR machine learning application

BEYOND RECOMMENDER SYSTEMS

Page 138: Velox at SF Data Mining Meetup

Spark Streaming Spark

SQL

Graph X ML

library

BlinkDB MLbase VeloxTraining Management + Serving

Spark

HDFS, S3, … Tachyon

ModelManager

PredictionService

THE MISSING PIECE IN BDAS

Mesos

Page 139: Velox at SF Data Mining Meetup

SUMMARY

Page 140: Velox at SF Data Mining Meetup

Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

SUMMARY

Page 141: Velox at SF Data Mining Meetup

Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

The Velox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions

SUMMARY

Page 142: Velox at SF Data Mining Meetup

Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

The Velox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions

Velox will be open-source: coming soon to BDAS

SUMMARY

Page 143: Velox at SF Data Mining Meetup

Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

The Velox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions

Velox will be open-source: coming soon to BDAShttps://amplab.cs.berkeley.edu/projects/velox/

SUMMARY

Page 144: Velox at SF Data Mining Meetup

Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

The Velox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions

Velox will be open-source: coming soon to BDAShttps://amplab.cs.berkeley.edu/projects/velox/[email protected]

SUMMARY

Page 145: Velox at SF Data Mining Meetup

QUESTIONS?