![Page 1: A Low-Latency Online Prediction Serving Systemlearningsys.org/nips17/assets/slides/clipper-nips17.pdf · A Low-Latency Online Prediction Serving System Clipper. Big Data Training](https://reader033.vdocuments.site/reader033/viewer/2022042214/5eb98cadcac1f304fd731dc4/html5/thumbnails/1.jpg)
http://clipper.ai
https://github.com/ucbrise/clipper
December 8, 2017
A Low-Latency Online Prediction Serving System
Clipper
Prediction-Serving for interactive applications
Timescale: ~10s of milliseconds
[Diagram labels: Big Data, Training, Model, Application, Query, Decision, Serving]
Prediction-Serving Challenges
Support low-latency, high-throughput serving workloads
[Figure: logos of ML frameworks, including Create, VW, and Caffe]
Large and growing ecosystem of ML models and frameworks
Prediction-Serving Today
Highly specialized systems for specific problems
[Diagram labels: Query, Decision, X, Y]
Offline scoring with existing frameworks and systems
Clipper aims to unify these approaches
New class of systems: Prediction-Serving Systems
Clipper Decouples Applications and Models
[Diagram: Applications call Predict on Clipper through an RPC/REST interface; Clipper talks over RPC to model containers (MC), one of which hosts Caffe]
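As an illustration of this decoupling, an application only needs to know the REST endpoint, not the framework behind it. The sketch below builds such a request with the Python standard library. The address `localhost:1337`, the `/<app>/predict` path, the `{"input": ...}` body, and the application name `hello-world` are assumptions modeled on Clipper's quickstart and may differ between releases.

```python
import json
import urllib.request


def build_predict_request(host, app_name, features):
    """Build (but do not send) a JSON prediction request for one application.

    Assumption: the query frontend exposes POST http://<host>/<app_name>/predict
    and accepts a body of the form {"input": [...]}.
    """
    url = "http://{}/{}/predict".format(host, app_name)
    body = json.dumps({"input": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical application name and address, for illustration only.
request = build_predict_request("localhost:1337", "hello-world", [0.1, 0.2, 0.3])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it requires a running Clipper instance:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```

Nothing in the request names Caffe or any other framework, so the model behind the application can be swapped without touching the caller.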
Common Interface → Simplifies Deployment:
- Evaluate models using original code & systems
- Models run in separate processes as Docker containers
- Resource isolation: Cutting edge ML frameworks can be buggy
- Scale-out and deployment on Kubernetes
[Diagram: Clipper connected over RPC to model containers (MC), one of which hosts Caffe]
Clipper Architecture
[Diagram: Applications call Predict on Clipper; Clipper contains a Caching layer and a Latency-Aware Batching layer, and talks over RPC to model containers (MC), one of which hosts Caffe]
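One way to make batching latency-aware is an additive-increase/multiplicative-decrease (AIMD) controller: grow the batch while a container stays within its latency budget, and back off sharply when it does not. The sketch below is a minimal illustration under that assumption, not Clipper's implementation. The class name, the constants, and the simulated container latency are all hypothetical.

```python
class AimdBatchSizer:
    """Additive-increase / multiplicative-decrease controller for batch size."""

    def __init__(self, latency_slo_ms, step=2, backoff=0.5, max_batch=512):
        self.latency_slo_ms = latency_slo_ms
        self.step = step
        self.backoff = backoff
        self.max_batch = max_batch
        self.batch_size = 1

    def update(self, observed_latency_ms):
        """Adjust the batch size after one batch has been evaluated."""
        if observed_latency_ms > self.latency_slo_ms:
            # Too slow: shrink multiplicatively, never below one query.
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        else:
            # Within budget: probe a slightly larger batch.
            self.batch_size = min(self.max_batch, self.batch_size + self.step)
        return self.batch_size


def simulated_latency_ms(batch_size):
    # Hypothetical container: 5 ms fixed overhead plus 0.5 ms per query.
    return 5.0 + 0.5 * batch_size


sizer = AimdBatchSizer(latency_slo_ms=20.0)
history = []
for _ in range(30):
    latency = simulated_latency_ms(sizer.batch_size)
    history.append((sizer.batch_size, latency))
    sizer.update(latency)

print([batch for batch, _ in history])
```

In this simulation the controller oscillates just around the largest batch that fits the 20 ms budget, which is the behavior a latency-aware batcher is after: high throughput without sustained latency violations.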
Status of the project
- First released in May 2017 with a focus on usability
- Currently working towards 0.3 release and actively working with early users
- Focused on performance improvements and better monitoring and stability
- Supports native deployments on Kubernetes and a local Docker mode
- Goal: Community-owned platform for model deployment and serving
- Post issues and questions on GitHub and subscribe to our mailing list clipper-
https://github.com/ucbrise/clipper
Simplifying Model Deployment with Clipper
Getting Started with Clipper is Easy
Docker images available on DockerHub
Clipper admin is distributed as a pip package:
pip install clipper_admin
Get up and running without cloning or compiling!
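After installation, a first deployment might look like the sketch below. The calls follow the `clipper_admin` quickstart as recalled for the 0.3-era API (`ClipperConnection`, `DockerContainerManager`, `deploy_python_closure`). Treat the exact names and arguments as assumptions that may differ between releases. The application and model names are placeholders.

```python
def feature_sum(xs):
    """Toy model: return the sum of each input vector as a string."""
    return [str(sum(x)) for x in xs]


def deploy_locally():
    """Start Clipper in local Docker mode and deploy `feature_sum`.

    Requires Docker and `pip install clipper_admin`. The imports are kept
    inside the function so the toy model above can be tried without them.
    """
    from clipper_admin import ClipperConnection, DockerContainerManager
    from clipper_admin.deployers import python as python_deployer

    conn = ClipperConnection(DockerContainerManager())
    conn.start_clipper()
    # "hello-world" and "sum-model" are placeholder names.
    conn.register_application(
        name="hello-world", input_type="doubles",
        default_output="-1.0", slo_micros=100000)
    python_deployer.deploy_python_closure(
        conn, name="sum-model", version=1,
        input_type="doubles", func=feature_sum)
    conn.link_model_to_app(app_name="hello-world", model_name="sum-model")
    return conn


# The model itself is an ordinary Python function and can be checked locally.
print(feature_sum([[1.0, 2.0], [3.0, 4.5]]))
```

Calling `deploy_locally()` is left to the reader because it starts Docker containers.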
Clipper Connects Training and Serving
[Diagram: a Spark cluster (Driver Program with SparkContext; three Worker Nodes, each with an Executor running Tasks) shown alongside a serving stack (Web Server, Database, Cache) and Clipper with a model container (MC)]
Problem: Models don’t run in isolation
Must extract model plus pre- and post-processing logic
Clipper provides a library of model deployers
- Deployer automatically and intelligently saves all prediction code
- Captures both framework-specific models and arbitrary serializable code
- Replicates required subset of training environment and loads prediction code in a Clipper model container
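To make "all prediction code" concrete, the sketch below bundles a model with its featurization and post-processing in a single Python closure, which is the kind of object a deployer would have to capture. The model class, weights, and helper names are hypothetical stand-ins, not part of Clipper.

```python
import math


class TinyLogisticModel:
    """Stand-in for a trained framework model (hypothetical weights)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict_proba(self, features):
        z = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


def make_predict_fn(model, mean, scale, threshold=0.5):
    """Bundle featurization, the model, and post-processing in one closure."""

    def featurize(raw):
        # Pre-processing: standardize each raw value.
        return [(v - m) / s for v, m, s in zip(raw, mean, scale)]

    def predict(inputs):
        # Post-processing: turn probabilities into string labels.
        labels = []
        for raw in inputs:
            p = model.predict_proba(featurize(raw))
            labels.append("positive" if p >= threshold else "negative")
        return labels

    return predict


model = TinyLogisticModel(weights=[2.0, -1.0], bias=0.0)
predict = make_predict_fn(model, mean=[1.0, 1.0], scale=[1.0, 2.0])
print(predict([[3.0, 1.0], [0.0, 5.0]]))
```

The returned `predict` carries the model, the scaling constants, and the threshold with it. Python's built-in `pickle` cannot serialize nested functions like this one, so shipping it to a container needs a closure-aware serializer such as cloudpickle.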
Clipper provides a (growing) library of model deployers
- Python
  - Combine framework-specific models with external featurization, post-processing, business logic
  - Currently support Scikit-Learn, PySpark, TensorFlow
  - PyTorch, Caffe2, XGBoost coming soon
- Scala and Java with Spark:
  - both MLlib and Pipelines APIs
- Arbitrary R functions
Ongoing Research
Supporting Modular Multi-Model Pipelines
Ensembles can improve accuracy
Faster inference with prediction cascades
[Diagram labels: Fast model; If confident then return; Slow but accurate model]
Faster development through model-reuse
[Diagram labels: Pre-trained DNN; Task-specific model]
Model specialization
[Diagram labels: Object detector; If object detected; If face detected; Else; Face detector]
How to efficiently support serving arbitrary model pipelines?
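The prediction cascade can be sketched in a few lines: query the fast model first and fall through to the slow one only when the fast model is not confident. Both models and the 0.9 threshold below are hypothetical stand-ins chosen for illustration.

```python
def cascade_predict(x, fast_model, slow_model, confidence_threshold=0.9):
    """Return (label, model_used) using a two-stage prediction cascade."""
    label, confidence = fast_model(x)
    if confidence >= confidence_threshold:
        # Confident enough: skip the expensive model entirely.
        return label, "fast"
    label, _ = slow_model(x)
    return label, "slow"


# Hypothetical stand-in models: each returns (label, confidence).
def fast_model(x):
    # Confident only when the input is far from the decision boundary at 0.
    confidence = min(1.0, 0.5 + abs(x) / 10.0)
    return ("pos" if x > 0 else "neg"), confidence


def slow_model(x):
    # Pretend this is expensive but handles borderline inputs correctly.
    return ("pos" if x >= -0.5 else "neg"), 0.99


for x in (8.0, -0.2, -6.0):
    print(x, cascade_predict(x, fast_model, slow_model))
```

Only the borderline input `-0.2` reaches the slow model. The other two are answered by the fast model alone, which is where the latency saving comes from.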
Challenges of Serving Model Pipelines
- Complex tradeoff space of latency, throughput, and monetary cost
- Many serving workloads are interactive and highly latency-sensitive
- Performance and cost depend on model, workload, and physical resources available
- Model composition leads to combinatorial explosion in the size of the tradeoff space
- Developers must make decisions about how to configure individual models while reasoning about end-to-end pipeline performance
Solution: Workload-Aware Optimizer
- Exploit structure and properties of inference computation
  - Immutable state
  - Query-level parallelism
  - Compute-intensive
- Pipeline definition
  - Intermingle arbitrary application code and Clipper-hosted model evaluation for maximum flexibility
- Optimizer input
  - Pipeline, sample workload, and performance or cost constraints
- Optimizer output
  - Optimal pipeline configuration that meets constraints
- Deployed models use Clipper as physical execution engine for serving
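The input/output contract above can be illustrated with a toy brute-force search: given a pipeline, a target query rate, and a latency constraint, pick the cheapest per-model batch size and replica count. This is only a sketch of the problem shape, not the optimizer described in the slides. The linear latency model, the profiles, the costs, and every name below are hypothetical.

```python
import itertools
import math

# Hypothetical profiles: batch latency = base_ms + per_item_ms * batch.
PROFILES = {
    "detector": {"base_ms": 10.0, "per_item_ms": 2.0, "cost_per_replica": 3.0},
    "classifier": {"base_ms": 4.0, "per_item_ms": 0.5, "cost_per_replica": 1.0},
}
BATCH_SIZES = (1, 2, 4, 8, 16)


def stage_latency_ms(profile, batch):
    return profile["base_ms"] + profile["per_item_ms"] * batch


def replicas_needed(profile, batch, target_qps):
    # One replica serves `batch` queries every stage_latency_ms milliseconds.
    busy_ms_per_second = target_qps * stage_latency_ms(profile, batch) / batch
    return math.ceil(busy_ms_per_second / 1000.0)


def optimize(pipeline, target_qps, latency_slo_ms):
    """Return the cheapest configuration meeting the SLO, or None."""
    best = None
    for batches in itertools.product(BATCH_SIZES, repeat=len(pipeline)):
        profiles = [PROFILES[name] for name in pipeline]
        latency = sum(stage_latency_ms(p, b) for p, b in zip(profiles, batches))
        if latency > latency_slo_ms:
            continue  # violates the end-to-end latency constraint
        replicas = tuple(
            replicas_needed(p, b, target_qps) for p, b in zip(profiles, batches))
        cost = sum(r * p["cost_per_replica"] for r, p in zip(replicas, profiles))
        candidate = {"batches": batches, "replicas": replicas,
                     "latency_ms": latency, "cost": cost}
        # Prefer lower cost, then lower latency on ties.
        if best is None or (cost, latency) < (best["cost"], best["latency_ms"]):
            best = candidate
    return best


best = optimize(["detector", "classifier"], target_qps=500, latency_slo_ms=50.0)
print(best)
```

Even with two models and five batch sizes there are 25 configurations to check. The search space grows multiplicatively with each added model, which is the combinatorial explosion that motivates a dedicated optimizer.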
Conclusion
- Challenges of serving increasingly complex models trained in a variety of frameworks while meeting strict performance demands
- Clipper adopts a container-based architecture and employs prediction caching and latency-aware batching
- Clipper’s model deployer library makes it easy to deploy both framework-specific models and arbitrary processing code
- Ongoing efforts on a workload-aware optimizer to optimize the deployment of complex, multi-model pipelines
http://clipper.ai