Multi-Runtime Serving Pipelines for Machine Learning
TRANSCRIPT
Multi-Runtime Serving Pipelines
Stepan Pushkarev, CTO of Hydrosphere.io
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: serverless proxy for Spark
- ML Lambda: ML Function as a Service
- Sonar: data and ML monitoring
Business Model: Subscription services and hands-on consulting
About
Deployment | Serving | Scoring | Inference
@Nvidia https://www.nvidia.com/en-us/deep-learning-ai/solutions/
From Single Model to Meta Pipelines
Item 1
Title: Authentic HERMES Bijouterie Fantaisie Selle Clip-On Earrings Silvertone #S1742 E
Specs: Brand: HERMES | Size (cm): W1.8 x H1.8 (approx.) | Color: Silver | Size (inch): W0.7 x H0.7" (approx.) | Style: Earrings | Rank: B

Item 2
Title: Auth HERMES Earrings Sellier Clip-on Silver Tone Round $0 Ship 25130490900 S06B
Specs: Brand: Hermes | Fastening: Clip-On | Style: Clip on | Country/Region of Manufacture: Unknown | Metal: Silver Plated | Main Color: Silver | Color: Silver

Description: ... ...
Does this pair describe the same thing?
Product Matching
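The product-matching task above can be sketched as a small multi-stage pipeline. This is an illustrative toy, not Hydrosphere's implementation: each stage is a plain Python function here, standing in for what would be a separately deployed serving runtime, and the token-similarity model and threshold are assumptions chosen for the example.

```python
import re

def preprocess(item):
    """Stage 1: normalize a raw listing into a set of lowercase tokens."""
    text = " ".join(str(v) for v in item.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(tokens_a, tokens_b):
    """Stage 2: Jaccard similarity between two token sets."""
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def match(item_a, item_b, threshold=0.2):
    """Stage 3: final decision -- do the two listings describe the same thing?"""
    score = similarity(preprocess(item_a), preprocess(item_b))
    return score >= threshold

item_1 = {"title": "Authentic HERMES Clip-On Earrings Silvertone",
          "brand": "HERMES", "color": "Silver"}
item_2 = {"title": "Auth HERMES Earrings Sellier Clip-on Silver Tone",
          "brand": "Hermes", "color": "Silver"}

print(match(item_1, item_2))  # the two HERMES listings match
```

In a real meta-pipeline, stage 2 could be a learned embedding model on one runtime while stage 1 runs on another; the composition stays the same.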
Model Artifact: Ops perspective
- HTTP/1.1, HTTP/2, gRPC
- Kafka, Flink, Kinesis
- Protobuf, Avro
- Service Discovery
- Pipelining
- Tracing
- Monitoring
- Autoscaling
- Versioning
- A/B, Canary
- Testing
- CPU, GPU
API & Logistics
Monitoring
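Two of the ops concerns above, versioning and A/B or canary releases, come down to routing each request to one of several live model versions. A minimal sketch, assuming a simple weights-based router (the registry layout, version names, and traffic split are illustrative, not Hydrosphere's API):

```python
import random

class Router:
    def __init__(self, weights):
        # weights: mapping of model version -> fraction of traffic
        self.versions = list(weights)
        self.weights = [weights[v] for v in self.versions]

    def route(self, rng=random):
        """Pick a model version for one incoming request."""
        return rng.choices(self.versions, weights=self.weights, k=1)[0]

# Canary release: 95% of traffic to the stable model, 5% to the new one.
router = Router({"model:v1": 0.95, "model:v2": 0.05})
picks = [router.route() for _ in range(1000)]
print(picks.count("model:v1"), picks.count("model:v2"))
```

The same mechanism covers A/B testing (a 50/50 split) and rollback (setting a version's weight to zero).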
Shifting experimentation to production
Sidecar Architecture
The functions registry is responsible for the model life cycle and for all the business logic required to configure models for serving.
The mesh of serving runtimes is the actual serving cluster.
Infrastructure integration: ECS for AWS, Kubernetes for GCE and on-premises deployments.
UX: Models and Applications
Applications provide public virtual endpoints for the models and for compositions of models.
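The idea of an application as a virtual endpoint over a model, or over a composition of models, can be sketched as follows. Every name here (the stage functions, the endpoint paths, the `compose` helper) is an illustrative assumption, not the actual Hydrosphere interface:

```python
def tokenizer(x):
    """Stand-in for one serving stage (e.g. a preprocessing model)."""
    return x.lower().split()

def classifier(tokens):
    """Stand-in for a second stage (e.g. a trained classifier)."""
    return "positive" if "good" in tokens else "negative"

def compose(*stages):
    """Chain stages into a single callable pipeline."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

# The application registry: a public endpoint name maps to a model
# or a composition of models, hiding concrete versions from callers.
applications = {
    "/sentiment": compose(tokenizer, classifier),
    "/tokenize": tokenizer,
}

print(applications["/sentiment"]("Good product"))  # -> "positive"
```

Callers only ever see the virtual endpoint; the registry can swap model versions or re-wire the composition behind it without breaking clients.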
Why Not just one Big Neural Network?
● Not always possible
● Stages could be independent
● Ad-hoc rule-based models
● Physics models (e.g. LIDAR)
● Big end-to-end DL requires black-magic skills
Why Not just one Python script?
● Modularity. Stages could be developed by different teams
● Traceability and Monitoring
● Versioning
● Independent deployment, A/B testing and Canary
● Request Shadowing and other cool stuff
● Could require different ML runtimes (TF, Scikit, Spark ML, etc.)
● We need more microservices :)
Why Not just TF Serving?
● Other ML runtimes (DL4J, Scikit, Spark ML); Servables are overkill
● Need better versioning and immutability (a Docker image per version)
● Don’t want to deal with state (model loaded, offloaded, etc.)
● Want to re-use the microservices stack (tracing, logging, metrics)
● Need better scalability
Demo
Thank you
- @hydrospheredata
- https://github.com/Hydrospheredata
- https://hydrosphere.io/