Multi-Runtime Serving Pipelines for Machine Learning
TRANSCRIPT
Multi-Runtime Serving Pipelines
Stepan Pushkarev, CTO of Hydrosphere.io
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: serverless proxy for Spark
- ML Lambda: ML Function as a Service
- Sonar: data and ML monitoring
Business Model: Subscription services and hands-on consulting
About
Deployment | Serving | Scoring | Inference
@Nvidia https://www.nvidia.com/en-us/deep-learning-ai/solutions/
From Single Model to Meta Pipelines
Item 1
Title: Authentic HERMES Bijouterie Fantaisie Selle Clip-On Earrings Silvertone #S1742 E
Specs: Brand: HERMES | Size (cm): W1.8 x H1.8 (approx.) | Color: Silver | Size (inch): W0.7 x H0.7" (approx.) | Style: Earrings | Rank: B

Item 2
Title: Auth HERMES Earrings Sellier Clip-on Silver Tone Round $0 Ship 25130490900 S06B
Specs: Brand: Hermes | Fastening: Clip-On | Style: Clip on | Country/Region of Manufacture: Unknown | Metal: Silver Plated | Main Color: Silver | Color: Silver

Description: ... ...
Does this pair describe the same thing?
Product Matching
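The product-matching task above can be sketched as a small multi-stage pipeline. This is an illustrative toy, not Hydrosphere's implementation: each stage is a plain Python function here, standing in for what would be a separately deployed serving runtime, and the token-similarity model and threshold are assumptions chosen for the example.

```python
import re

def preprocess(item):
    """Stage 1: normalize a raw listing into a set of lowercase tokens."""
    text = " ".join(str(v) for v in item.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(tokens_a, tokens_b):
    """Stage 2: Jaccard similarity between two token sets."""
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def match(item_a, item_b, threshold=0.2):
    """Stage 3: final decision -- do the two listings describe the same thing?"""
    score = similarity(preprocess(item_a), preprocess(item_b))
    return score >= threshold

item_1 = {"title": "Authentic HERMES Clip-On Earrings Silvertone",
          "brand": "HERMES", "color": "Silver"}
item_2 = {"title": "Auth HERMES Earrings Sellier Clip-on Silver Tone",
          "brand": "Hermes", "color": "Silver"}

print(match(item_1, item_2))  # the two HERMES listings match
```

In a real meta-pipeline, stage 2 could be a learned embedding model on one runtime while stage 1 runs on another; the composition stays the same.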
Model Artifact: Ops perspective
- HTTP/1.1, HTTP/2, gRPC
- Kafka, Flink, Kinesis
- Protobuf, Avro
- Service Discovery
- Pipelining
- Tracing
- Monitoring
- Autoscaling
- Versioning
- A/B, Canary
- Testing
- CPU, GPU
API & Logistics
Monitoring
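Two of the ops concerns above, versioning and A/B or canary releases, come down to routing each request to one of several live model versions. A minimal sketch, assuming a simple weights-based router (the registry layout, version names, and traffic split are illustrative, not Hydrosphere's API):

```python
import random

class Router:
    def __init__(self, weights):
        # weights: mapping of model version -> fraction of traffic
        self.versions = list(weights)
        self.weights = [weights[v] for v in self.versions]

    def route(self, rng=random):
        """Pick a model version for one incoming request."""
        return rng.choices(self.versions, weights=self.weights, k=1)[0]

# Canary release: 95% of traffic to the stable model, 5% to the new one.
router = Router({"model:v1": 0.95, "model:v2": 0.05})
picks = [router.route() for _ in range(1000)]
print(picks.count("model:v1"), picks.count("model:v2"))
```

The same mechanism covers A/B testing (a 50/50 split) and rollback (setting a version's weight to zero).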
Shifting experimentation to production
Sidecar Architecture
The functions registry is responsible for the model life cycle and for all the business logic required to configure models for serving.
The mesh of serving runtimes is the actual serving cluster.
Infrastructure integration: ECS for AWS, Kubernetes for GCE and on-premises deployments.
UX: Models and Applications
Applications provide public virtual endpoints for the models and for compositions of models.
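The idea of an application as a virtual endpoint over a model, or over a composition of models, can be sketched as follows. Every name here (the stage functions, the endpoint paths, the `compose` helper) is an illustrative assumption, not the actual Hydrosphere interface:

```python
def tokenizer(x):
    """Stand-in for one serving stage (e.g. a preprocessing model)."""
    return x.lower().split()

def classifier(tokens):
    """Stand-in for a second stage (e.g. a trained classifier)."""
    return "positive" if "good" in tokens else "negative"

def compose(*stages):
    """Chain stages into a single callable pipeline."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

# The application registry: a public endpoint name maps to a model
# or a composition of models, hiding concrete versions from callers.
applications = {
    "/sentiment": compose(tokenizer, classifier),
    "/tokenize": tokenizer,
}

print(applications["/sentiment"]("Good product"))  # -> "positive"
```

Callers only ever see the virtual endpoint; the registry can swap model versions or re-wire the composition behind it without breaking clients.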
Why Not just one Big Neural Network?
● Not always possible
● Stages could be independent
● Ad-hoc rule-based models
● Physics models (e.g. LIDAR)
● Big end-to-end DL requires black-magic skills
Why Not just one Python script?
● Modularity. Stages could be developed by different teams
● Traceability and Monitoring
● Versioning
● Independent deployment, A/B testing and Canary
● Request Shadowing and other cool stuff
● Could require different ML runtimes (TF, Scikit, Spark ML, etc.)
● We need more microservices :)
Why Not just TF Serving?
● Other ML runtimes (DL4J, Scikit, Spark ML); Servables are overkill
● Need better versioning and immutability (a Docker image per version)
● Don’t want to deal with state (model loaded, offloaded, etc.)
● Want to re-use the microservices stack (tracing, logging, metrics)
● Need better scalability
Demo
Thank you
- @hydrospheredata
- https://github.com/Hydrospheredata
- https://hydrosphere.io/