machine learning model life cycle management in production
TRANSCRIPT
![Page 1: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/1.jpg)
[email protected]@datatron.com
Machine Learning Model Life Cycle Management
In Production - #aiops #mlops
![Page 2: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/2.jpg)
datatron
Who are we?
Harish Doddi, CEO
• Lyft Surge Pricing Model
• Twitter’s Distributed Photo Storage - 12 PB
• Architected Snapchat Stories - scaled from 0 to billions
Jerry Xu, CTO
• Lyft ETA Machine Learning Model - replaced Google
• Twitter’s Manhattan - replaced Cassandra
• Founding team of Windows Azure
Team: Previously worked at places like Amazon, Microsoft, and Anaconda
Headquarters: 350 Townsend Street, Suite 204, San Francisco, CA
![Page 3: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/3.jpg)
Today’s Enterprise Data Science Lifecycle
Development
Data Science
• Initial Analysis
• Model development using multiple frameworks
• Training and Experimenting
Production
Engineering DevOps
• Deployment
• Workflow process
• Roll back strategy
• A/B Testing
Post-Production
DevOps
• Model Performance Monitoring
• Infrastructure Monitoring
• Scaling
• Model Governance
![Page 4: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/4.jpg)
Production and Post-Production
Deployment, Monitoring & Governance
Data Aggregation
Discovery & Analysis
Training & Experimenting
![Page 5: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/5.jpg)
Lesson 1
Need for a single centralized production team and platform
![Page 6: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/6.jpg)
Fraud Data Science
Marketing Data Science
Credit Risk Data Science
Production infrastructure
Production infrastructure
Production infrastructure
Financial Institutions Production Environment
![Page 7: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/7.jpg)
Hidden Technical Debt in Machine Learning Systems
Google Paper: Hidden Technical Debt in Machine Learning Systems
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. … it
is remarkably easy to incur massive ongoing maintenance costs at the system level when
applying machine learning.
• Boundary erosion
• Entanglement
• Hidden feedback loops
• Undeclared consumers
• Data dependencies
• Changes in the external world
• System-level anti-patterns
![Page 8: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/8.jpg)
Fraud Data Science
Horizontal DevOps AI team
Marketing Data Science
Credit Risk Data Science
Production Model Deployment
Centralized Production AI Platform like Datatron
![Page 9: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/9.jpg)
Lesson 2
Start adopting best practices and standardization early
![Page 10: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/10.jpg)
Old Way
Data Science
End User
Machine Learning
Engineering DevOps
• Teams operate in silos, don’t speak the same language
• Errors due to lack of communication
• Engineering has to write stand-alone scripts
Production Model
Teams face cross-functional
inefficiencies
![Page 11: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/11.jpg)
A central DevOps platform for AI models streamlines the process and reduces the inefficiencies
Data Science
End User
datatron platform
DevOps
Production Model
New Way
![Page 12: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/12.jpg)
Lesson 3
Data Science Universe shouldn’t be limited
![Page 13: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/13.jpg)
New Way
Model Containerization
Data Scientist Universe is unlimited
• Team: TensorFlow Model
• Team: scikit-learn Model
• Team: Apache Spark Model
• Upcoming Frameworks
Old Way
No Model Containerization
Data Scientist Universe is limited
• Team, Team, Team: Model
![Page 14: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/14.jpg)
datatron
Framework Agnostic
Library Agnostic
Language Agnostic
Infrastructure Agnostic
SageMaker
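One way to picture the framework-agnostic idea on this slide is a thin serving layer that only ever sees a uniform `predict` callable, whatever framework produced the model. The sketch below is a minimal illustration under that assumption. `ModelRegistry`, `PredictFn`, and the two stand-in models are hypothetical names, not Datatron's API; real TensorFlow, scikit-learn, or Spark models would be wrapped behind the same kind of callable.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: any model, from any framework, exposed as a plain callable.
PredictFn = Callable[[Sequence[float]], float]


class ModelRegistry:
    """Hypothetical serving-side registry that is unaware of ML frameworks."""

    def __init__(self) -> None:
        self._models: Dict[str, PredictFn] = {}

    def register(self, name: str, predict_fn: PredictFn) -> None:
        # The registry stores only the callable, never framework objects.
        self._models[name] = predict_fn

    def predict(self, name: str, features: Sequence[float]) -> float:
        return self._models[name](features)

    def names(self) -> List[str]:
        return sorted(self._models)


# Stand-ins for models built by different teams with different frameworks.
def fraud_model(features: Sequence[float]) -> float:
    return 1.0 if sum(features) > 10 else 0.0


def marketing_model(features: Sequence[float]) -> float:
    return sum(features) / len(features)


registry = ModelRegistry()
registry.register("fraud", fraud_model)
registry.register("marketing", marketing_model)
```

Because callers depend only on the `predict` signature, a team could swap the framework behind a model without changing the serving code.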
![Page 15: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/15.jpg)
Lesson 4
Models may go wrong, you need to monitor them continuously
![Page 16: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/16.jpg)
South Park and Alexa
![Page 17: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/17.jpg)
datatron�17
Old Way
Features
Recommendation Model
Garbage
Business is continuously losing value
New Way
Features
Recommendation Model
Recommendation
Alert!
Minimize the continuous loss
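The "Alert!" in the New Way can be sketched as a check on incoming features before they reach the recommendation model. The example below is a minimal sketch, assuming a single numeric feature and a simple z-score test on the batch mean. The function name `feature_shift_alert`, the threshold, and all sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def feature_shift_alert(baseline: Sequence[float],
                        live: Sequence[float],
                        z_threshold: float = 3.0) -> bool:
    """Return True when the live feature mean drifts far from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of training-time values
    if sigma == 0:
        # Constant baseline: any change in the mean counts as a shift.
        return mean(live) != mu
    # Standard error of the live batch mean, assuming the baseline spread still holds.
    standard_error = sigma / (len(live) ** 0.5)
    z_score = abs(mean(live) - mu) / standard_error
    return z_score > z_threshold


# Hypothetical values of one feature at training time and in two live batches.
baseline = [10, 11, 9, 10, 12, 8, 10, 11]
healthy_batch = [10, 9, 11, 10]
garbage_batch = [0, 0, 0, 0]

print(feature_shift_alert(baseline, healthy_batch))
print(feature_shift_alert(baseline, garbage_batch))
```

A check like this raises the alert as soon as garbage features arrive, rather than after the business has already lost value.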
![Page 18: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/18.jpg)
New Way
Production Model Monitoring Allows You To Act Preemptively
[Chart: # of Fraud Transactions by month (Jan, Feb, Mar, Apr, May, Jun); series: Reality, Model Prediction; marker: Anomalies Detected]
Old Way
Post-mortem reports: TOO LATE
[Chart: # of Fraud Transactions]
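The comparison of "Model Prediction" against "Reality" can be sketched as a per-month gap check. This is a minimal sketch, assuming monthly fraud counts and a fixed relative tolerance. The function `flag_anomalies`, the 25% tolerance, and every number below are hypothetical and not taken from the slides.

```python
from typing import List, Sequence


def flag_anomalies(months: Sequence[str],
                   predicted: Sequence[float],
                   actual: Sequence[float],
                   tolerance: float = 0.25) -> List[str]:
    """Return the months where prediction and reality differ by more than `tolerance`."""
    flagged = []
    for month, p, a in zip(months, predicted, actual):
        # Relative gap with respect to the observed count (guarding against zero).
        relative_gap = abs(p - a) / max(a, 1)
        if relative_gap > tolerance:
            flagged.append(month)
    return flagged


# Hypothetical monthly fraud-transaction counts.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
predicted = [100, 105, 110, 108, 112, 115]
actual = [98, 110, 104, 150, 190, 118]

print(flag_anomalies(months, predicted, actual))  # ['Apr', 'May']
```

Running such a check every month surfaces the divergence in April, instead of leaving it for a post-mortem report.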
![Page 19: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/19.jpg)
Monitoring for Machine Learning Models
Model Performance monitoring
• Confusion Matrix
• Gain and Lift charts
• Kolmogorov-Smirnov chart
• Area Under the ROC curve
• Gini Coefficient
• Concordant – Discordant ratio
• Root Mean Squared Error (RMSE)
• etc.
Model Timeout monitoring
Infrastructure monitoring
Organization KPI monitoring
Anomaly monitoring
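Several of the model performance metrics named on this slide can be computed with a few lines of plain Python. The sketch below is illustrative only: it assumes binary labels (0 or 1) with both classes present, uses the pairwise definition of AUC, and takes the Gini coefficient as `2 * AUC - 1`. The function names and sample data are hypothetical; in practice a library such as scikit-learn would normally be used.

```python
import math
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC."""
    return 2 * auc(y_true, scores) - 1


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the cumulative score distributions of the two classes."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    return max(
        abs(sum(s <= cut for s in pos) / len(pos) - sum(s <= cut for s in neg) / len(neg))
        for cut in sorted(set(scores))
    )


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error for regression-style outputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

Tracking these values over time, rather than computing them once at training time, is what turns them into monitoring signals.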
![Page 20: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/20.jpg)
Lesson 5
You either NEVER deploy a model, or you have to do it over and over again
![Page 21: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/21.jpg)
An ML model is a continuous optimization process
Concept drift
New concept comes up
Model Building
Model Deployment
Model Monitoring
Model Management
![Page 22: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/22.jpg)
Connecting Machine Learning to Software world
Before: Software deployment once every 1 or 2 years
Now: Software deployment every day, BUT Machine Learning models deploy very slowly
Future: Machine Learning models will deploy very frequently and fast
![Page 23: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/23.jpg)
Lesson 6
Model Governance should be automated as much as possible
![Page 24: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/24.jpg)
New Way
Production Model Governance
Each Model Action Is Logged
Model Versioning: Versions, Comments, Log
Old Way
Time for an audit
Comments, Version Log, and multiple separate Logs
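"Each Model Action Is Logged" can be pictured as an append-only record of who did what to which model version. The following is a minimal sketch under that assumption. `ModelAction`, `AuditLog`, and the sample entries are hypothetical and do not describe any particular product's governance API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)  # frozen: a recorded action cannot be modified afterwards
class ModelAction:
    model: str
    version: str
    action: str
    user: str
    comment: str
    timestamp: datetime


class AuditLog:
    """Hypothetical append-only log of model actions."""

    def __init__(self) -> None:
        self._entries: List[ModelAction] = []

    def record(self, model: str, version: str, action: str,
               user: str, comment: str = "") -> ModelAction:
        entry = ModelAction(model, version, action, user, comment,
                            datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, model: str) -> List[ModelAction]:
        # Entries are returned in the order they were recorded.
        return [e for e in self._entries if e.model == model]


# Hypothetical actions on two models.
log = AuditLog()
log.record("fraud", "v1", "deploy", "alice", "initial release")
log.record("fraud", "v2", "deploy", "bob", "retrained on new data")
log.record("marketing", "v1", "rollback", "carol")
```

With versions, comments, and actions captured automatically in one place, preparing for an audit becomes a query over the log rather than a search across scattered files.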
![Page 25: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/25.jpg)
Lesson 7
Be prepared, your number of models will grow
![Page 26: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/26.jpg)
Deployment Learning: 1 model vs Multiple models
![Page 27: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/27.jpg)
Value Proposition: Cost per order of magnitude
Without Model Management: As the number of models increases, the cost also increases
With Model Management: As the number of models increases, the cost significantly decreases
[Charts: Cost per model (y-axis) vs. # of models (x-axis)]
![Page 28: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/28.jpg)
Lesson 8
Senior people are required at later stages
![Page 29: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/29.jpg)
Software Development vs ML Model Development
Software Development: Requirements, Design, Implementation, Testing, Evolution
ML Model Development: Requirements, Data Preparation, Training / Testing, Deploy to Production, Monitoring and Optimization
Senior People (marked on each track)
![Page 30: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/30.jpg)
Lesson 9
Your real work starts AFTER you deploy the model to production
![Page 31: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/31.jpg)
Enterprise AI Life Cycle
Exploration Training Deploy
![Page 32: Machine Learning Model Life Cycle Management In Production](https://reader031.vdocuments.site/reader031/viewer/2022013015/61cfd3a654361c745073b653/html5/thumbnails/32.jpg)
Enterprise AI Life Cycle After Deployment
Deploy: Blue-Green deployment, Rollback, Canary
A/B Testing: Split traffic, Shadowing, Multi-Armed Bandit Optimization
SLA: Model timeout, Fall back strategy, Alerting
Model Selection: Model routing, Challenger, KPI based selection
Anomaly Detection: Feature distribution, Model result
Monitoring: Model performance, Model latency, Infrastructure monitoring
Troubleshooting: Replay, End-to-end tracing, Logging
IT Env Integration: Cluster management integration, Security check
Auditing: Versioning, History, Approval process
ML Frameworks Support: TensorFlow, sklearn, H2O, SAS, SparkML etc.
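Two of the after-deployment items, split traffic (as used for a canary or A/B test) and a fall back strategy, can be sketched in a few lines. This is a minimal sketch under stated assumptions: requests carry a string id, routing is done by hashing that id into 100 buckets, and the fall back triggers on any exception rather than on a real timeout. All function names and the stand-in models are hypothetical.

```python
import hashlib
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def choose_variant(request_id: str, canary_percent: int) -> str:
    """Deterministically send about `canary_percent`% of requests to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99] for this request id
    return "canary" if bucket < canary_percent else "stable"


def predict_with_fallback(primary: Model, fallback: Model,
                          features: Sequence[float]) -> float:
    """Use the primary model; fall back to a simpler one if it raises."""
    try:
        return primary(features)
    except Exception:
        return fallback(features)


# Hypothetical stand-in models.
def broken_model(features: Sequence[float]) -> float:
    raise RuntimeError("model unavailable")


def baseline_model(features: Sequence[float]) -> float:
    return 0.0


print(choose_variant("request-42", 10))
print(predict_with_fallback(broken_model, baseline_model, [1.0, 2.0]))
```

Hashing the request id keeps routing stable, so the same request always reaches the same variant, which makes replay and end-to-end tracing easier to reason about.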