![Page 1: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/1.jpg)
October 2016
First 90
SLA vs. Agile
Microservices and cloud monitoring
![Page 2: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/2.jpg)
Why this talk?
![Page 3: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/3.jpg)
This is our vision: building the foundation to build a 3B company by FY20
Agenda
1. “Old World”: MercadoLivre’s original architecture
2. “Ground Zero”: shifting to microservices on the cloud
3. Monitoring the cloud
4. Alarms: when things go south
5. “Fury”: streamlining DevOps at MercadoLivre
![Page 4: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/4.jpg)
In numbers
+400 deploys/day on +650 apps
+1000 developers in 8 development centers
+10 programming languages
![Page 5: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/5.jpg)
In numbers
+25,000,000 requests per minute
+22,000 VMs in 7 data centers
+700 DBs in 4 different engines
![Page 6: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/6.jpg)
Old World
![Page 7: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/7.jpg)
Old world architecture
Diagram labels: User, ml.jar, Huge DB
![Page 8: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/8.jpg)
Old world properties
● Monolithic
● Highly coupled code
● Unified SVN repository
● Single DB
● Simple infrastructure with little overhead
● Single QA team
● Closed system
![Page 9: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/9.jpg)
Deployments as ML grew
Anyone at anytime
![Page 10: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/10.jpg)
Deployments as ML grew
Anyone at anytime
Some people, anytime
![Page 11: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/11.jpg)
Deployments as ML grew
Anyone at anytime
Some people, anytime
Some people, once a week
![Page 12: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/12.jpg)
Deployments as ML grew
Anyone at anytime
Some people, anytime
Some people, once a week
Only by all experts together, at 3 AM, on Thursdays not covered by any “freeze”
![Page 13: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/13.jpg)
Ground Zero
![Page 14: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/14.jpg)
Shifting to microservices
Diagram labels: Frontend, Frontend, CRM, Mobile apps, 3rd party devs; API, API, API
![Page 15: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/15.jpg)
Ground zero properties
● Multiple technologies and frameworks (dev’s choice)
● Completely decoupled code in multiple Github repositories
● One DB for each app, multiple engines
● Complex infrastructure with possible high overhead
● QA, testing and continuous integration are done by each team
● Independent deployments, environments and policies
● Open platform
![Page 16: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/16.jpg)
“With great power comes great responsibility”.
Stan Lee
![Page 17: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/17.jpg)
Developer responsibilities
● Developer gets ownership of entire dev cycle
● Massive empowerment of dev team -> OWNERSHIP
Manage resources
- VMs: create & scale computing pools for dev/test/prod
- DBs and services: choose the support systems required and create them
- Networking: create rules and load balancers to route traffic to the application
Develop
- Code: choose your technology and keep your Github repository
- Test: create tests, regressions or CI as needed
Ensure quality
- Define uptime: define what “up” means for your own app (health.sh)
- Measure: create metrics to analyze performance and downtime
Deploy
- Write all routines for automatically deploying your app on any VM
React
- React to critical events that affect your app
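The slides do not show what a health.sh script contains. As a rough illustration of "define what up means for your own app", a team could express uptime as a set of named checks that must all pass. The Python sketch below is hypothetical: the check names and the `is_up` helper are assumptions made for illustration, not MercadoLivre's actual script.

```python
from typing import Callable, Dict, List, Tuple


def check_database() -> bool:
    """Hypothetical check; a real app would probe its own database."""
    return True


def check_disk_space() -> bool:
    """Hypothetical check; a real app would inspect its own volumes."""
    return True


def is_up(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every check and return (healthy, names_of_failed_checks).

    A check that raises an exception is counted as failed.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


if __name__ == "__main__":
    healthy, failed = is_up({"database": check_database, "disk": check_disk_space})
    print("UP" if healthy else "DOWN: " + ", ".join(failed))
```

The point of the design is that each team owns the list of checks, so "up" can mean something different for every app.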
![Page 18: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/18.jpg)
DevTools in ML
Developer
Melicloud API
- Create apps
- Manage pools (test/prod)
- Manage VMs & load balancers
- Build & deploy
- Create queues
- Create DBaaS or KVSaaS
- Create caches
Github repo
- Code app
- Write test & deploy strategy
- Write uptime definitions
Nginx
- Write rules to route traffic to your pools
eventRouting & OpsGenie
- Write rules to manage alarms
- Define alarm escalation policies & schedules
- Manage contact channels
![Page 19: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/19.jpg)
Microservices in ML
![Page 20: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/20.jpg)
Mobile apps
Diagram labels: Team, Repo, Module, Test app, CI (repeated for three teams); Main app: automated build & store deployment
![Page 21: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/21.jpg)
Monitoring mobile apps
Diagram labels: Module (×3), Team (×3), Main app, Crash reporting
![Page 22: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/22.jpg)
Monitoring the cloud
![Page 23: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/23.jpg)
New Relic
● Default monitoring in VMs golden image
● No configuration necessary (initially)
HTTP errors
- Unhandled errors: unexpected issues appear in production
- Unsupported params: see if other devs/clients misuse your entry params
- Other services: detect down services affecting you
Stack traces
- Fast debugging: see what’s going on in production
- Unified pool data: all instances’ traces in the same place
Performance metrics
- Transaction traces: see what’s taking so long
- Recognize deviations: graphs to see if traffic or response time vary with respect to another period
- Apdex Score
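For readers unfamiliar with the Apdex score mentioned above, the standard definition counts requests at or below a threshold T as satisfied, requests between T and 4T as tolerating, and weights the latter by one half. The sketch below computes it in Python; the sample response times and the 0.5 s threshold are illustrative assumptions, not figures from the talk.

```python
from typing import Sequence


def apdex(response_times: Sequence[float], threshold: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= threshold (T)
    tolerating: T < response time <= 4T
    Anything slower counts as frustrated and adds nothing to the score.
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Two satisfied, one tolerating, one frustrated request.
    print(apdex([0.1, 0.3, 0.6, 2.5], threshold=0.5))
```

A score of 1.0 means every sampled request was satisfied; a score of 0.0 means every request was frustrated.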
![Page 24: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/24.jpg)
![Page 25: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/25.jpg)
Datadog
● Easy to use for different frameworks
● Good for business specific metrics
Custom metrics
- Complex metrics: you can send multiple parameters for events
- Online filtering: graphs filtered with different dimensions
Infra monitoring
- Full info: more data than NR on disk, memory, network
- Scalable: handles well aggregating information from many different VMs
Real time analysis
- Fast response: almost no latency
- Dashboards: customizable dashboards to show what’s more relevant for each app
- Alarms: flexible alarms based on custom metrics
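The talk does not say how custom metrics were submitted. One common route is the DogStatsD plain-text datagram, where a metric is written as `name:value|type` followed by an optional `#tag:value` list that provides the filterable dimensions. The Python sketch below only formats such a datagram; the metric name and tags are made-up examples, and the use of DogStatsD at MercadoLivre is an assumption.

```python
from typing import Dict, Optional


def format_metric(name: str, value: float, metric_type: str = "c",
                  tags: Optional[Dict[str, str]] = None) -> str:
    """Build a DogStatsD-style datagram: name:value|type|#k1:v1,k2:v2.

    metric_type is "c" for a counter, "g" for a gauge, "h" for a histogram.
    Tags are sorted so that the output is deterministic.
    """
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        tag_str = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        datagram += f"|#{tag_str}"
    return datagram


if __name__ == "__main__":
    # Hypothetical business metric with two dimensions.
    print(format_metric("checkout.orders", 1, "c", {"site": "MLB", "payment": "card"}))
```

In a typical setup the resulting string would be sent over UDP to a local agent, which aggregates values from many VMs before forwarding them.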
![Page 26: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/26.jpg)
![Page 27: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/27.jpg)
Log collection
● Logs are collected by an agent on all VMs
● They are sent to Elasticsearch
● Access via a Kibana frontend
● Developers can use special syntax to create queryable dimensions for all logged events
● All instances’ logs in the same place
● Request tracing through multiple applications/APIs (request_id)
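The "special syntax" for queryable dimensions is not shown in the slides. A minimal sketch of the general idea, assuming JSON log lines and a hypothetical `X-Request-Id` header for passing the request_id between services, could look like this in Python:

```python
import json
import uuid
from typing import Dict


def ensure_request_id(headers: Dict[str, str]) -> str:
    """Reuse the caller's request id if present, otherwise start a new trace.

    The header name X-Request-Id is an assumption for illustration.
    """
    return headers.get("X-Request-Id") or uuid.uuid4().hex


def log_event(message: str, request_id: str, **dimensions: str) -> str:
    """Build one JSON log line; each key is a field a log store could index."""
    record = {"message": message, "request_id": request_id, **dimensions}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    rid = ensure_request_id({"X-Request-Id": "abc123"})
    # Hypothetical dimensions: app name and payment method.
    print(log_event("payment accepted", rid, app="checkout", method="card"))
```

Because every service logs the same request_id, filtering on that one field in the log frontend returns the full path of a request across applications.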
![Page 28: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/28.jpg)
![Page 29: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/29.jpg)
Alarms
![Page 30: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/30.jpg)
Unified handling of events
health.sh
Code triggered alarms
eventRouting
![Page 31: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/31.jpg)
Event routing
● Rules added by each team
● Check alarm origin, type and importance
● Check “quiet hours”
● Assign escalation policy and forward to OpsGenie
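The routing steps listed above can be sketched as a small rule matcher. Everything in this Python example is hypothetical: the `Rule` fields, the alarm dictionary keys, and the choice to suppress alarms during quiet hours are assumptions, since the slides do not describe the rule format or what happens to a quiet-hours alarm.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, List, Optional


@dataclass
class Rule:
    origin: str                      # where the alarm came from, e.g. "health.sh"
    alarm_type: str                  # e.g. "down"
    min_importance: int              # lowest importance this rule handles
    escalation_policy: str           # policy name handed to the paging tool
    quiet_start: Optional[time] = None
    quiet_end: Optional[time] = None


def in_quiet_hours(rule: Rule, now: time) -> bool:
    """True if `now` falls inside the rule's quiet window (may wrap midnight)."""
    if rule.quiet_start is None or rule.quiet_end is None:
        return False
    if rule.quiet_start <= rule.quiet_end:
        return rule.quiet_start <= now < rule.quiet_end
    return now >= rule.quiet_start or now < rule.quiet_end


def route(alarm: Dict, rules: List[Rule], now: time) -> Optional[str]:
    """Return the escalation policy of the first matching rule, else None.

    Assumption: an alarm that matches a rule during quiet hours is suppressed.
    """
    for rule in rules:
        if (alarm["origin"] == rule.origin
                and alarm["type"] == rule.alarm_type
                and alarm["importance"] >= rule.min_importance
                and not in_quiet_hours(rule, now)):
            return rule.escalation_policy
    return None


if __name__ == "__main__":
    rules = [Rule("health.sh", "down", 2, "checkout-oncall", time(22, 0), time(6, 0))]
    alarm = {"origin": "health.sh", "type": "down", "importance": 3}
    print(route(alarm, rules, time(12, 0)))
```

The returned policy name is what would be forwarded to the paging tool; a `None` result means no team claimed the alarm or it arrived in a quiet window.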
![Page 32: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/32.jpg)
OpsGenie
● Manage teams to deal with escalation policies
● Set “on call” schedules (w/substitutes & manager escalation)
● Everyone manages their own contact methods (SMS, mail, phone call, app)
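An on-call schedule with substitutes and manager escalation can be pictured as a chain of contacts, each notified after a delay if the alarm is still unacknowledged. The Python sketch below illustrates only that idea; it does not use the OpsGenie API, and the delays and contact names are invented for the example.

```python
from typing import List, Tuple


def who_to_notify(minutes_unacknowledged: int,
                  chain: List[Tuple[int, str]]) -> List[str]:
    """Return every contact whose delay has elapsed.

    chain is a list of (delay_in_minutes, contact) pairs ordered by delay.
    """
    return [contact for delay, contact in chain if delay <= minutes_unacknowledged]


if __name__ == "__main__":
    # Hypothetical policy: on-call first, then substitute, then manager.
    chain = [(0, "on-call"), (10, "substitute"), (30, "manager")]
    print(who_to_notify(15, chain))
```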
![Page 33: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/33.jpg)
Fury
![Page 34: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/34.jpg)
Evolution
Old world → Ground zero → Fury
![Page 35: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/35.jpg)
Fury: DevOps to NoOps
● Still microservices
● Fully service-oriented
● Easier dev cycle and learning curve
● Pre-assembled flavors for popular frameworks
● Fewer bash scripts, more UI-based configuration
● Auto-scaling & auto-healing
● Docker-based (smaller dev/prod environment gap)
● Designed to run on AWS
● Continuous integration already included
![Page 36: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/36.jpg)
Fury dashboard
![Page 37: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/37.jpg)
Dev Cycle in Fury: create app
● Creates repository
● Creates Jenkins CI server
● Creates network infra
![Page 38: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/38.jpg)
Dev Cycle in Fury: create scope
● Creates load balancer (ELB)
● Creates auto scaling group (ASG) for scope instances
● Creates instances
● Initializes logs & metrics services
● Downloads containers to instances
● Starts traffic
![Page 39: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/39.jpg)
Dev Cycle in Fury: deploy
● Creates ASG for new version
● Creates instances for new ASG
● Initializes logs & metrics services
● Downloads containers to instances
● Progressive traffic switch
● If candidate is OK, destroys previous infrastructure
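The progressive traffic switch in the deploy steps above can be sketched as a loop that raises the candidate's share of traffic and checks its health at each step. This Python example is a simulation under assumptions: the step percentages, the `candidate_ok` callback, and the rollback behaviour on failure are illustrative, since the slides only say that the previous infrastructure is destroyed when the candidate is OK.

```python
from typing import Callable, List, Sequence, Tuple


def progressive_deploy(steps: Sequence[int],
                       candidate_ok: Callable[[int], bool]) -> Tuple[str, List[int]]:
    """Shift traffic to the candidate version step by step.

    steps: increasing percentages of traffic for the candidate, ending at 100.
    candidate_ok: called after each shift with the current percentage.
    Returns ("promoted", applied_steps) if every check passes, in which case
    the previous infrastructure could be destroyed, or
    ("rolled_back", applied_steps) on the first failed check (assumption).
    """
    if not steps or steps[-1] != 100:
        raise ValueError("steps must end with 100")
    applied = []
    for percent in steps:
        applied.append(percent)      # route `percent`% of traffic to the candidate
        if not candidate_ok(percent):
            return "rolled_back", applied
    return "promoted", applied


if __name__ == "__main__":
    # Hypothetical candidate that fails once it receives half of the traffic.
    print(progressive_deploy([10, 50, 100], lambda percent: percent < 50))
```

Keeping the old ASG alive until the final step is what makes the rollback branch cheap: traffic is simply pointed back at infrastructure that still exists.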
![Page 40: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/40.jpg)
![Page 41: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/41.jpg)
?
![Page 42: InterCon 2016 - SLA vs Agilidade: uso de microserviços e monitoramento de cloud](https://reader030.vdocuments.site/reader030/viewer/2022020301/587080441a28ab57368b633b/html5/thumbnails/42.jpg)
Thank you!