Metrics-Driven DevOps – Automate Scalability and Performance into Your Pipeline
TRANSCRIPT
Metrics-Driven DevOps: "Automating Scalability and Performance Checks into Your Pipeline"
Andreas Grabner (@grabnerandi) – [email protected]
https://dynatrace.github.io/ufo/
“In Your Face” Data!
Time of Deployment
Availability dropped to 0%
Metrics-Based Decisions
Update of Dependency Injection Library impacts Memory & CPU
Object Churning impacting GC and eating CPU!
Time of Deployment
App with Regular Load supported by 10 Containers
Twice the Load but 48 (= 4.8x!) Containers! App doesn't scale!!
Does it really scale?
Resource Utilization
per Container
Host Health
Infrastructure, Container, Cloud ... Metrics!!
#1: Where do people come from?
#2: Who are they?
Daily Deployments + Mkt Push
Increase # of unhappy users!
Drop in Conversion Rate
Overall increase of Users!
Satisfied Users Click more Content
Tolerating Users click less content
Frustrated Users mainly click on Support
AND MANY MORE
Vatican, 2005
Vatican, 2013
The Promise of
Confidential, Dynatrace, LLC
Boston Feb 2015!
After #3 of 7 Blizzards!
Total: 2.10m Snow!
IoT: Smart Roofs
Alert BEFORE it is too late!
What they really want!
700 deployments / YEAR
10 + deployments / DAY
50 – 60 deployments / DAY
Every 11.6 SECONDS
Not only fast delivered but also delivering fast!
Response Time | Conversions
-1000ms       | +2%
-1000ms       | +10%
+100ms        | -1%
Why most (will) fail!
It's not about blindly giving everyone Ops power to deploy changes only tested locally
It's not about blind automation of pushing more bad code on new stacks through a pipeline
It's not about blindly adding new features on top of existing ones without measuring their success
Learning from others
http://bit.ly/sharepurepath
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main domain
Very long Connect time (1.8s) to your CDN
"DevOps Deployment" Example #1: Online Casino
Example #2: Online Sports Club Search Service
[Chart: Response Time over the years 20xx, 2014, 2015, 2016+]
1) Started as a small project
2) Slowly growing user base
3) Expanding to new markets – 1st performance degradation!
4) Adding more markets – performance becomes a business impact
5) Potentially start losing users
Early 2015: Monolithic App
Can't scale vertically endlessly!
2.68s Load Time
94.09% CPU Bound
Proposal: Service approach!
Front End to Cloud
Scale Backend in Containers!
7:00 a.m. – Low load and service running on minimum redundancy
12:00 p.m. – Scaled-up service during peak load with failover of problematic node
7:00 p.m. – Scaled down again to lower load and moved to different geo location
Testing the Backend Service alone scales well …
Go live – 7:00 a.m.
Go live – 12:00 p.m.
What Went Wrong?
26.7s Load Time
5kB Payload
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count
Architecture Violation: Direct access to DB from frontend service
Single search query end-to-end
The fixed end-to-end use case: "Re-architect" vs. "Migrate" to Service-Orientation
2.5s (vs 26.7s) Load Time
1! (vs 33!) Service Call
5kB (vs 99kB) Payload!
3! (vs 171!) Total SQL Count
You measure it! From Dev to Ops
Use Case Tests and Monitors – Service & App Metrics

Build    | Use Case      | Status | # API Calls | # SQL | Payload | CPU   | Ops: # Serv Inst | Usage | RT
Build 17 | testNewsAlert | OK     | 1           | 5     | 2kb     | 70ms  | –                | –     | –
Build 17 | testSearch    | OK     | 1           | 3     | 5kb     | 120ms | –                | –     | –
Build 25 | testNewsAlert | OK     | 1           | 4     | 1kb     | 60ms  | 1                | 0.6%  | 4.2s
Build 25 | testSearch    | OK     | 2           | 3     | 10kb    | 150ms | 5                | 75%   | 2.5s
Build 26 | testNewsAlert | OK     | 1           | 4     | 1kb     | 60ms  | 1                | 0.5%  | 7.2s
Build 26 | testSearch    | OK     | 34          | 171   | 104kb   | 550ms | 1                | 63%   | 5.2s
Build 35 | testNewsAlert | –      | –           | –     | –       | –     | –                | –     | –
Build 35 | testSearch    | OK     | 2           | 3     | 10kb    | 150ms | 8                | 80%   | 2.0s
Metrics from and for Dev(to)Ops
Re-architecture into "Services" + Performance Fixes
Scenario: Monolithic App with 2 Key Features
#1: Don’t Check In Bad Code
Step #1: Execute your Tests just as you always do ...
Step #2: ... but CAPTURE Metrics!!
Step #3: Verify Code works as intended – including your frameworks!
#1: Analyzing every Unit, Integration & REST API test
#2: Key Architectural Metrics for each test
#3: Detecting regressions based on measures per check-in
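The three steps above can be sketched as a simple quality gate: capture architectural metrics (API calls, SQL statements, payload) per test and compare them against a baseline. This is a minimal illustration, not Dynatrace's actual API; the function name, the baseline dictionary, and the 1.2x tolerance are hypothetical, while the regression numbers come from the Build 26 example in this talk.

```python
# Minimal sketch of a per-test architectural-metrics quality gate.
# Baseline values mirror the example builds in the talk; names and
# tolerance are illustrative assumptions, not a real Dynatrace API.

BASELINE = {
    # use case:      (# API calls, # SQL statements, payload in bytes)
    "testSearch":    (1, 3, 5_000),
    "testNewsAlert": (1, 5, 2_000),
}

def check_regression(test_name, api_calls, sql_count, payload_bytes, tolerance=1.2):
    """Return a list of metric violations vs. the recorded baseline."""
    base_api, base_sql, base_payload = BASELINE[test_name]
    violations = []
    for label, actual, base in (
        ("# API calls", api_calls, base_api),
        ("# SQL", sql_count, base_sql),
        ("payload", payload_bytes, base_payload),
    ):
        if actual > base * tolerance:
            violations.append(f"{label}: {actual} (baseline {base})")
    return violations

# Build 26's testSearch from the example: 34 API calls, 171 SQLs, ~104kb payload
issues = check_regression("testSearch", 34, 171, 104_000)
if issues:
    print("STOP THE BUILD:", "; ".join(issues))
```

A CI job would run this after the test suite and fail the build when the list is non-empty.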
#2: Stop Bad Builds in CI
#3: Monitor your Services/Users in Prod
#1: Usage – Tip: UEM Conversion!
#2: Load vs Response – Tip: See unusual spikes
#3: Architectural Metrics – DB, Exceptions, Web Service Calls
#4: Metrics per Service in Ops
# SQLs per Search
# RESTs per Search
Payload per Search
Spot bad Deployment?
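One way to answer "spot bad deployment?" from these per-service counters is to compare each deployment's per-request metrics against the previous one. The metric values below are taken from the talk's Build 25/26 example; the function name, metric labels, and 3x growth factor are illustrative assumptions.

```python
# Sketch: spot a bad deployment by comparing per-request service metrics
# against the previous deployment. Names and the 3x factor are assumptions.

METRIC_NAMES = ("RESTs per Search", "SQLs per Search", "Payload (kb) per Search")

def spot_bad_deployment(prev, curr, factor=3.0):
    """Return the names of metrics that grew more than `factor`x since the last deploy."""
    return [name for name, p, c in zip(METRIC_NAMES, prev, curr) if c > p * factor]

previous = (2, 3, 10)     # e.g. Build 25: 2 RESTs, 3 SQLs, 10kb per search
current = (34, 171, 104)  # e.g. Build 26: the regression from the example
print(spot_bad_deployment(previous, current))
```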
#1: Do my campaigns work?
#2: Who are my users?
#5: Understand your End Users
#6: Optimize End User Behavior
#1: Are they using the features we built?
#2: Is there a difference between Premium and Normal users?
#3: Does Performance have a Behavior Impact?
Dev & Test: Personal License to stop bad code when it gets created! Tip: Don't leave your IDE!
Continuous Integration: Auto-stop bad builds based on app metrics from Unit, Integration and Performance Tests
Tip: integrate with Jenkins, Bamboo ...
Prod: Monitor usage and runtime behavior per Service, User Action, Feature ...
Tip: Stream to ELK, Splunk and Co ...
Automated Tests: Identify Non-Functional Problems by looking at App Metrics
Tip: Feed data back into your test tool!
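The "stream to ELK, Splunk and Co" tip above can be as simple as emitting one JSON object per metric event to stdout or a log file, which log shippers then pick up and forward. This sketch uses only the standard library; the field names and example values are made up for illustration, not a prescribed schema.

```python
import json
import sys
import time

def emit_metric(service, action, metric, value, stream=sys.stdout):
    """Write one metric event as a JSON line that log shippers can pick up."""
    event = {
        "ts": time.time(),   # epoch timestamp
        "service": service,  # e.g. "search-backend" (hypothetical name)
        "action": action,    # e.g. "testSearch"
        "metric": metric,    # e.g. "sql_count"
        "value": value,
    }
    stream.write(json.dumps(event) + "\n")

emit_metric("search-backend", "testSearch", "sql_count", 3)
```

One JSON object per line keeps the stream trivially parseable by ELK- or Splunk-style ingestion without any custom parser.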
Build & Deliver Apps like the Unicorns! With a Metrics-Driven Pipeline!
12:00 a.m. – 11:59 p.m.
Questions
Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dtpersonal
YouTube Tutorials: bit.ly/dttutorials
Contact Me: [email protected]
Follow Me: @grabnerandi
Read More: blog.dynatrace.com
Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com
Unicorns 2.0
Adam Auerbach – @bugman31
“All-in Agile: across the pipeline”
“We don’t log bugs, we fix them!”
“Measure Built-Into your Pipeline”
“All manual testers: automate!”
LEARN MORE: READ DYNATRACE BLOG FROM VELOCITY 2015
Technical Debt
Business Debt
Organizational Rust
Nita Awatramani
45% Apps Eliminated
60% DCs Consolidated
75% Virtualized
$8M Annual Costs Slashed
100% Verizon Agile
LEARN MORE: PERFORM 2015 VIDEO + UPCOMING WEBINAR!