#b20Con
I TSM DEVOPS CONFERENCE
Metrics-Driven DevOps
Andreas Grabner (@grabnerandi) - [email protected]
“Delivering Software like the Unicorns”
AND MANY MORE
Vatican, 2005
Vatican, 2005
Vatican, 2013
SMART APPLIANCES
SMART ROOFS
Alert BEFORE it is too late!
The stuff we did when we were a Start-Up and we all were Devs, Testers and Ops
“IN YOUR FACE” DATA!
https://dynatrace.github.io/ufo/
METRICS-BASED DECISIONS
Time of Deployment
Availability dropped to 0%
Analyzing SQLs by Type (SELECT, INSERT, UPDATE, DELETE)
Which Types take how long? When do you have spikes?
When do we see # of SQL spikes? Spikes related to INSERT, UPDATE, DELETE? Any job running?
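The questions above can be sketched in a few lines of Python. This is only an illustration: the input format (a list of statement text and duration in milliseconds) and the function names `sql_type` and `summarize_by_type` are assumptions, not something the slides prescribe.

```python
from collections import defaultdict

SQL_TYPES = ("SELECT", "INSERT", "UPDATE", "DELETE")


def sql_type(statement: str) -> str:
    """Classify a statement by its first keyword; anything else is OTHER."""
    parts = statement.strip().split(None, 1)
    first = parts[0].upper() if parts else ""
    return first if first in SQL_TYPES else "OTHER"


def summarize_by_type(samples):
    """Aggregate count and total duration (ms) per SQL type.

    samples: iterable of (statement, duration_ms) tuples (hypothetical format).
    """
    summary = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for statement, duration_ms in samples:
        entry = summary[sql_type(statement)]
        entry["count"] += 1
        entry["total_ms"] += duration_ms
    return dict(summary)


# Made-up sample data, for illustration only.
samples = [
    ("SELECT * FROM clubs WHERE city = ?", 12.0),
    ("insert into audit_log values (?, ?)", 3.5),
    ("SELECT name FROM clubs", 8.0),
    ("UPDATE clubs SET rating = ? WHERE id = ?", 20.0),
]
summary = summarize_by_type(samples)
for kind, stats in sorted(summary.items()):
    print(kind, stats["count"], stats["total_ms"])
```

Grouping the same data per time bucket instead of over the whole run would show when INSERT, UPDATE or DELETE spikes occur, which is a hint that a batch job is running.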
700 deployments / YEAR
10 + deployments / DAY
50 – 60 deployments / DAY
Every 11.6 SECONDS
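To compare these rates they need a common unit. A minimal sketch that converts them to deployments per day, assuming a 365-day year and a deployment interval that holds around the clock (both are assumptions, and the helper names are hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_from_interval(seconds_between_deployments: float) -> float:
    """Deployments per day if one deployment happens every N seconds."""
    return SECONDS_PER_DAY / seconds_between_deployments


def per_day_from_yearly(deployments_per_year: float, days_per_year: int = 365) -> float:
    """Average deployments per day for a yearly total."""
    return deployments_per_year / days_per_year


print(round(per_day_from_interval(11.6)))  # 7448
print(round(per_day_from_yearly(700), 1))  # 1.9
```

Under these assumptions, one deployment every 11.6 seconds is roughly 7,400 per day, while 700 per year is about two per day.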
NOT ONLY FAST DELIVERED, BUT ALSO DELIVERING FAST!
Response Time   Conversions
-1000ms         +2%
-1000ms         +10%
+100ms          -1%
60% rate performance/response time as the #1 mobile app expectation, ahead of features and functionality
It’s not about blindly giving everyone Ops power to deploy changes only tested locally
It’s not about blind automation of pushing more bad code on new stacks through a pipeline
DevOps Deployment Example 1: Online Casino
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main domain
Very long Connect time (1.8s) to your CDN
Online Sports Club Search Service
(Chart: Response Time and Users over time: 20xx, 2014, 2015, 2016+)
1) Started as a small project
2) Slowly growing user base
3) Expanding to new markets – 1st performance degradation!
4) Adding more markets – performance becomes a business impact
5) Potentially start losing users
EARLY 2015: MONOLITHIC APP
Can’t scale vertically endlessly!
2.68s Load Time
94.09% CPU Bound
PROPOSAL: SERVICE APPROACH!
Front End to Cloud
Scale Backend in Containers!
TESTING THE BACKEND SERVICE ALONE SCALES WELL
7:00 a.m. Low load and service running on minimum redundancy
12:00 p.m. Scaled up service during peak load with failover of problematic node
7:00 p.m. Scaled down again to lower load and move to different geo location
GO-LIVE: 7:00AM
GO-LIVE: 12:00PM
WHAT WENT WRONG?
SINGLE SEARCH QUERY END-TO-END
26.7s Load Time 5kB Payload
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count
Architecture Violation: Direct access to DB from frontend service
THE FIXED END-TO-END USE CASE
“Re-Architect” vs. “Migrate” to Service-Orientation
2.5s (vs 26.7s) Load Time, 5kB Payload
1! (vs 33!) Service Call
5kB (vs 99kB) Payload!
3! (vs 171!) Total SQL Count
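One plausible reading of the drop from 33 service calls to 1 is that the migrated frontend fetched each search result with its own backend call, while the re-architected version asks for all results at once. The slides do not show the actual code, so the sketch below is purely illustrative: `CountingBackend`, `get_club`, `get_clubs` and both search functions are hypothetical names.

```python
class CountingBackend:
    """Hypothetical backend stub that counts incoming service calls."""

    def __init__(self, clubs):
        self._clubs = clubs
        self.calls = 0

    def get_club(self, club_id):
        # One service call per club.
        self.calls += 1
        return self._clubs[club_id]

    def get_clubs(self, club_ids):
        # One service call for the whole batch.
        self.calls += 1
        return [self._clubs[c] for c in club_ids]


def search_chatty(backend, club_ids):
    """'Migrated' style: one backend call per search result."""
    return [backend.get_club(c) for c in club_ids]


def search_batched(backend, club_ids):
    """'Re-architected' style: a single call returns all results."""
    return backend.get_clubs(club_ids)


# 33 made-up results, matching the call count on the slide.
clubs = {i: {"id": i, "name": f"Club {i}"} for i in range(33)}
ids = list(clubs)

chatty = CountingBackend(clubs)
batched = CountingBackend(clubs)
same_result = search_chatty(chatty, ids) == search_batched(batched, ids)
print(same_result, chatty.calls, batched.calls)  # True 33 1
```

Both variants return the same data. Only the number of round trips differs, which is why a functional test alone does not catch the problem.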
YOU MEASURE IT! FROM DEV TO OPS
METRICS FROM & FOR DEV (TO) OPS
Scenario: Monolithic App with 2 Key Features

Use Case Tests and Monitors     Service & App Metrics                  Ops
Build #  Use Case       Stat    # API Calls  # SQL  Payload  CPU       # ServInst  Usage  RT
17       testNewsAlert  OK      1            5      2kb      70ms      1           0.5%   7.2s
17       testSearch     OK      1            3      5kb      120ms     1           63%    5.2s
25       testNewsAlert  OK      1            4      1kb      60ms
25       testSearch     OK      34           171    104kb    550ms
26       testNewsAlert  OK      1            4      1kb      60ms      1           0.6%   4.2s
26       testSearch     OK      2            3      10kb     150ms     5           75%    2.5s
35       testNewsAlert  -       -            -      -        -         -           -      -
35       testSearch     OK      2            3      10kb     150ms     8           80%    2.0s

Re-architecture into “Services” + Performance Fixes
#1: STOP BAD BUILDS IN CI
#1: Analyzing every Unit, Integration & REST API test
#2: Key Architectural Metrics for each test
#3: Detecting regression based on measure per Check-in
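A regression check of this kind can be sketched as a small CI gate that compares a test's architectural metrics against a baseline build. The numbers are the testSearch values from the build table earlier in the deck. The `UseCaseMetrics` type, the `find_regressions` helper and the threshold of 3x growth are assumptions made for illustration, not the tooling shown in the talk.

```python
from dataclasses import dataclass

METRIC_NAMES = ("api_calls", "sql_count", "payload_kb", "cpu_ms")


@dataclass(frozen=True)
class UseCaseMetrics:
    """Architectural metrics captured for one test run (hypothetical type)."""
    api_calls: int
    sql_count: int
    payload_kb: float
    cpu_ms: float


def find_regressions(baseline, candidate, max_growth=3.0):
    """Return the metrics that grew by more than max_growth times the baseline.

    max_growth=3.0 is an arbitrary illustrative threshold.
    """
    return [
        name
        for name in METRIC_NAMES
        if getattr(candidate, name) > getattr(baseline, name) * max_growth
    ]


# testSearch values per build, taken from the table in the deck.
build_17 = UseCaseMetrics(api_calls=1, sql_count=3, payload_kb=5, cpu_ms=120)
build_25 = UseCaseMetrics(api_calls=34, sql_count=171, payload_kb=104, cpu_ms=550)
build_26 = UseCaseMetrics(api_calls=2, sql_count=3, payload_kb=10, cpu_ms=150)

print(find_regressions(build_17, build_25))  # all four metrics
print(find_regressions(build_17, build_26))  # []
```

In a pipeline, a non-empty result would fail the build even though the functional status of the test is still OK, which is the situation Build 25 is in.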
#2: MONITOR YOUR SERVICES/USERS IN PROD
#1: Usage. Tip: UEM Conversion!
#2: Load vs Response. Tip: See unusual spikes
#3: Architectural Metrics: DB, Exceptions, Web Service Calls
#3: METRICS PER SERVICE IN OPS
# SQLs per Search
# RESTs per Search
Payload per Search
Spot bad Deployment?
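Spotting a bad deployment from such a per-search metric can be as simple as comparing its average before and after the deployment time. A minimal sketch follows. The sample format (timestamp, SQLs per search), the helper name `looks_like_bad_deployment` and the factor of 2 are all assumptions, and the data is made up.

```python
from statistics import mean


def looks_like_bad_deployment(samples, deployed_at, factor=2.0):
    """Flag a deployment if the per-search metric more than doubles afterwards.

    samples: iterable of (timestamp, value) pairs, e.g. SQLs per search.
    factor: illustrative threshold, not a recommended value.
    """
    before = [value for ts, value in samples if ts < deployed_at]
    after = [value for ts, value in samples if ts >= deployed_at]
    if not before or not after:
        return False  # not enough data on one side to compare
    return mean(after) > factor * mean(before)


# Hypothetical measurements: (minute, SQLs per search)
sqls_per_search = [(1, 3), (2, 3), (3, 4), (4, 170), (5, 172)]

print(looks_like_bad_deployment(sqls_per_search, deployed_at=4))  # True
print(looks_like_bad_deployment(sqls_per_search, deployed_at=3))  # True
print(looks_like_bad_deployment(sqls_per_search[:3], deployed_at=2))  # False
```

The same comparison works for REST calls per search and payload per search. Normalizing per search keeps the signal independent of load.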
#4: UX ANALYSIS BASED ON UEM DATA
#1: Are they using the features we built?
#2: Is there a difference between Premium and Normal users?
#3: Does Performance have a Behavior Impact?
BUILD & DELIVER APPS THAT CAN EAT THE WORLD!
WITH A METRICS-DRIVEN PIPELINE!
12AM - 11:59PM
#1 You Build It, You Run It!
#2 You Measure Dev(to)Ops
#3 You Automate/Virtualize
#4 You API vs You App
#5 Apply for Unicorn Status
Questions
• Slides: slideshare.net/grabnerandi
• Get Tools: bit.ly/dtpersonal
• YouTube Tutorials: bit.ly/dttutorials
• Contact Me: [email protected]
• Follow Me: @grabnerandi
• Read More: blog.dynatrace.com
Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com