Metrics-Driven DevOps – Automate Scalability and Performance into Your Pipeline
TRANSCRIPT
Metrics-Driven DevOps: "Automating Scalability and Performance Checks into Your Pipeline"
Andreas Grabner (@grabnerandi) – [email protected]
https://dynatrace.github.io/ufo/
“In Your Face” Data!
Time of Deployment
Availability dropped to 0%
Metrics-Based Decisions
Update of Dependency Injection Library impacts Memory & CPU
Object Churning impacting GC and eating CPU!
Time of Deployment
App with Regular Load supported by 10 Containers
Twice the Load but 48 (= 4.8x!) Containers! App doesn't scale!!
Does it really scale?
Resource Utilization
per Container
Host Health
Infrastructure, Container, Cloud ... Metrics!!
#1: Where do people come from?
#2: Who are they?
Daily Deployments + Mkt Push
Increase # of unhappy users!
Drop in Conversion Rate
Overall increase of Users!
Satisfied Users Click more Content
Tolerating Users click less content
Frustrated Users mainly click on Support
AND MANY MORE
Vatican, 2005
Vatican, 2013
The Promise of
Confidential, Dynatrace, LLC
Boston Feb 2015!
After #3 of 7 Blizzards!
Total: 2.10m Snow!
IoT: Smart Roofs
Alert BEFORE it is too late!
What they really want!
700 deployments / YEAR
10 + deployments / DAY
50 – 60 deployments / DAY
Every 11.6 SECONDS
Not only fast delivered but also delivering fast!
Response Time | Conversions
-1000ms       | +2%
-1000ms       | +10%
+100ms        | -1%
Why most (will) fail!
It's not about blindly giving everyone Ops power to deploy changes only tested locally
It's not about blind automation of pushing more bad code on new stacks through a pipeline
It's not about blindly adding new features on top of existing ones without measuring their success
Learning from others
http://bit.ly/sharepurepath
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main domain
Very long Connect time (1.8s) to your CDN
"DevOps Deployment" Example #1: Online Casino
Example #2: Online Sports Club Search Service
[Chart: Response Time over the years 20xx, 2014, 2015, 2016+]
1) Started as a small project
2) Slowly growing user base
3) Expanding to new markets – 1st performance degradation!
4) Adding more markets – performance becomes a business impact
5) Potentially start losing users
Early 2015: Monolithic App
Can't scale vertically endlessly!
2.68s Load Time
94.09% CPU Bound
Proposal: Service approach!
Front End to Cloud
Scale Backend in Containers!
7:00 a.m. – Low load and service running on minimum redundancy
12:00 p.m. – Scaled-up service during peak load with failover of problematic node
7:00 p.m. – Scaled down again to lower load and moved to different geo location
Testing the Backend Service alone scales well …
Go live – 7:00 a.m.
Go live – 12:00 p.m.
What Went Wrong?
26.7s Load Time
5kB Payload
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count
Architecture Violation: Direct access to DB from frontend service
Single search query end-to-end
The fixed end-to-end use case: "Re-architect" vs. "Migrate" to Service-Orientation
2.5s (vs 26.7s) Load Time
1! (vs 33!) Service Call
5kB (vs 99kB) Payload!
3! (vs 171!) Total SQL Count
You measure it! From Dev to Ops
Use Case Tests and Monitors – Service & App Metrics

Build    | Use Case      | Status | # API Calls | # SQL | Payload | CPU   | Ops: # Serv Inst | Usage | RT
Build 17 | testNewsAlert | OK     | 1           | 5     | 2kb     | 70ms  | –                | –     | –
Build 17 | testSearch    | OK     | 1           | 3     | 5kb     | 120ms | –                | –     | –
Build 25 | testNewsAlert | OK     | 1           | 4     | 1kb     | 60ms  | 1                | 0.6%  | 4.2s
Build 25 | testSearch    | OK     | 2           | 3     | 10kb    | 150ms | 5                | 75%   | 2.5s
Build 26 | testNewsAlert | OK     | 1           | 4     | 1kb     | 60ms  | 1                | 0.5%  | 7.2s
Build 26 | testSearch    | OK     | 34          | 171   | 104kb   | 550ms | 1                | 63%   | 5.2s
Build 35 | testNewsAlert | –      | –           | –     | –       | –     | –                | –     | –
Build 35 | testSearch    | OK     | 2           | 3     | 10kb    | 150ms | 8                | 80%   | 2.0s
Metrics from and for Dev(to)Ops
Re-architecture into "Services" + Performance Fixes
Scenario: Monolithic App with 2 Key Features
#1: Don’t Check In Bad Code
Step #1: Execute your Tests just as you always do ...
Step #2: ... but CAPTURE Metrics!!
Step #3: Verify Code works as intended – including your frameworks!
#1: Analyzing every Unit, Integration & REST API test
#2: Key Architectural Metrics for each test
#3: Detecting regressions based on measures per check-in
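The three steps above can be sketched as a simple quality gate: capture architectural metrics (API calls, SQL statements, payload) per test and compare them against a baseline. This is a minimal illustration, not Dynatrace's actual API; the function name, the baseline dictionary, and the 1.2x tolerance are hypothetical, while the regression numbers come from the Build 26 example in this talk.

```python
# Minimal sketch of a per-test architectural-metrics quality gate.
# Baseline values mirror the example builds in the talk; names and
# tolerance are illustrative assumptions, not a real Dynatrace API.

BASELINE = {
    # use case:      (# API calls, # SQL statements, payload in bytes)
    "testSearch":    (1, 3, 5_000),
    "testNewsAlert": (1, 5, 2_000),
}

def check_regression(test_name, api_calls, sql_count, payload_bytes, tolerance=1.2):
    """Return a list of metric violations vs. the recorded baseline."""
    base_api, base_sql, base_payload = BASELINE[test_name]
    violations = []
    for label, actual, base in (
        ("# API calls", api_calls, base_api),
        ("# SQL", sql_count, base_sql),
        ("payload", payload_bytes, base_payload),
    ):
        if actual > base * tolerance:
            violations.append(f"{label}: {actual} (baseline {base})")
    return violations

# Build 26's testSearch from the example: 34 API calls, 171 SQLs, ~104kb payload
issues = check_regression("testSearch", 34, 171, 104_000)
if issues:
    print("STOP THE BUILD:", "; ".join(issues))
```

A CI job would run this after the test suite and fail the build when the list is non-empty.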
#2: Stop Bad Builds in CI
#3: Monitor your Services/Users in Prod
#1: Usage – Tip: UEM Conversion!
#2: Load vs Response – Tip: See unusual spikes
#3: Architectural Metrics – DB, Exceptions, Web Service Calls
#4: Metrics per Service in Ops
# SQLs per Search
# RESTs per Search
Payload per Search
Spot bad Deployment?
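One way to answer "spot bad deployment?" from these per-service counters is to compare each deployment's per-request metrics against the previous one. The metric values below are taken from the talk's Build 25/26 example; the function name, metric labels, and 3x growth factor are illustrative assumptions.

```python
# Sketch: spot a bad deployment by comparing per-request service metrics
# against the previous deployment. Names and the 3x factor are assumptions.

METRIC_NAMES = ("RESTs per Search", "SQLs per Search", "Payload (kb) per Search")

def spot_bad_deployment(prev, curr, factor=3.0):
    """Return the names of metrics that grew more than `factor`x since the last deploy."""
    return [name for name, p, c in zip(METRIC_NAMES, prev, curr) if c > p * factor]

previous = (2, 3, 10)     # e.g. Build 25: 2 RESTs, 3 SQLs, 10kb per search
current = (34, 171, 104)  # e.g. Build 26: the regression from the example
print(spot_bad_deployment(previous, current))
```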
#1: Do my campaigns work?
#2: Who are my users?
#5: Understand your End Users
#6: Optimize End User Behavior
#1: Are they using the features we built?
#2: Is there a difference between Premium and Normal users?
#3: Does Performance have a Behavior Impact?
Dev & Test: Personal License to stop bad code when it gets created! Tip: Don't leave your IDE!
Continuous Integration: Auto-stop bad builds based on app metrics from Unit, Integration and Performance Tests
Tip: integrate with Jenkins, Bamboo ...
Prod: Monitor usage and runtime behavior per Service, User Action, Feature ...
Tip: Stream to ELK, Splunk and Co ...
Automated Tests: Identify Non-Functional Problems by looking at App Metrics
Tip: Feed data back into your test tool!
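The "stream to ELK, Splunk and Co" tip above can be as simple as emitting one JSON object per metric event to stdout or a log file, which log shippers then pick up and forward. This sketch uses only the standard library; the field names and example values are made up for illustration, not a prescribed schema.

```python
import json
import sys
import time

def emit_metric(service, action, metric, value, stream=sys.stdout):
    """Write one metric event as a JSON line that log shippers can pick up."""
    event = {
        "ts": time.time(),   # epoch timestamp
        "service": service,  # e.g. "search-backend" (hypothetical name)
        "action": action,    # e.g. "testSearch"
        "metric": metric,    # e.g. "sql_count"
        "value": value,
    }
    stream.write(json.dumps(event) + "\n")

emit_metric("search-backend", "testSearch", "sql_count", 3)
```

One JSON object per line keeps the stream trivially parseable by ELK- or Splunk-style ingestion without any custom parser.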
Build & Deliver Apps like the Unicorns! With a Metrics-Driven Pipeline!
12:00 a.m. – 11:59 p.m.
Questions
Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dtpersonal
YouTube Tutorials: bit.ly/dttutorials
Contact Me: [email protected]
Follow Me: @grabnerandi
Read More: blog.dynatrace.com
Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com
Unicorns 2.0
Adam Auerbach – @bugman31
“All-in Agile: across the pipeline”
“We don’t log bugs, we fix them!”
“Measure Built-Into your Pipeline”
“All manual testers: automate!”
LEARN MORE: READ DYNATRACE BLOG FROM VELOCITY 2015
Technical Debt
Business Debt
Organizational Rust
Nita Awatramani
45% Apps Eliminated
60% DCs Consolidated
75% Virtualized
$8M Annual Costs Slashed
100% Verizon Agile
LEARN MORE: PERFORM 2015 VIDEO + UPCOMING WEBINAR!