metrics driven development 10.09.2014

Post on 28-Nov-2014

Category:

Engineering

DESCRIPTION

Presentation about Metrics Driven Development at DevOps-Finland meeting in Helsinki 10.09.2014.

TRANSCRIPT

Metrics Driven Development
How to improve visibility, communication and the feedback loop through metrics
Erno Aapa, @ernoaapa
Developer, DevOps-consultant
erno.aapa@avaus.fi
DevOps-Finland meetup, Helsinki, 10.09.2014

Who am I?

ERNO AAPA
Developer / Team Leader, DevOps-consultant, founder of DevOps-Finland

I work daily as a developer. I am one of the tech team leaders at Avaus and consult companies on DevOps.

In my free time I organize the DevOps-Finland meetings.

11/09/14 2

Measurement is one of the keys
Everyone emphasizes its importance

Three Ways: Feedback

Feedback loop

M in C.A.L.M.S

Measure everything

Show the improvement

Build-MEASURE-Learn

Measure and learn

… but in reality… we need it, but we don’t have it

What we have

Business
•  Monthly / yearly reports
•  Financial reports
•  Google Analytics

Dev
•  ?

Ops
•  CPU, Mem, IO, Disk
•  Zabbix / Ganglia / Nagios

QA
•  Test coverage reports
•  CI server notifications
•  Performance test results

What does that cause?

•  Bad visibility
   The ops team tries to get the information from logs, load metrics, etc. Business has no detailed visibility, only high-level reports.

•  Dev: “just doing my job”
   Developers don’t feel the “heartbeat” of the service.

•  Wrong idea of the service state
   “No-one complains, so it works… I guess :/”

•  “Business bug”: a feature is created but not used
   The service fills up with unused legacy features.

“…is a practice where metrics are used to drive the entire application development”

Metrics Driven Development (MDD)

InfoQ / 2012

Principles of Metrics Driven Development (MDD)

Define metrics before implementation
•  Just as in TDD you write the test first, here you define the metric first
•  For example: include metrics in user stories

Instrumentation-as-Code
•  Developers must be able to add a new metric with minimal (= one line) effort

Single Source Of Truth
•  Store common metrics from the app, logs, monitoring agents and other tools in a single place
•  The platform should be so timely, comprehensive, and intuitive to use that everyone instinctively relies on it

Shared view for key metrics
•  A shared view gives everyone the same vision
•  So simple that everyone can understand it

Use metrics when making decisions
•  You have powerful information; use it when you’re making decisions

Maintain and follow the metrics
•  Follow the metrics: a feature that is popular now can be waste tomorrow
•  Remove unneeded metrics: it’s code, and it needs to be maintained

Sources: Erno Aapa / 2014, Librato Blog / 2014, InfoQ / 2012

User story
Define what to measure and what we expect in the user story

As a user I can order items in my shopping cart.

We measure the number of visiting users and the number of orders. We expect at least 20% of users to order the items in their shopping cart.

Can be defined in an EPIC or FEATURE too
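The expectation in the user story can be turned into a check against the two measured counts. The following is a minimal sketch; the helper names are hypothetical, and it assumes that "orders divided by visiting users" is an acceptable approximation of "share of users who order" (a user who orders twice is counted twice).

```ruby
# Hypothetical helpers: compare the measured order rate with the
# expectation written into the user story.
EXPECTED_ORDER_RATE = 0.20 # "at least 20% of users" from the story

# Share of visiting users that ended up ordering (0.0 when nobody visited).
def order_rate(visited_users, orders)
  return 0.0 if visited_users.zero?

  orders.to_f / visited_users
end

def expectation_met?(visited_users, orders, expected = EXPECTED_ORDER_RATE)
  order_rate(visited_users, orders) >= expected
end

puts expectation_met?(1_000, 230) # 23% of visitors ordered
puts expectation_met?(1_000, 150) # 15% of visitors ordered
```

Keeping the expected rate next to the metric makes it possible to draw the same threshold on the dashboard later.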

Collect required metrics
Developer adds the required code to collect metrics

Somefile.rb
# Somewhere in your code where login happens...
statsd.increment("users.visited")

OtherFile.rb
# ...and where you handle the orders
statsd.increment("shopping.ordered")
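To show what a call like `statsd.increment` does under the hood, here is a small stand-in client. It is a sketch only, not the client library used in the talk: the class name `TinyStatsd` and the localhost target are assumptions, while the `name:value|c` counter format and UDP port 8125 follow the usual StatsD conventions.

```ruby
require "socket"

# Minimal StatsD-style client: a sketch, not the library used in the talk.
class TinyStatsd
  # 8125 is the conventional StatsD UDP port; the host is an assumption.
  def initialize(host: "127.0.0.1", port: 8125, socket: UDPSocket.new)
    @host = host
    @port = port
    @socket = socket
  end

  # Counter datagram in the StatsD line format: "<name>:<value>|c"
  def self.counter_line(name, by = 1)
    "#{name}:#{by}|c"
  end

  # Fire-and-forget: a metrics failure must never break the request itself.
  def increment(name, by = 1)
    @socket.send(self.class.counter_line(name, by), 0, @host, @port)
    true
  rescue SystemCallError, SocketError
    false
  end
end

statsd = TinyStatsd.new
statsd.increment("users.visited")    # where login happens
statsd.increment("shopping.ordered") # where orders are handled
```

Because the datagram is sent over UDP and errors are swallowed, instrumenting a code path costs one line and cannot take the application down if the metrics server is unreachable.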

Share visibility
Visualize the metric and also what we expect

Visualize what we are expecting. That makes it easy for everyone to understand what is good or bad.

Use this information when making decisions

Single place of truth
Provide a single place for the data, so it is easy to compare and analyze

Frontend Service Monitoring

Shared dashboard for key metrics
Make the most important metrics visible to everyone, all the time

In the meeting
Use the information when you’re making decisions

•  Which features have gone live?

•  Did we achieve what we wanted?

•  Should we continue improving them?

•  Or should we remove them?

•  What is our overall status?

•  What do we do next to achieve our goal?

Everyone benefits from it
just to name a few…

DEVELOPER
-  Gets the “feeling” of production and visibility into it
-  Can drop unused features
-  Can focus on the important parts

PRODUCT OWNER / STAKEHOLDERS
-  Get real-time information from production
-  Can make data-driven decisions
-  Supports the Lean Startup way

TEST MANAGER
-  Gets visibility into production
-  Can scale tests to match production

SYSADMIN
-  Gets visibility inside the app
-  In case of problems, the error is easier to find

Communication

[Diagram: Ops, Business, Dev, Metrics]

A shared view and shared goals help communication

Benefits of MDD

Imaginary case
Simple Java web shop

My Webshop

Web shop: UI, Server, Graphite, Grafana

•  The user orders items through the web UI
•  After processing the order, the server sends data to Graphite
•  Grafana is used to visualize the data

By developer

1. Create counter

2. Increment counter

THAT’S ALL!
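The two steps can be sketched with an in-memory stand-in for a metrics library. The imaginary web shop is a Java application, so the slide's own code was presumably Java; this version is written in Ruby to match the other snippets in the deck, and `MetricRegistry`, `Counter` and the metric name `shop.orders` are hypothetical names.

```ruby
# In-memory stand-in for a metrics library (all names are hypothetical).
class Counter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def increment(by = 1)
    @lock.synchronize { @value += by }
  end

  def value
    @lock.synchronize { @value }
  end
end

class MetricRegistry
  def initialize
    @counters = {}
    @lock = Mutex.new
  end

  # Returns the same Counter for the same name, creating it on first use.
  def counter(name)
    @lock.synchronize { @counters[name] ||= Counter.new }
  end

  # Current value of every counter, e.g. for a periodic reporter.
  def snapshot
    @lock.synchronize { @counters.transform_values(&:value) }
  end
end

METRICS = MetricRegistry.new

# 1. Create counter
orders = METRICS.counter("shop.orders")

# 2. Increment counter (called after an order has been processed)
orders.increment

p METRICS.snapshot
```

A real setup would add a reporter that periodically ships the snapshot to the metrics store; that part is left out here.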

By developer

Type the metric name and… SEE THE RESULTS!

Gauges

•  Queue sizes

Timers

•  Query times

•  Response times

Counters

•  Execution counters

Histograms

•  median, 75th, 90th, 95th, 98th, 99th
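The percentiles listed for histograms can be computed in several ways. The sketch below uses the nearest-rank definition, which is only one of the common variants, and the response times are made-up sample data.

```ruby
# Nearest-rank percentile (one common definition among several).
# pct is expected to be an Integer between 1 and 100.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling of pct/100 * n
  sorted[[rank, 1].max - 1]
end

# Made-up response times in milliseconds, with two slow outliers.
RESPONSE_TIMES_MS = [12, 15, 11, 240, 14, 13, 18, 16, 950, 17].freeze

[50, 75, 90, 95, 98, 99].each do |pct|
  puts "p#{pct}: #{percentile(RESPONSE_TIMES_MS, pct)} ms"
end
```

With this sample the median stays low while the upper percentiles expose the outliers, which is why histograms usually report both.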

… when you get started

There are so many things you can measure, if you have the capability

Measure time with a single @annotation
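Ruby has no annotations, so a comparable one-line way to measure time is to wrap the code in a block. This is a sketch under assumptions: `timed` and the metric name are hypothetical, and the reporter is any callable that receives a StatsD-style timer line (`name:milliseconds|ms`).

```ruby
# Hypothetical helper: times a block and reports the duration in milliseconds
# as a StatsD-style timer line. The block's own return value is passed through.
def timed(name, reporter)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  reporter.call("#{name}:#{(elapsed * 1000).round}|ms")
end

# Usage: the reporter here just prints; it could also send the line over UDP.
printer = ->(line) { puts line }

result = timed("shop.checkout.response_time", printer) do
  sleep 0.01 # stands in for the real work
  :order_saved
end

puts result
```

The monotonic clock is used so that wall-clock adjustments cannot produce negative or inflated durations, and the `ensure` clause reports the time even when the block raises.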

Pitfalls

“Metrics are always powerful political ammunition, to be used for better or worse”
Metrics-Driven Enterprise Software Development (book)

“Tried a solution where OPS and DEV were sharing one metrics server… it didn't work”
Mantas Klasavicius / Adform

“MONITOR ALL THE THINGS! No, you should not wrap every single method call in your application to increment a counter”
Librato blog

What tools to use?
To collect, store, display? SaaS maybe?

COLLECT
Tools:
•  StatsD (multiple languages)
•  Coda Hale Metrics (Java)
•  Easy to implement yourself
Why:
•  Provides a “one-line” way for developers to collect any metric from the application
•  Possible to change the storage through configuration

STORE
Tools:
•  Graphite
•  InfluxDB
•  … and more
Why:
•  Stores time-series data
•  Provides an easily accessible API
•  Aggregates data “on the fly”
•  Downsamples old data

DISPLAY
Tools:
•  Grafana
•  Tesseo
•  … and more
Why:
•  Simple to use
•  Simple and clear graphs
•  Possible to create multiple dashboards

SaaS
Tools:
•  Librato.com
•  geckoboard.com
•  leftronic.com
•  HostedGraphite.com
•  Influxdb.com
•  … and many more
Why:
•  Easy to get started
•  Maintained
•  Free to test

Give a demo and convince others!

Graphite vs. InfluxDB: Send data

Graphite (key / value; the plaintext protocol also expects a Unix timestamp)
echo "dc1.server2.cpuload 5.6 $(date +%s)" | nc graphite.com 2003
echo "app.visitors 1 $(date +%s)" | nc graphite.com 2003

InfluxDB (JSON)
{
  "name" : "cpu_load",
  "columns" : ["value", "dc", "server"],
  "points" : [ [5.6, "dc1", "server2"] ]
}
{
  "name" : "visitors",
  "columns" : ["value", "browser", "version"],
  "points" : [ [1, "Chrome", 37.0] ]
}
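The difference between the two formats is easiest to see when the same measurement is encoded both ways. The sketch below only builds the payloads and does not send them; the helper names are hypothetical, and the JSON envelope (an array of series objects) is an assumption that depends on the InfluxDB version in use.

```ruby
require "json"

# Graphite plaintext line: "<metric.path> <value> <unix timestamp>"
def graphite_line(path, value, time = Time.now)
  "#{path} #{value} #{time.to_i}\n"
end

# Series-style JSON as shown on the slide. The surrounding array is an
# assumption; check the API of the InfluxDB version you actually run.
def influx_series(name, fields)
  [{ "name" => name, "columns" => fields.keys, "points" => [fields.values] }]
end

at = Time.at(1_410_339_600) # fixed timestamp to keep the output stable

print graphite_line("dc1.server2.cpuload", 5.6, at)
puts JSON.generate(
  influx_series("cpu_load", { "value" => 5.6, "dc" => "dc1", "server" => "server2" })
)
```

Graphite folds the dimensions (data center, server) into the dotted metric path, while the JSON form keeps them as separate columns that can be filtered on later.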

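Graphite vs. InfluxDB: Aggregate data

Average CPU load in dc1
Graphite:  averageSeries(dc1.*.cpuload)
InfluxDB:  select mean(value) from cpu_load where dc = 'dc1'

Total visitors
Graphite:  sum(app.visitors)
InfluxDB:  select sum(value) from visitors

Visitors using Chrome newer than version 35
Graphite:  Not possible :(
InfluxDB:  select sum(value) from visitors where browser = 'Chrome' and version > 35

Top 5 servers with highest load
Graphite:  highestAverage(*.*.load, 5)
InfluxDB:  Probably not possible :(

The InfluxDB query language is not full SQL!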
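What the column-based "where" queries do can be illustrated in plain Ruby over a handful of points. This is only an in-memory illustration with made-up sample data and hypothetical helper names; it does not talk to either database.

```ruby
# Made-up event points with columns, shaped like the "visitors" series.
VISITS = [
  { value: 1, browser: "Chrome",  version: 37.0 },
  { value: 1, browser: "Chrome",  version: 34.0 },
  { value: 1, browser: "Firefox", version: 31.0 },
  { value: 1, browser: "Chrome",  version: 36.0 }
].freeze

# Roughly: select sum(value) from visitors
def total_visits(points)
  points.sum { |point| point[:value] }
end

# Roughly: select sum(value) from visitors
#          where browser = '<browser>' and version > <min_version>
def visits_for(points, browser, min_version)
  matching = points.select do |point|
    point[:browser] == browser && point[:version] > min_version
  end
  total_visits(matching)
end

puts "all visitors: #{total_visits(VISITS)}"
puts "Chrome > 35:  #{visits_for(VISITS, 'Chrome', 35)}"
```

With a dotted metric name such as `app.visitors`, the browser and version are not stored at all, which is why the same question cannot be asked of the key / value form.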

Graphite vs. InfluxDB: Summary

Graphite
•  Older project
•  Harder to install / configure
•  Over 100 aggregation functions: sum, cumulative, compare, highestAverage
•  Good for collecting metrics
•  Limited “WHERE” queries

InfluxDB
•  Really young project (Apr 2013)
•  Easy to get started
•  Only basic aggregation functions (17 total): sum, min, max, median, percentile
•  Good for collecting events
•  Limited possibilities to aggregate data

Don’t pick one, use both! (and don’t forget to check SaaS services too!)

What next? That cannot be all…

What next - Annotations

What happened at 19:27? There was an ad on TV

Add annotations automatically when something happens that can change the service state, such as deployments and other events.

What next - Alerts

•  Have an easy way to add new alerts

•  They don’t have to be major alerts; small “reminders” are OK too
   •  Notify developers by email when any page response time goes above 300 ms
   •  Notify business and developers when any feature’s usage drops below 5%
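The two example alerts can be expressed as data plus a small evaluation function. This is a sketch under assumptions: the metric names, the `AlertRule` structure and the recipient labels are hypothetical, and actually delivering the email is left out.

```ruby
# Hypothetical alert rules mirroring the two examples above.
AlertRule = Struct.new(:metric, :comparison, :threshold, :notify) do
  def triggered?(value)
    comparison == :above ? value > threshold : value < threshold
  end
end

RULES = [
  AlertRule.new("page.response_time_ms", :above, 300, %w[developers]),
  AlertRule.new("feature.usage_percent", :below, 5, %w[business developers])
].freeze

# Returns [recipient, message] pairs for every rule the latest values break.
def alerts_for(latest_values, rules = RULES)
  rules.flat_map do |rule|
    value = latest_values[rule.metric]
    next [] if value.nil? || !rule.triggered?(value)

    rule.notify.map do |who|
      [who, "#{rule.metric} is #{value} (limit #{rule.threshold})"]
    end
  end
end

latest = { "page.response_time_ms" => 420, "feature.usage_percent" => 3.2 }
alerts_for(latest).each do |who, message|
  puts "to #{who}: #{message}"
end
```

Keeping rules as plain data is one way to make adding a new alert or "reminder" a one-line change.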

Users not using it?

Feature “favourite list” usage lower than 5%

It’s waste… REMOVE IT!

Cleaner codebase
No complex UIs
Less functionality = easier to maintain

Easy to get started, so why not?
Try it today, create a demo for your team, convince them

•  Next time, plan how you could measure it
•  Add the required metrics to the code
•  Use free SaaS services to get started easily
•  Create a nice dashboard, add the most important graph and convince others!

Questions? Thank you!
Erno Aapa, @ernoaapa
Developer, DevOps-consultant
erno.aapa@avaus.fi
10.09.2014
