measure all the things! - austin data day 2014

Measure All The Things! Gary Dusbabek Rackspace @gdusbabek

Upload: gdusbabek

Post on 28-Nov-2014


Category:

Technology



DESCRIPTION

Slides used during presentation that covered metrics gathering and analysis

TRANSCRIPT

Page 1: Measure All the Things! - Austin Data Day 2014

Measure All The Things!

Gary Dusbabek Rackspace

@gdusbabek

Page 2: Measure All the Things! - Austin Data Day 2014

Motivation

What You Really Want

Kinds of Metrics

How To Do It

Prognostication

Page 3: Measure All the Things! - Austin Data Day 2014

Motivation

Page 4: Measure All the Things! - Austin Data Day 2014

It’s all about

the data

Page 5: Measure All the Things! - Austin Data Day 2014

We are generating data at an insane rate.

Page 7: Measure All the Things! - Austin Data Day 2014

2006: IDC estimates 161 exabytes of data on the Internet

That is 161 million 1 TB drives

Page 8: Measure All the Things! - Austin Data Day 2014

2009

988 exabytes of data

6x growth in 4 years

Almost 1 billion 1 TB drives

A zettabyte: 21 zeroes

Source http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

Page 9: Measure All the Things! - Austin Data Day 2014
Page 10: Measure All the Things! - Austin Data Day 2014

2012: the Internet was estimated to be shipping roughly 2.5 exabytes of data daily.

Daily

Not counting the NSA

Page 11: Measure All the Things! - Austin Data Day 2014

Transferring Data

Generates Data

Page 12: Measure All the Things! - Austin Data Day 2014

Metadata!

Page 13: Measure All the Things! - Austin Data Day 2014

Secondary Information

Page 14: Measure All the Things! - Austin Data Day 2014

A by-product

Page 15: Measure All the Things! - Austin Data Day 2014

Example 1

Page 16: Measure All the Things! - Austin Data Day 2014

Cloud Monitoring

Is the website up?

GET HTTP/1.1

Page 17: Measure All the Things! - Austin Data Day 2014

Status=200
Bytes=432
Time to connect=15ms
Time to first byte=21ms
Duration=28ms

Page 18: Measure All the Things! - Austin Data Day 2014

Example 2

Page 19: Measure All the Things! - Austin Data Day 2014

Netflix

You want to watch an episode of Buffy

Page 20: Measure All the Things! - Austin Data Day 2014

Observations

What titles you click on
What time of day you started watching
When you paused
Parts you re-watched
When you finished (if you finished)

Page 21: Measure All the Things! - Austin Data Day 2014

Useless to people consuming the primary data.

Page 22: Measure All the Things! - Austin Data Day 2014

Priceless when you’re trying to understand behavior.

Page 23: Measure All the Things! - Austin Data Day 2014

behavior

Page 24: Measure All the Things! - Austin Data Day 2014

Understanding = Knowledge

Page 25: Measure All the Things! - Austin Data Day 2014

In these cases all the data generated is

time-series

Page 26: Measure All the Things! - Austin Data Day 2014

Time Series Data

Related events sorted by time of occurrence

Page 27: Measure All the Things! - Austin Data Day 2014

Example

0600 – Wake up
0601 – Checked Hacker News
0605 – Shower
0630 – Breakfast
0630 – Checked Hacker News
0700 – Left for work
0730 – Arrived at work
Etc…
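
A minimal sketch of "related events sorted by time of occurrence", assuming each event is a (time, description) tuple and reusing the HHMM strings from the example as timestamps. The names `timeline`, `record`, and `between` are made up for illustration.

```python
import bisect

# The series: (time, description) tuples, always kept sorted by time.
timeline = []

def record(time, what):
    """Insert an event so the timeline stays sorted by time of occurrence."""
    bisect.insort(timeline, (time, what))

def between(start, end):
    """Return the events with start <= time < end."""
    lo = bisect.bisect_left(timeline, (start,))
    hi = bisect.bisect_left(timeline, (end,))
    return timeline[lo:hi]

# Events may arrive out of order; the series is still sorted when read.
record("0605", "Shower")
record("0600", "Wake up")
record("0630", "Breakfast")
record("0601", "Checked Hacker News")

early = between("0600", "0605")
```

Keeping the series sorted makes "what happened between X and Y" a cheap slice, which is the query a time-series backend answers most often.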

Page 28: Measure All the Things! - Austin Data Day 2014

Think about how you’d store something like this if you were building a backend system

Page 29: Measure All the Things! - Austin Data Day 2014

Relational Database Much?

Page 30: Measure All the Things! - Austin Data Day 2014

You

When  What
0600  Wake up
0601  Checked Hacker News
0605  Shower
0630  Breakfast
0630  Checked Hacker News
0700  Left for work
0730  Arrive at work
0731  Checked Hacker News

Page 31: Measure All the Things! - Austin Data Day 2014

Who  When  What
You  0600  Wake up
You  0601  Checked Hacker News
You  0605  Shower
You  0630  Breakfast
You  0630  Checked Hacker News
You  0700  Left for work
You  0730  Arrive at work
You  0731  Checked Hacker News

Page 32: Measure All the Things! - Austin Data Day 2014

Who     When  What
You     0600  Wake up
You     0601  Checked Hacker News
You     0605  Shower
You     0630  Breakfast
You     0630  Checked Hacker News
You     0700  Left for work
You     0730  Arrive at work
You     0731  Checked Hacker News
Friend  0603  Wake up
Friend  0604  Checked Hacker News
Friend  0715  Left for work
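
The who/when/what rows map directly onto a relational table. A small sketch using Python's built-in `sqlite3` with an in-memory database; the table name `events`, the column name `ts` (used because `when` is an SQL keyword), and the helper `timeline_for` are assumptions for illustration, not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (who TEXT, ts TEXT, what TEXT)")

# One row per event, exactly as in the table on the slide.
rows = [
    ("You", "0600", "Wake up"),
    ("You", "0601", "Checked Hacker News"),
    ("You", "0605", "Shower"),
    ("You", "0630", "Breakfast"),
    ("You", "0630", "Checked Hacker News"),
    ("You", "0700", "Left for work"),
    ("You", "0730", "Arrive at work"),
    ("You", "0731", "Checked Hacker News"),
    ("Friend", "0603", "Wake up"),
    ("Friend", "0604", "Checked Hacker News"),
    ("Friend", "0715", "Left for work"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

def timeline_for(who):
    """All events for one person, sorted by time of occurrence."""
    cursor = conn.execute(
        "SELECT ts, what FROM events WHERE who = ? ORDER BY ts", (who,)
    )
    return cursor.fetchall()

friend_events = timeline_for("Friend")
```

Note that the "who" value is repeated on every row and every read has to filter and sort, which is part of why this layout gets less attractive as the number of events grows.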

Page 33: Measure All the Things! - Austin Data Day 2014

Other Ways?

Page 34: Measure All the Things! - Austin Data Day 2014
Page 35: Measure All the Things! - Austin Data Day 2014

Less Appealing

Page 36: Measure All the Things! - Austin Data Day 2014

Column Oriented

You:    0600 Wake up | 0601 Checked Hacker News | 0605 Shower | 0630 Breakfast | 0630 Checked Hacker News | 0700 Left for work | 0730 Arrive at work | 0731 Checked Hacker News
Friend: 0603 Wake up | 0604 Checked Hacker News | 0715 Left for work
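
A plain-Python sketch of the column-oriented layout: one row per person, with that person's events stored as columns sorted by time. This only mimics the shape of the data; it is not the API of any particular database, and `rows`, `write`, and `read_slice` are hypothetical names.

```python
import bisect

# Row key (who) -> list of (when, what) columns, kept sorted by time.
rows = {}

def write(who, when, what):
    """Add a column to a person's row, preserving time order."""
    bisect.insort(rows.setdefault(who, []), (when, what))

def read_slice(who, start, end):
    """Columns from one row with start <= when < end."""
    columns = rows.get(who, [])
    lo = bisect.bisect_left(columns, (start,))
    hi = bisect.bisect_left(columns, (end,))
    return columns[lo:hi]

write("You", "0605", "Shower")
write("You", "0600", "Wake up")
write("Friend", "0603", "Wake up")
write("You", "0601", "Checked Hacker News")

your_first_minutes = read_slice("You", "0600", "0605")
```

Reading one person's morning touches a single row and a contiguous run of columns, with no filtering across other people's events.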

Page 37: Measure All the Things! - Austin Data Day 2014

What You Really Want

Page 38: Measure All the Things! - Austin Data Day 2014

You run a

business

Page 39: Measure All the Things! - Austin Data Day 2014

You want to make money

Page 40: Measure All the Things! - Austin Data Day 2014

You want to make money

Show me the money!

Page 41: Measure All the Things! - Austin Data Day 2014

You need to make

decisions

Page 42: Measure All the Things! - Austin Data Day 2014

You need to make the right

decisions

Page 43: Measure All the Things! - Austin Data Day 2014

How do you do that?

Page 44: Measure All the Things! - Austin Data Day 2014

With your gut

Page 45: Measure All the Things! - Austin Data Day 2014

With data

Page 46: Measure All the Things! - Austin Data Day 2014

Example

Page 47: Measure All the Things! - Austin Data Day 2014

API responses are taking a long time.

Page 48: Measure All the Things! - Austin Data Day 2014

It’s probably the database.

Page 49: Measure All the Things! - Austin Data Day 2014

You add a few indexes.

You allocate more memory.

You get faster disks.

You get bigger processors.

Page 50: Measure All the Things! - Austin Data Day 2014

Maybe it’s the network…

Page 51: Measure All the Things! - Austin Data Day 2014

You replace ethernet adapters.

You get faster switches.

You replace the cabling.

Page 52: Measure All the Things! - Austin Data Day 2014

Crap!

Page 53: Measure All the Things! - Austin Data Day 2014

Trace it!

Page 54: Measure All the Things! - Austin Data Day 2014

500 ms for entire request

15 ms on the wire getting there
200 ms to auth
50 ms looking up account
50 ms looking up other stuff
15 ms on the wire getting back
170 ms rendering in the browser
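
The arithmetic behind the trace, as a short sketch. The timings are the ones on the slide; the step labels and the grouping of the two lookups as "database" time are assumptions made for illustration.

```python
# (step, milliseconds) pairs copied from the trace above.
trace = [
    ("wire, request", 15),
    ("auth", 200),
    ("account lookup", 50),
    ("other lookups", 50),
    ("wire, response", 15),
    ("browser rendering", 170),
]

total_ms = sum(ms for _, ms in trace)
slowest = max(trace, key=lambda step: step[1])
share = {name: ms / total_ms for name, ms in trace}

# Assumed grouping: both lookups hit the database, both wire steps are network.
database_share = share["account lookup"] + share["other lookups"]
network_share = share["wire, request"] + share["wire, response"]
```

With these numbers auth is 40% of the request, the database lookups are 20%, and the network is 6%, so the earlier database and network upgrades were aimed at the smaller pieces.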

Page 56: Measure All the Things! - Austin Data Day 2014

Make the right decisions with data.

Page 57: Measure All the Things! - Austin Data Day 2014

You need a metrics system

Page 58: Measure All the Things! - Austin Data Day 2014

Take these things into account:

Availability
Redundancy
Accuracy

Page 59: Measure All the Things! - Austin Data Day 2014

And your budget

Page 60: Measure All the Things! - Austin Data Day 2014

Example: Pretty Graphs

Page 61: Measure All the Things! - Austin Data Day 2014

If graphs go away, do you lose money?

Page 62: Measure All the Things! - Austin Data Day 2014

The CEO likes them.

Page 63: Measure All the Things! - Austin Data Day 2014

Do graphs help you make decisions?

Page 64: Measure All the Things! - Austin Data Day 2014

Example: Usage Billing

Page 65: Measure All the Things! - Austin Data Day 2014

Will losing data cost you money?

Page 66: Measure All the Things! - Austin Data Day 2014

Data Lifecycle

Page 67: Measure All the Things! - Austin Data Day 2014

When can I throw it away?

Page 68: Measure All the Things! - Austin Data Day 2014

How much work is throwing it away?

Page 70: Measure All the Things! - Austin Data Day 2014

More work means it probably won’t happen.

Page 71: Measure All the Things! - Austin Data Day 2014

Kinds of Metrics

Page 72: Measure All the Things! - Austin Data Day 2014

{Volume, Frequency} ⨯ {Low, High}

Page 73: Measure All the Things! - Austin Data Day 2014

Low Volume, High Frequency

5,6,5,6

Things observed infrequently

Almost always changes

Low storage overhead

Bulk operations are easy

Usually uninteresting

Page 74: Measure All the Things! - Austin Data Day 2014

Low Volume, Low Frequency

5,5,5,6

Roughly the same as LVHF

Page 75: Measure All the Things! - Austin Data Day 2014

High Volume, Low Frequency

5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,7,7

Constantly observed

But doesn’t change much

Optimizations!
– Detect and record only level changes
– Requires caching
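
A sketch of the "record only level changes" optimization. The cache is the dictionary of last-seen values, which is why the slide says this requires caching. The class name `ChangeRecorder` and the metric name are hypothetical.

```python
class ChangeRecorder:
    """Stores a sample only when the value differs from the last one seen."""

    def __init__(self):
        self._last = {}   # cache: metric name -> last observed value
        self.stored = []  # (timestamp, name, value) samples actually written

    def observe(self, timestamp, name, value):
        if name in self._last and self._last[name] == value:
            return False  # same level as before, nothing to write
        self._last[name] = value
        self.stored.append((timestamp, name, value))
        return True

# The series from the slide: nine 5s, twelve 6s, two 7s.
observations = [5] * 9 + [6] * 12 + [7] * 2

recorder = ChangeRecorder()
for second, value in enumerate(observations):
    recorder.observe(second, "queue.depth", value)
```

Here 23 observations collapse to 3 stored samples. The trade-off is that a reader must treat the value as constant between stored samples.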

Page 76: Measure All the Things! - Austin Data Day 2014

High Volume, High Frequency

34,4,7,345,6,4,2,54,67,5,6,55,74,5,3,2,5,6745…

Page 78: Measure All the Things! - Austin Data Day 2014

Numeric vs String

Most will be numeric

Some are strings
– Usually low frequency
– Special handling

Page 79: Measure All the Things! - Austin Data Day 2014

Numeric vs String

High frequency strings are a sign you’re doing something wrong or need a different system.

Page 80: Measure All the Things! - Austin Data Day 2014

Gauges

Current value of something

Operation: snapshot

Speedometer

Thermometer

CPU utilization
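
A minimal gauge sketch, assuming the gauge wraps a zero-argument callable that returns the current value. The class and the queue example are illustrative only and do not follow any specific library's interface.

```python
class Gauge:
    """Reports the current value of something each time it is sampled."""

    def __init__(self, read_value):
        self._read_value = read_value  # callable returning the value right now

    def snapshot(self):
        return self._read_value()

# Hypothetical example: the depth of an in-memory work queue.
work_queue = []
queue_depth = Gauge(lambda: len(work_queue))

before = queue_depth.snapshot()
work_queue.extend(["job-1", "job-2"])
after = queue_depth.snapshot()
```

A gauge keeps no history of its own; whatever polls it decides how often to take snapshots and where to store them.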

Page 81: Measure All the Things! - Austin Data Day 2014

Counter

Exists as a set of operations
– Operation: increment
– Operation: decrement

Read by selecting over time and summing

Example: hits on a website

Different than unique hits
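
A sketch of a counter stored as a set of operations and read by selecting a time range and summing. The class name `EventCounter` and the integer timestamps are assumptions for illustration.

```python
class EventCounter:
    """A counter kept as a log of timestamped increments and decrements."""

    def __init__(self):
        self._operations = []  # (timestamp, delta) pairs

    def increment(self, timestamp, amount=1):
        self._operations.append((timestamp, amount))

    def decrement(self, timestamp, amount=1):
        self._operations.append((timestamp, -amount))

    def read(self, start, end):
        """Sum of all operations with start <= timestamp < end."""
        return sum(delta for ts, delta in self._operations if start <= ts < end)

hits = EventCounter()
for second in (1, 2, 2, 5, 61):
    hits.increment(second)

first_minute = hits.read(0, 60)
all_time = hits.read(0, 3600)
```

Two hits in the same second from the same visitor both count here, which is the difference from unique hits.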

Page 82: Measure All the Things! - Austin Data Day 2014

Set

StatsD

Number of uniquely seen items

Think: conditional counter

Example: number of unique visitors
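
A sketch of the set metric as a conditional counter: an item only counts the first time it is seen in the current interval. Resetting on flush mirrors how StatsD reports unique values per flush interval; the class name `UniqueSet` and the visitor IDs are hypothetical.

```python
class UniqueSet:
    """Counts distinct items seen since the last flush."""

    def __init__(self):
        self._seen = set()

    def add(self, item):
        self._seen.add(item)  # adding a repeat is a no-op

    def flush(self):
        """Return the number of unique items and start a new interval."""
        count = len(self._seen)
        self._seen.clear()
        return count

visitors = UniqueSet()
for visitor_id in ["alice", "bob", "alice", "carol", "bob"]:
    visitors.add(visitor_id)

unique_visitors = visitors.flush()
```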

Page 83: Measure All the Things! - Austin Data Day 2014

Timer

How long something takes

Statistics (mean, median, min, max, percentiles)

How many times it has happened

Rate at which it is happening

Uses a sliding window
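
A sketch of a timer over a fixed sliding window, assuming samples arrive in timestamp order. Real metrics libraries often use more elaborate sampling; this version simply drops samples older than the window, and every name in it is illustrative.

```python
import math
import statistics
from collections import deque

class Timer:
    """Durations observed within the last `window` seconds."""

    def __init__(self, window=60.0):
        self.window = window
        self._samples = deque()  # (timestamp, duration) in arrival order

    def update(self, now, duration):
        self._samples.append((now, duration))

    def summary(self, now):
        # Slide the window: drop samples that are `window` seconds old or older.
        while self._samples and self._samples[0][0] <= now - self.window:
            self._samples.popleft()
        durations = sorted(d for _, d in self._samples)
        if not durations:
            return {"count": 0, "rate": 0.0}
        p95_index = math.ceil(0.95 * len(durations)) - 1  # nearest-rank percentile
        return {
            "count": len(durations),
            "rate": len(durations) / self.window,  # events per second
            "min": durations[0],
            "max": durations[-1],
            "mean": statistics.mean(durations),
            "median": statistics.median(durations),
            "p95": durations[p95_index],
        }

request_timer = Timer(window=60.0)
for timestamp, millis in [(0, 10), (10, 20), (20, 30), (70, 40)]:
    request_timer.update(timestamp, millis)

stats = request_timer.summary(now=70)
```

At time 70 the samples from times 0 and 10 have slid out of the window, so the statistics cover only the two most recent durations.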

Page 84: Measure All the Things! - Austin Data Day 2014

Histograms

Distribution of data

Example: when people visit your site
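
A sketch of the "when people visit your site" histogram, bucketing visits by hour. The HHMM visit times are invented sample data and `visits_by_hour` is a hypothetical helper.

```python
from collections import Counter

def visits_by_hour(visit_times):
    """Count visits per hour from 'HHMM' strings."""
    return Counter(time[:2] for time in visit_times)

histogram = visits_by_hour(["0600", "0601", "0715", "0731", "1210"])

# Crude text rendering: one '#' per visit in each hourly bucket.
for hour in sorted(histogram):
    print(hour, "#" * histogram[hour])
```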

Page 85: Measure All the Things! - Austin Data Day 2014
Page 86: Measure All the Things! - Austin Data Day 2014
Page 87: Measure All the Things! - Austin Data Day 2014

How Do You Do It?

Page 88: Measure All the Things! - Austin Data Day 2014

If you make software

Instrument it!

Java? https://github.com/codahale/metrics

Node.js? https://github.com/mikejihbe/metrics

Others? Of course

Page 89: Measure All the Things! - Austin Data Day 2014

If you run systems

Instrument them!

Get data via agent

Get data via pollers

Considerations: inside or outside of your network

Page 90: Measure All the Things! - Austin Data Day 2014

StatsD

https://github.com/etsy/statsd

Ingests, aggregates, flushes

Use a client to send your data

Pushes aggregations
– Graphite
– Databases
– Flat files of JSON
– Wherever
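
A sketch of what "use a client to send your data" amounts to. StatsD accepts plain-text lines of the form `name:value|type` over UDP, with 8125 as its default port. The helper names and the metric names below are hypothetical, and a real client library would add batching, sampling, and error handling.

```python
import socket

def statsd_line(name, value, kind):
    """Format one metric; kind is 'c' (counter), 'g' (gauge), 'ms' (timer) or 's' (set)."""
    return f"{name}:{value}|{kind}"

def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget a single metric line over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))

# Example lines for hypothetical metrics, one per type.
hit = statsd_line("api.hits", 1, "c")
cpu = statsd_line("host.cpu", 42, "g")
latency = statsd_line("api.latency", 28, "ms")
visitor = statsd_line("api.visitors", "user-17", "s")
```

Because the transport is UDP, instrumented code does not block or fail when the collector is down, at the cost of possibly losing samples.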

Page 91: Measure All the Things! - Austin Data Day 2014

Graphite

http://graphite.wikidot.com

Makes graphs

Pluggable backends (NEW!!!11)

Scaling problems

Page 92: Measure All the Things! - Austin Data Day 2014

Buy Enterprise Software

These exist, but I’m an open source hacker and can’t say much about them.

Page 93: Measure All the Things! - Austin Data Day 2014

Roll Your Own

Easier than you think

Harder than you think

Page 94: Measure All the Things! - Austin Data Day 2014

Roll Your Own

Three components
– Ingestion
– Aggregation/Rollup
– Query/Graphing
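
A toy sketch of the three components in one process: ingestion appends raw samples, rollup averages them into fixed-width time buckets, and query returns the rolled-up points in a range. Averaging is just one possible rollup, and all names and the in-memory storage are assumptions for illustration.

```python
from collections import defaultdict

raw_samples = defaultdict(list)  # metric name -> [(timestamp, value), ...]

def ingest(metric, timestamp, value):
    """Ingestion: accept one raw sample."""
    raw_samples[metric].append((timestamp, value))

def rollup(metric, bucket_seconds):
    """Aggregation: average raw samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in raw_samples[metric]:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

def query(metric, start, end, bucket_seconds=60):
    """Query: rolled-up points with start <= bucket start < end."""
    points = rollup(metric, bucket_seconds)
    return [(t, avg) for t, avg in points.items() if start <= t < end]

ingest("cpu", 0, 10)
ingest("cpu", 30, 20)
ingest("cpu", 60, 40)

per_minute = rollup("cpu", 60)
first_minute = query("cpu", 0, 60)
```

The easy part is that each piece is a few lines; the hard part is doing the same thing durably, concurrently, and at volume.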

Page 95: Measure All the Things! - Austin Data Day 2014

Avoid Pileups

1 sample per second
3,600 samples per hour
86,400 samples per day
31,536,000 samples per year

1k of storage per sample? (Roughly) 32 gigabytes per year
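
The pileup arithmetic, spelled out for a single metric. This assumes the slide's "1k" means 1,024 bytes per stored sample and that a gigabyte is 10^9 bytes; both are assumptions, and the result only needs to be roughly right.

```python
SAMPLES_PER_HOUR = 60 * 60                 # one sample per second
SAMPLES_PER_DAY = 24 * SAMPLES_PER_HOUR
SAMPLES_PER_YEAR = 365 * SAMPLES_PER_DAY

BYTES_PER_SAMPLE = 1024                    # assumed meaning of "1k"

bytes_per_year = SAMPLES_PER_YEAR * BYTES_PER_SAMPLE
gigabytes_per_year = bytes_per_year / 10**9  # a little over 32 GB
```

That figure is for one metric sampled once per second; multiply by the number of metrics to see how quickly raw data piles up.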

Page 96: Measure All the Things! - Austin Data Day 2014
Page 97: Measure All the Things! - Austin Data Day 2014

No!

Page 98: Measure All the Things! - Austin Data Day 2014

Measure all the right things!

Page 99: Measure All the Things! - Austin Data Day 2014

Does this measurement matter?

You don’t care about it when it changes

You aren’t doing anything with it

You can’t figure out what actions to take from it

(it’s meaningless)

Page 100: Measure All the Things! - Austin Data Day 2014

Recent data will almost always be most important.

Page 101: Measure All the Things! - Austin Data Day 2014

Monitoring vs Aggregation

Graphite collects data that is already aggregated.

You are observing history

Looking for patterns

No alerting

Page 102: Measure All the Things! - Austin Data Day 2014

Where Things Are Going

Page 103: Measure All the Things! - Austin Data Day 2014

Complex Event Analysis

ESPER (my favorite)
– Mostly open source

Not enough projects though :(

Page 104: Measure All the Things! - Austin Data Day 2014

Data Intelligence

You need this if you don’t know what questions you ought to ask

Correlating signals in order to make useful conclusions

Page 105: Measure All the Things! - Austin Data Day 2014

Thanks!

@gdusbabek

Page 106: Measure All the Things! - Austin Data Day 2014

Photos from the Flickr CC collection

train data dump truck traffic byproduct watching numbers birds moons cake business guts data 2 choices flowers metrics gauge counter marbles timer windmills logs train tower

http://www.flickr.com/photos/vxla/4673817364/sizes/z/
http://www.flickr.com/photos/tensafefrogs/3649985674/sizes/z/
http://www.flickr.com/photos/seanhobson/3906189027/sizes/l/
http://www.flickr.com/photos/shankaronline/7291507876/sizes/l/
http://www.flickr.com/photos/honou/3350764803/sizes/l/
http://www.flickr.com/photos/jdickert/2152739544/sizes/l/
http://www.flickr.com/photos/28misguidedsouls/6517859113/sizes/z/
http://www.flickr.com/photos/55176801@N02/7911595842/sizes/o/
http://www.flickr.com/photos/johnkay/3764457497/sizes/l/
http://www.flickr.com/photos/andykirk/412600169/sizes/l/
http://www.flickr.com/photos/jeff-anderson/4385042770/sizes/l/
http://www.flickr.com/photos/sgis/6532363/sizes/o/
http://www.flickr.com/photos/whatbettertime/405735418/sizes/l/
http://www.flickr.com/photos/rachubarama/2709346242/sizes/l/
http://www.flickr.com/photos/femto-photography/4604878864/sizes/o/
http://www.flickr.com/photos/pixx0ne/5689978130/sizes/l/
http://www.flickr.com/photos/ruth_w/8432567657/sizes/l/
http://www.flickr.com/photos/wesley_lelieveld/8571911541/sizes/l/
http://www.flickr.com/photos/lifeasart/242208550/sizes/l/
http://www.flickr.com/photos/mrsenil/2219108948/sizes/l/
http://www.flickr.com/photos/cristic/2773883011/sizes/l/
http://www.flickr.com/photos/mattblaze/4491948497/sizes/l/
http://www.flickr.com/photos/kentish/43788618/sizes/o/
http://www.flickr.com/photos/dtanist/10809534755/sizes/l/
http://www.flickr.com/photos/jarodcarruthers/10372829184/sizes/l/