measure all the things! - austin data day 2014
DESCRIPTION
Slides used during a presentation that covered metrics gathering and analysis
TRANSCRIPT
Measure All The Things!
Gary Dusbabek Rackspace
@gdusbabek
Motivation
What You Really Want
Kinds of Metrics
How To Do It
Prognostication
Motivation
It’s all about
the data
We are generating data at an insane rate.
2006: IDC estimates 161 exabytes of data on the Internet
That is 161 million 1 TB drives
2009
988 exabytes of data
6x growth in 4 years
Almost 1 billion 1 TB drives
A zettabyte: 21 zeroes
Source http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
2012: The Internet was estimated to be shipping roughly 2.5 exabytes of data daily.
Daily
Not counting the NSA
Transferring Data
Generates Data
Metadata!
Secondary Information
A by-product
Example 1
Cloud Monitoring
Is the website up?
GET HTTP/1.1
Status=200 Bytes=432
Time to connect=15ms Time to first byte=21ms
Duration=28ms
Example 2
Netflix
You want to watch an episode of Buffy
Observations:
What titles you click on
What time of day you started watching
When you paused
Parts you re-watched
When you finished (if you finished)
Useless to people consuming the primary data.
Priceless when you’re trying to understand
behavior.
Understanding = Knowledge
In these cases all the data generated is
time-series
Time Series Data
Related events sorted by time of occurrence
Example
0600 – Wake up
0601 – Checked Hacker News
0605 – Shower
0630 – Breakfast
0630 – Checked Hacker News
0700 – Left for work
0730 – Arrived at work
Etc…
Think about how you’d store something like this if
you were building a backend system
Relational Database Much?
Who     When  What
You     0600  Wake up
You     0601  Checked Hacker News
You     0605  Shower
You     0630  Breakfast
You     0630  Checked Hacker News
You     0700  Left for work
You     0730  Arrive at work
You     0731  Checked Hacker News
Friend  0603  Wake up
Friend  0604  Checked Hacker News
Friend  0715  Left for work
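A who/when/what layout like the one above maps naturally onto a relational table. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `events`, the column names, and the `timeline` helper are assumptions made for illustration, not something from the talk.

```python
import sqlite3

# Hypothetical schema: one row per event, identified by who and when.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (who TEXT, at TEXT, what TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("You", "0600", "Wake up"),
        ("Friend", "0603", "Wake up"),
        ("You", "0601", "Checked Hacker News"),
        ("Friend", "0604", "Checked Hacker News"),
        ("You", "0605", "Shower"),
    ],
)


def timeline(who):
    """Return one person's events ordered by time of occurrence."""
    rows = conn.execute(
        "SELECT at, what FROM events WHERE who = ? ORDER BY at", (who,)
    )
    return rows.fetchall()


print(timeline("You"))
```

Every read has to filter by person and sort by time, which is the cost the following slides react to.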
Other Ways?
Less Appealing
You:    0600 Wake up | 0601 Checked Hacker News | 0605 Shower | 0630 Breakfast | 0630 Checked Hacker News | 0700 Left for work | 0730 Arrive at work | 0731 Checked Hacker News
Friend: 0603 Wake up | 0604 Checked Hacker News | 0715 Left for work
Column Oriented
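One way to picture the column-oriented layout is a single wide row per person whose columns are kept sorted by time. This is only an in-memory sketch of the idea, not how any particular database implements it; `rows`, `record`, and `time_slice` are hypothetical names.

```python
from bisect import insort
from collections import defaultdict

# One "wide row" per person; each row holds (time, event) columns sorted by time.
rows = defaultdict(list)


def record(who, when, what):
    """Insert an event into the person's row, keeping the row time-ordered."""
    insort(rows[who], (when, what))


def time_slice(who, start, end):
    """Events for one person with start <= time <= end."""
    return [(t, e) for t, e in rows[who] if start <= t <= end]


record("You", "0630", "Breakfast")
record("You", "0600", "Wake up")
record("You", "0601", "Checked Hacker News")
record("Friend", "0603", "Wake up")

print(time_slice("You", "0600", "0605"))
```

Because each row is already in time order, reading a time range for one person needs no sort and no filtering across other people's data.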
What You Really Want
You run a
business
You want to make money
Show me the money!
You need to make
decisions
You need to make the right
decisions
How do you do that?
With your gut
With data
Example
API responses are taking a long time.
It’s probably the database.
You add a few indexes.
You allocate more memory.
You get faster disks.
You get bigger processors.
Maybe it’s the network…
You replace ethernet adapters.
You get faster switches.
You replace the cabling.
Crap!
Trace it!
500 ms for entire request
15 ms on the wire getting there
200 ms to auth
50 ms looking up account
50 ms looking up other stuff
15 ms on the wire getting back
170 ms rendering in the browser
Make the right decisions with data.
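A trace like the one above can be collected by timing each step of a request. Below is a rough sketch, assuming a hypothetical `span` context manager and a made-up `handle_request` whose sleeps stand in for real work; real tracing tools do far more than this.

```python
import time
from contextlib import contextmanager

spans = {}  # span name -> elapsed milliseconds


@contextmanager
def span(name):
    """Time the enclosed block and record its duration under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans[name] = (time.perf_counter() - start) * 1000.0


def handle_request():
    # The sleeps are stand-ins for real calls; the durations are invented.
    with span("total"):
        with span("auth"):
            time.sleep(0.02)
        with span("account_lookup"):
            time.sleep(0.005)


handle_request()
slowest = max((name for name in spans if name != "total"), key=spans.get)
print(f"slowest step: {slowest} ({spans[slowest]:.1f} ms)")
```

With numbers per step, the slow part is identified by measurement rather than by replacing hardware until something improves.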
You need a metrics system
Take these things into account:
Availability Redundancy Accuracy
And your budget
Example: Pretty Graphs
If graphs go away, do you lose money?
The CEO likes them.
Do graphs help you make decisions?
Example: Usage Billing
Will losing data cost you money?
Data Lifecycle
When can I throw it away?
How much work is throwing it away?
More work means it probably
won’t happen.
Kinds of Metrics
{Volume, Frequency} ⨯ {Low, High}
Low Volume, High Frequency
5,6,5,6
Things observed infrequently
Almost always changes
Low storage overhead
Bulk operations are easy
Usually uninteresting
Low Volume, Low Frequency
5,5,5,6
Roughly the same as LVHF
High Volume, Low Frequency
5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,7,7
Constantly observed
But doesn’t change much
Optimizations!
Detect and record only level changes
Requires caching
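The level-change optimization can be sketched as follows: cache the last value seen for each metric and write a sample only when the value differs. The class name `LevelChangeRecorder` and the in-memory `stored` list are assumptions for illustration; a real system would write to durable storage.

```python
class LevelChangeRecorder:
    """Store a sample only when the value differs from the last one seen."""

    def __init__(self):
        self.last = {}    # cache: metric name -> last recorded value
        self.stored = []  # (timestamp, name, value) tuples actually written

    def observe(self, timestamp, name, value):
        """Return True if the sample was stored, False if it was a repeat."""
        if name in self.last and self.last[name] == value:
            return False
        self.last[name] = value
        self.stored.append((timestamp, name, value))
        return True


# A stream shaped like the slide's: long runs of 5s, then 6s, then 7s.
samples = [5] * 9 + [6] * 12 + [7] * 2
recorder = LevelChangeRecorder()
for timestamp, value in enumerate(samples):
    recorder.observe(timestamp, "queue_depth", value)

print(len(samples), "observed,", len(recorder.stored), "stored")
```

The cache is the price of the saving: the recorder has to remember the last value for every metric it tracks.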
High Volume, High Frequency 34,4,7,345,6,4,2,54,67,5,6,55,74,5,3,2,5,6745…
Numeric vs String
Most will be numeric
Some are strings
Usually low frequency
Special handling
Numeric vs String
High frequency strings are a sign you’re doing
something wrong or need a different system.
Gauges
Current value of something
Operation: snapshot
Speedometer
Thermometer
CPU utilization
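A gauge keeps only the latest reading, so a read is a snapshot and earlier values are gone. A minimal sketch, with a hypothetical `Gauge` class and invented CPU readings:

```python
class Gauge:
    """Holds only the most recent reading; reading it is a snapshot."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value

    def snapshot(self):
        return self.value


cpu = Gauge()
for reading in (12.5, 80.0, 43.0):  # invented CPU utilization percentages
    cpu.set(reading)

print(cpu.snapshot())
```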
Counter
Exists as a set of operations
– Operation: increment
– Operation: decrement
Read by selecting over time and summing
Example: hits on a website
Different than unique hits
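A counter stored as a set of operations can be sketched as a log of timestamped deltas that is summed at read time. The `Counter` class here is hypothetical and keeps everything in memory.

```python
class Counter:
    """A counter kept as timestamped increment/decrement operations."""

    def __init__(self):
        self.ops = []  # (timestamp, delta)

    def increment(self, timestamp, amount=1):
        self.ops.append((timestamp, amount))

    def decrement(self, timestamp, amount=1):
        self.ops.append((timestamp, -amount))

    def total(self, start, end):
        """Read: select operations with start <= timestamp < end and sum them."""
        return sum(delta for t, delta in self.ops if start <= t < end)


hits = Counter()
for t in (1, 2, 2, 7, 11):  # invented hit timestamps, in seconds
    hits.increment(t)

print(hits.total(0, 10))
```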
Set (StatsD)
Number of uniquely seen items
Think: conditional counter
Example: number of unique visitors
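The difference between hits and unique visitors can be shown with a set that counts an item only the first time it appears. `UniqueSet` and the visitor names are made up for this sketch.

```python
class UniqueSet:
    """Counts an item only the first time it is seen."""

    def __init__(self):
        self.seen = set()

    def add(self, item):
        """Return True if the item had not been seen before."""
        is_new = item not in self.seen
        self.seen.add(item)
        return is_new

    def count(self):
        return len(self.seen)


visitors = UniqueSet()
hits = 0
for user in ("ann", "bob", "ann", "cat", "bob"):
    hits += 1           # plain counter: every request counts
    visitors.add(user)  # set: only first sightings count

print(hits, "hits from", visitors.count(), "unique visitors")
```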
Timer
How long something takes
Statistics (mean, median, min, max, percentiles)
How many times it has happened
Rate at which it is happening
Uses a sliding window
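A timer can be sketched as a bounded window of recent durations plus a running count. This version assumes a count-based window and a nearest-rank percentile, and it leaves out the rate calculation; libraries differ on all three choices, so treat the `Timer` class as illustrative only.

```python
import math
import statistics
from collections import deque


class Timer:
    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)  # sliding window of recent durations (ms)
        self.count = 0                       # how many times the event has happened

    def update(self, duration_ms):
        self.samples.append(duration_ms)
        self.count += 1

    def percentile(self, p):
        """Nearest-rank percentile over the current window."""
        data = sorted(self.samples)
        rank = max(1, math.ceil(p / 100 * len(data)))
        return data[rank - 1]

    def stats(self):
        return {
            "count": self.count,
            "mean": statistics.mean(self.samples),
            "median": statistics.median(self.samples),
            "min": min(self.samples),
            "max": max(self.samples),
            "p95": self.percentile(95),
        }


timer = Timer(window=5)
for duration in (100, 20, 30, 10, 40, 50):  # invented durations in ms
    timer.update(duration)

print(timer.stats())
```

The first sample (100 ms) has already slid out of the five-item window, so it no longer affects the statistics, while the count still reflects all six events.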
Histograms
Distribution of data
Example: when people visit your site
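For the site-visit example, a histogram can be as simple as counting visits per hour of day. The visit times below are invented and `hourly_histogram` is a hypothetical helper.

```python
from collections import Counter


def hourly_histogram(visit_times):
    """Count visits per hour of day from 'HHMM' strings."""
    return Counter(int(t[:2]) for t in visit_times)


visits = ["0601", "0630", "0631", "1215", "1230", "1245", "2359"]
hist = hourly_histogram(visits)

# Print a crude text histogram, one row per hour that had visits.
for hour in sorted(hist):
    print(f"{hour:02d} {'#' * hist[hour]}")
```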
How Do You Do It?
If you make software
Instrument it!
Java? https://github.com/codahale/metrics
Node.js? https://github.com/mikejihbe/metrics
Others? Of course
If you run systems
Instrument them!
Get data via agent
Get data via pollers
Considerations: inside or outside of your network
StatsD
https://github.com/etsy/statsd
Ingests, aggregates, flushes
Use a client to send your data
Pushes aggregations to:
– Graphite
– Databases
– Flat files of JSON
– Wherever
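Sending data to StatsD usually means firing small UDP datagrams in its plain-text line format, `<name>:<value>|<type>`, where `c` is a counter, `ms` a timer, `g` a gauge, and `s` a set. The sketch below is a bare-bones stand-in for a real client library; the metric names are hypothetical, and it assumes a StatsD daemon listening on its default UDP port 8125 on localhost.

```python
import socket


def format_metric(name, value, kind):
    """Build a StatsD line: <name>:<value>|<type>."""
    return f"{name}:{value}|{kind}".encode("ascii")


def send_metric(name, value, kind, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one metric as a UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, kind), (host, port))


if __name__ == "__main__":
    send_metric("api.requests", 1, "c")          # count one request
    send_metric("api.response_time", 320, "ms")  # one 320 ms timing
```

UDP keeps instrumentation cheap for the application, at the cost that datagrams can be dropped without the sender noticing.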
Graphite
http://graphite.wikidot.com
Makes graphs
Pluggable backends (NEW!!!11)
Scaling problems
Buy Enterprise Software
These exist, but I’m an open source hacker and can’t say
much about them.
Roll Your Own
Easier than you think
Harder than you think
Roll Your Own
Three components:
Ingestion
Aggregation/Rollup
Query/Graphing
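The three components can be sketched in a few lines to show how they fit together. This toy version keeps everything in memory and averages samples into fixed 60-second buckets; the function names, the bucket size, and the choice of averaging are all assumptions, and it ignores the hard parts (durability, scale, late data).

```python
from collections import defaultdict
from statistics import mean

raw = defaultdict(list)  # metric name -> [(timestamp_seconds, value)]


def ingest(name, timestamp, value):
    """Ingestion: accept a raw sample."""
    raw[name].append((timestamp, value))


def rollup(name, bucket_seconds=60):
    """Aggregation/rollup: average the raw samples in each time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in raw[name]:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def query(name, start, end, bucket_seconds=60):
    """Query: rolled-up points whose bucket start is in [start, end)."""
    points = rollup(name, bucket_seconds)
    return [(t, v) for t, v in points.items() if start <= t < end]


for ts, value in [(0, 10), (30, 20), (60, 30), (90, 50), (120, 5)]:
    ingest("cpu", ts, value)

print(query("cpu", 0, 120))
```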
Avoid Pileups
1 sample per second
3,600 samples per hour
86,400 samples per day
31,536,000 samples per year
1k of storage? (roughly) 32 gigabytes
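The pileup arithmetic can be checked directly. This sketch assumes the slide's "1k" means 1,024 bytes per stored sample and that the total is for a single metric; the 1,000-metric figure is an added illustration, not a number from the talk.

```python
SAMPLES_PER_YEAR = 60 * 60 * 24 * 365  # one sample per second
BYTES_PER_SAMPLE = 1024                # assumption: "1k" means 1,024 bytes per sample

bytes_per_year = SAMPLES_PER_YEAR * BYTES_PER_SAMPLE
gigabytes_per_year = bytes_per_year / 1e9


def yearly_gigabytes(metric_count):
    """Storage per year if every metric is sampled once per second."""
    return metric_count * gigabytes_per_year


print(f"{SAMPLES_PER_YEAR:,} samples/year, about {gigabytes_per_year:.1f} GB per metric")
print(f"1,000 metrics: about {yearly_gigabytes(1000) / 1000:.1f} TB per year")
```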
No!
Measure all the right things!
Does this measurement matter?
You don’t care about it when it changes
You aren’t doing anything with it
You can’t figure out what actions to take from it
(it’s meaningless)
Recent data will almost always be
most important.
Monitoring vs Aggregation
Graphite collects data that is already aggregated.
You are observing history
Looking for patterns
No alerting
Where Things Are Going
Complex Event Analysis
ESPER (my favorite). – Mostly open source.
Not enough projects though :(
Data Intelligence
You need this if you don’t know what questions you ought to
ask
Correlating signals in order to make useful conclusions
Thanks!
@gdusbabek
Photos from the Flickr CC collection
train data dump truck traffic byproduct watching numbers birds moons cake business guts data 2 choices flowers metrics gauge counter marbles timer windmils logs train tower
http://www.flickr.com/photos/vxla/4673817364/sizes/z/
http://www.flickr.com/photos/tensafefrogs/3649985674/sizes/z/
http://www.flickr.com/photos/seanhobson/3906189027/sizes/l/
http://www.flickr.com/photos/shankaronline/7291507876/sizes/l/
http://www.flickr.com/photos/honou/3350764803/sizes/l/
http://www.flickr.com/photos/jdickert/2152739544/sizes/l/
http://www.flickr.com/photos/28misguidedsouls/6517859113/sizes/z/
http://www.flickr.com/photos/55176801@N02/7911595842/sizes/o/
http://www.flickr.com/photos/johnkay/3764457497/sizes/l/
http://www.flickr.com/photos/andykirk/412600169/sizes/l/
http://www.flickr.com/photos/jeff-anderson/4385042770/sizes/l/
http://www.flickr.com/photos/sgis/6532363/sizes/o/
http://www.flickr.com/photos/whatbettertime/405735418/sizes/l/
http://www.flickr.com/photos/rachubarama/2709346242/sizes/l/
http://www.flickr.com/photos/femto-photography/4604878864/sizes/o/
http://www.flickr.com/photos/pixx0ne/5689978130/sizes/l/
http://www.flickr.com/photos/ruth_w/8432567657/sizes/l/
http://www.flickr.com/photos/wesley_lelieveld/8571911541/sizes/l/
http://www.flickr.com/photos/lifeasart/242208550/sizes/l/
http://www.flickr.com/photos/mrsenil/2219108948/sizes/l/
http://www.flickr.com/photos/cristic/2773883011/sizes/l/
http://www.flickr.com/photos/mattblaze/4491948497/sizes/l/
http://www.flickr.com/photos/kentish/43788618/sizes/o/
http://www.flickr.com/photos/dtanist/10809534755/sizes/l/
http://www.flickr.com/photos/jarodcarruthers/10372829184/sizes/l/