Java application monitoring with Dropwizard Metrics and Graphite
Roberto Franchini
@robfrankie
Bologna, April 10th, 2015
whoami(1)
15 years of experience, proud to be a programmer
Writes software for information extraction, NLP, opinion mining (@scale), and a lot of other buzzwords
Implements scalable architectures
Plays with servers (don't say that to my sysadmin)
Member of the JUG-Torino coordination team
feedback http://lanyrd.com/sdkghq
Company
Agenda
Intro
Scenario
System monitoring
Application monitoring (dark side)
Application monitoring (light side)
Dropwizard Metrics
Dashboards
Quotes
Business value
Our code generates business value when it runs, not when we write it.
We need to know what our code does when it runs.
We can’t do this unless we measure it.
(Codahale)
SLA driven
Have an SLA for your service
Measure and report performance against the SLA
(Ben Treynor, Google Inc.)
Scenario
Infrastructure
45 bare metal servers
Nginx
Jetty (mainly embedded)
PostgreSQL
GlusterFS (28TB and growing)
Kestrel
Kafka on the horizon
Redis
Jenkins as scheduler (cron on steroids)
Software
Java shop
Home-made distributed search engine
Home-made little PaaS
Docker on the go
More than 120 webapps
More than 100 batch jobs
NRT stream processing jobs running 24x7
Java
Java is not dead
And it is almost everywhere
The language is evolving
The JVM is the most advanced managed environment in which to run your code
Choose your style: Scala, Clojure, Groovy
Who uses it (cool side)
Twitter
Spotify
Google
Netflix
Who uses it (real world)
Your bank
Systems monitoring
Collectd
Since 2012: collectd
systems: load, df, traffic
java (via JMX): heap
queues: items, size
dbms: connections, size
Collectd charts
Traffic
Collectd to Graphite
collectd writes to graphite
  write_graphite plugin
better charts
dashboards are easy
dashboards are meaningful
Graphite dashboard
Servers load dashboard
Grafana
A beautiful frontend for graphite
Dashboards are meaningful and BEAUTIFUL
(you can send screenshots to managers now)
Grafana dashboard
Application monitoring
Requirements
Measure behaviors
Send to graphite
Integrate with system measures
Correlate with system measures
Repeat with me
Correlate application and system metrics
Correlate
graphite
collectd
applications
grafana
To do what?
Discover bottlenecks
Post-mortem analysis
SLA monitoring
IO impact
Network traffic
Memory
User Story
Given the application running
when the manager comes
then I want to show a big green number
The answer
42
In detail
“Application monitoring? WHAT?”
“Ok, let me explain:
What is the app doing right now?
How is the app performing right now?
And then graph it!”
“Ok, I got it!”
“Let me see”
5 minutes later

public class PoorManJavaMetrics {
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        // heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        // I can’t write that
    }
}
DIY Java Monitoring
Maybe better with a centralized utility class (maybe…)
Thread safety?
Send measures to different backends?
Log to different logging systems?
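The thread-safety question above can be answered with JDK atomics alone. The following is a minimal sketch, not code from the talk: the class name `ThreadSafePoorManMetrics` and its accessors are hypothetical, and it assumes nanosecond timing via `System.nanoTime()` is acceptable.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical rework of the DIY class above using only the JDK.
final class ThreadSafePoorManMetrics {
    // LongAdder tolerates many concurrent writers without losing updates.
    private final LongAdder called = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    void doThings() {
        final long start = System.nanoTime(); // monotonic, unlike currentTimeMillis()
        try {
            // heavy business logic
        } finally {
            // Recorded even when the business logic throws.
            called.increment();
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long called() {
        return called.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        final ThreadSafePoorManMetrics metrics = new ThreadSafePoorManMetrics();
        final Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    metrics.doThings();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("called=" + metrics.called() + " totalNanos=" + metrics.totalNanos());
    }
}
```

Even in this form the sketch covers only counting and timing. Sending measures to different backends and logging systems remains open.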
Java Monitoring
Measure in the code
Thread safety
Counters, gauges, meters etc.
Log metrics
Graph metrics
Export metrics
JMX
NOT only JMX
We want more
Integrate JMX metrics from third-party libs
Dropwizard Metrics
https://dropwizard.github.io/metrics/3.1.0/
Overview
Code instrumentation
  meters, gauges, counters, histograms
Reporters
  console, csv, slf4j, jmx
Web app instrumentation
Web app health check
Advanced reporters
  graphite, ganglia
Overview
Third party libs
  aspectj
  influxdb
  statsd
  cassandra
Main parts
MetricRegistry
  a collection of all the metrics for your application
  usually one instance per JVM
  use more than one in a multi-WAR deployment
Names
  each metric has a unique name
  the registry has helper methods for creating names

MetricRegistry.name(Queue.class, "items", "total")
// com.example.Queue.items.total
MetricRegistry.name(Queue.class, "size", "byte")
// com.example.Queue.size.byte
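The registry idea (one shared collection, unique dotted names) can be sketched with a concurrent map. This is an illustration under assumptions, not the Dropwizard implementation: `TinyRegistry` and its methods are hypothetical names, and only counters are covered.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metric registry.
final class TinyRegistry {
    private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Builds a dotted name from a class and extra parts, skipping null or empty parts.
    static String name(Class<?> owner, String... parts) {
        final StringBuilder sb = new StringBuilder(owner.getName());
        for (String part : parts) {
            if (part != null && !part.isEmpty()) {
                sb.append('.').append(part);
            }
        }
        return sb.toString();
    }

    // Get-or-create: the same name always yields the same counter instance.
    LongAdder counter(String name) {
        return counters.computeIfAbsent(name, key -> new LongAdder());
    }

    public static void main(String[] args) {
        final TinyRegistry registry = new TinyRegistry();
        final String itemsTotal = name(TinyRegistry.class, "items", "total");
        registry.counter(itemsTotal).increment();
        registry.counter(itemsTotal).increment();
        System.out.println(itemsTotal + " = " + registry.counter(itemsTotal).sum());
    }
}
```

Because lookup is by name, any part of the application can reach the same counter without passing references around, which is why one registry per JVM is usually enough.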
Metrics
Gauges
  the simplest metric type: it just returns a value
Counters
  an incrementing and decrementing 64-bit integer

final Map<String, String> keys = new HashMap<>();
registry.register(MetricRegistry.name("gauge", "keys"), new Gauge<Integer>() {
    @Override
    public Integer getValue() {
        return keys.keySet().size();
    }
});

final Counter counter = registry.counter(MetricRegistry.name("counter", "inserted"));
counter.inc();
Metrics
Histograms
  measure the distribution of values in a stream of data
Meters
  measure the rate at which a set of events occur

final Histogram resultCounts = registry.histogram(MetricRegistry.name(ProductDAO.class, "result-counts"));
resultCounts.update(results.size());

final Meter meter = registry.meter(MetricRegistry.name("meter", "inserted"));
meter.mark();
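To make the two definitions concrete, here is a JDK-only sketch of what "distribution" and "rate" mean. It is not how Dropwizard Metrics implements them: the class names are hypothetical, the histogram keeps every value (a real library samples into a bounded reservoir), the quantile uses the simple nearest-rank rule, and the meter computes only a mean rate, not the 1, 5 and 15 minute moving averages.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical sketches of a meter and a histogram.
final class MetricSketches {

    static final class MeanRateMeter {
        private final LongAdder count = new LongAdder();
        private final LongSupplier nanoClock;
        private final long startNanos;

        MeanRateMeter(LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
            this.startNanos = nanoClock.getAsLong();
        }

        void mark() {
            count.increment();
        }

        long count() {
            return count.sum();
        }

        // Events per second since the meter was created.
        double meanRatePerSecond() {
            final long elapsedNanos = nanoClock.getAsLong() - startNanos;
            return elapsedNanos <= 0 ? 0.0 : count.sum() * 1e9 / elapsedNanos;
        }
    }

    static final class TinyHistogram {
        private long[] values = new long[16];
        private int size;

        synchronized void update(long value) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2); // grow when full
            }
            values[size++] = value;
        }

        // Nearest-rank quantile over everything recorded so far; q must be in (0, 1].
        synchronized long quantile(double q) {
            if (size == 0) {
                throw new IllegalStateException("no values recorded");
            }
            if (q <= 0.0 || q > 1.0) {
                throw new IllegalArgumentException("q must be in (0, 1]");
            }
            final long[] sorted = Arrays.copyOf(values, size);
            Arrays.sort(sorted);
            final int rank = (int) Math.ceil(q * size);
            return sorted[Math.max(rank, 1) - 1];
        }
    }

    public static void main(String[] args) {
        final MeanRateMeter meter = new MeanRateMeter(System::nanoTime);
        final TinyHistogram resultCounts = new TinyHistogram();
        for (int i = 1; i <= 100; i++) {
            meter.mark();
            resultCounts.update(i);
        }
        System.out.println("median=" + resultCounts.quantile(0.5)
                + " p99=" + resultCounts.quantile(0.99)
                + " events=" + meter.count());
    }
}
```

The clock is injected so the rate can be checked with a fake time source instead of sleeping.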
Metrics
Timers
  a histogram of the duration of a type of event and a meter of the rate of its occurrence

final Timer timer = registry.timer(MetricRegistry.name("timer", "inserted"));
final Timer.Context context = timer.time();
// timed ops
context.stop();
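The start/stop pair above is easy to get wrong when the timed code throws. One common remedy is a closeable context used with try-with-resources. The sketch below shows that pattern with the JDK only. `TinyTimer` and its nested `Context` are hypothetical names, not the Dropwizard classes, and it records only a count and a total rather than a full histogram.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical timer whose context is closed by try-with-resources.
final class TinyTimer {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final LongSupplier nanoClock;

    TinyTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Starts timing; closing the returned context records the elapsed time.
    Context time() {
        return new Context(nanoClock.getAsLong());
    }

    long count() {
        return count.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    final class Context implements AutoCloseable {
        private final long startNanos;
        private boolean closed;

        private Context(long startNanos) {
            this.startNanos = startNanos;
        }

        @Override
        public void close() {
            if (closed) {
                return; // recording twice would skew the statistics
            }
            closed = true;
            count.increment();
            totalNanos.add(nanoClock.getAsLong() - startNanos);
        }
    }

    public static void main(String[] args) {
        final TinyTimer timer = new TinyTimer(System::nanoTime);
        try (TinyTimer.Context ignored = timer.time()) {
            Math.sqrt(42.0); // timed ops
        }
        System.out.println("count=" + timer.count() + " totalNanos=" + timer.totalNanos());
    }
}
```

With this shape the duration is recorded on every exit path, including exceptions, without an explicit stop call.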
Reporters
JMX
  exposes metrics as JMX beans
Console
  periodically reports metrics to the console
CSV
  appends to a set of .csv files in a given directory
SLF4J
  logs metrics to a logger
Graphite
  streams metrics to graphite
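All of these reporters share one mechanism: on a fixed schedule, read the current value of every metric and write it somewhere. A rough JDK-only sketch of that loop follows. `TinyReporter` is a hypothetical name and this is not the library's reporter code. It prints to the console and handles only numeric values.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical periodic console reporter.
final class TinyReporter {
    private final Map<String, LongSupplier> metrics;
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    TinyReporter(Map<String, LongSupplier> metrics) {
        this.metrics = new TreeMap<>(metrics); // sorted copy for stable output
    }

    // Renders one "name value" line per metric, reading each value at call time.
    String render() {
        final StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, LongSupplier> entry : metrics.entrySet()) {
            sb.append(entry.getKey()).append(' ').append(entry.getValue().getAsLong()).append('\n');
        }
        return sb.toString();
    }

    // Prints a report every period until stop() is called.
    void start(long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(() -> System.out.print(render()), period, period, unit);
    }

    void stop() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicLong inserted = new AtomicLong();
        final Map<String, LongSupplier> metrics = new TreeMap<>();
        metrics.put("counter.inserted", inserted::get);

        final TinyReporter reporter = new TinyReporter(metrics);
        reporter.start(100, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 30; i++) {
            inserted.incrementAndGet();
            Thread.sleep(10);
        }
        reporter.stop();
    }
}
```

Swapping the destination (a logger, a CSV file, a socket to graphite) changes only what happens to the rendered text. The scheduling stays the same.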
Console reporter

final ConsoleReporter console = ConsoleReporter.forRegistry(registry)
        .outputTo(System.out)
        .convertRatesTo(TimeUnit.MINUTES)
        .build();
console.start(10, TimeUnit.SECONDS);

4/9/15 11:45:57 PM =============================================================

-- Gauges ----------------------------------------------------------------------
gauge.keys
             value = 9901

-- Counters --------------------------------------------------------------------
counter.inserted
             count = 9901

-- Meters ----------------------------------------------------------------------
meter.inserted
             count = 9901
slf4j reporter

final Slf4jReporter logging = Slf4jReporter.forRegistry(registry)
        .convertDurationsTo(TimeUnit.MINUTES)
        .outputTo(LoggerFactory.getILoggerFactory().getLogger("metrics"))
        .build();
logging.start(20, TimeUnit.SECONDS);

0 [metrics-logger-reporter-2-thread-1] INFO metrics - type=GAUGE, name=gauge.keys, value=901
2 [metrics-logger-reporter-2-thread-1] INFO metrics - type=COUNTER, name=counter.inserted, count=901
6 [metrics-logger-reporter-2-thread-1] INFO metrics - type=METER, name=meter.inserted, count=901, mean_rate=90.03794743129822, m1=81.7831205903394, m5=80.52726521433198, m15=80.30969500950305, rate_unit=events/second
14 [metrics-logger-reporter-2-thread-1] INFO metrics - type=TIMER, name=timer.inserted, count=900, min=1.9083333333333335E-8, max=0.016671673633333335, mean=1.667999479718904E-4, stddev=0.0016585493668388946, median=7.196666666666667E-8, p75=1.3421666666666667E-7, p95=2.7838333333333335E-7, p98=7.131833333333334E-7, p99=0.01666843721666667, p999=0.016671673633333335, mean_rate=89.8720293570475, m1=81.59911170741354, m5=80.33057092356765, m15=80.11080303990207, rate_unit=events/second, duration_unit=minutes
Graphite reporter

final Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
final GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
        .prefixedWith("web1.example.com")
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .filter(MetricFilter.ALL)
        .build(graphite);
reporter.start(1, TimeUnit.MINUTES);

Metrics can be prefixed
Useful to divide environment metrics: prod, test
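What travels to port 2003 is Graphite's plaintext protocol: one line per datapoint, made of a dotted path, a value and a Unix timestamp in seconds, separated by spaces. The sketch below formats and sends such a line with the JDK only. `GraphiteLine` is a hypothetical helper, not the reporter's own code, and the host and prefix in `main` are placeholders.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the Graphite plaintext protocol.
final class GraphiteLine {
    // One datapoint: "<path> <value> <epoch seconds>\n", with the prefix prepended to the path.
    static String format(String prefix, String name, long value, long epochSeconds) {
        final String path = (prefix == null || prefix.isEmpty()) ? name : prefix + "." + name;
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Sends one line over TCP to a carbon plaintext listener (not called in main).
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }

    public static void main(String[] args) {
        final long now = System.currentTimeMillis() / 1000L;
        // Only prints the line; use send("graphite.example.com", 2003, line) against a real server.
        System.out.print(format("prod.web1", "counter.inserted", 9901L, now));
    }
}
```

Since Graphite treats every dot as a level in the metric tree, a prefix that is itself a host name such as web1.example.com creates three nested levels.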
Metrics naming
Dot notation by getClass()
  easy to create
  very long names on the dashboard
Maybe better to use
  <namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
Such as
  accounts.authentication.password.failed
Use a prefix
  prod, test, dev, local
  differentiate data retention on graphite by prefix
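A small helper can enforce this scheme so that names stay consistent across teams. The following is a sketch under assumptions: `MetricNames.join` is a hypothetical method, not part of Dropwizard Metrics, and the choice to replace every character outside `[A-Za-z0-9_-]` with an underscore is one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for <prefix>.<namespace>.<section>.<target>.<action> names.
final class MetricNames {
    // Joins segments with dots. Characters outside [A-Za-z0-9_-] become '_' so a
    // segment can never add an extra level (dot) or contain a space.
    static String join(String... segments) {
        final List<String> clean = new ArrayList<>();
        for (String segment : segments) {
            if (segment == null) {
                continue;
            }
            final String sanitized = segment.trim().replaceAll("[^A-Za-z0-9_-]", "_");
            if (!sanitized.isEmpty()) {
                clean.add(sanitized);
            }
        }
        return String.join(".", clean);
    }

    public static void main(String[] args) {
        System.out.println(join("prod", "accounts", "authentication", "password", "failed"));
        System.out.println(join("test", "web1.example.com", "login time"));
    }
}
```

Keeping the environment as the first segment is what makes it possible to match retention rules on graphite by prefix.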
Grafana application overview
Demo
References
https://dropwizard.github.io/metrics/3.1.0/
https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf
http://graphite.wikidot.com/
http://grafana.org/
http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf
Thank You
http://lanyrd.com/sdkghq
@robfrankie