osmc 2014: from monitoringsucks to monitoringlove (and back) | kris buytaert

From #MonitoringSucks to

#MonitoringLove

(and back)

@KrisBuytaert OSMC 2014, Nuremberg, Germany

Kris Buytaert ● I used to be a Dev,

● Then Became an Op

● Chief Trolling Officer and Open Source Consultant @inuits.eu

● Everything is an effing DNS Problem

● Building Clouds since before the bookstore

● Organising Conferences

● Evangelizing devops

An opinionated talk about the Open Source Monitoring tooling landscape

In which I hope to learn from YOU

#devops=~C(L)AMS ● Culture

● (Lean)

● Automation

● Monitoring and Measurement

● Sharing

● Damon Edwards and John Willis

Gene Kim

Monitoring is usually an afterthought: ENOBUDGET, ENOTIME

A 2008 OLS Paper ● We have bloated Java tools

● Some open Core stuff

● DIY folks want traditional Nagios

● DBA Required

#monitoringsucks ● John Vincent (@lusis), June 2011

● A sub #devops movement

● https://github.com/monitoringsucks/

Why #monitoringsucks ● Manual config (gui)

● Not in sync with reality

● Hosts only

● Services sometimes

● Application never

● Chaos or out of sync with reality

● Alert Fatigue

Let's forget about ● Tools with no (stable) API

● Tools with strong focus on GUI

● Unless you are an SME with < 100 nodes

● Zenoss, Hyperic, GroundWork, ....

● P.S. : don't even mention proprietary software to me

What we want

● Small, well-suited components

• Collect

• Transport / Mangle

• Store

• Analyse

• Act / Alert

• Visualize
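The component list above can be sketched as a chain of small functions. This is only an illustration of the idea, not any particular tool: the function names, the sample data and the threshold are all made up for the example, and an in-memory dict stands in for a real time-series store.

```python
from statistics import mean


def collect():
    """Collect: return raw samples (hard-coded here; a real collector would poll hosts)."""
    return [("web1.load", 0.4), ("web2.load", 3.2), ("web1.load", 0.6)]


def transport(samples):
    """Transport / mangle: group samples by metric name."""
    grouped = {}
    for name, value in samples:
        grouped.setdefault(name, []).append(value)
    return grouped


def analyse(store):
    """Analyse: reduce each series to its mean."""
    return {name: mean(values) for name, values in store.items()}


def act(summary, threshold=1.0):
    """Act / alert: return the metric names whose mean exceeds the threshold."""
    return sorted(name for name, avg in summary.items() if avg > threshold)


# Store: the dict returned by transport() stands in for a time-series database.
store = transport(collect())
alerts = act(analyse(store))
print(alerts)
```

Each stage only knows the shape of its input, so any one of them could be swapped for a different tool without touching the rest.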

#monitoringlove

•Ulf Mansson #devopsdays Rome 2011

•A new era of tooling

•#monitoringlove hacksessions @inuits

•#monitorama

Icinga •2009 Fork

•I consider Nagios dead

•Vibrant Community (or they stalk me)

•Throw great parties in Nürnberg

•Nobody can pronounce it anyhow

•https://github.com/Inuits/puppet-icinga/

Stored Configs

#monitoringlove But the love was about :

Sensu ● Awesome for non static environments

● Scaling a clustered RabbitMQ ?

● This is Europe, U no do cloud

Automation of #monitoring brought back

the #love

●Autodetection

●Multiplexing

●Trend Forecasting

I love CheckMK

•Autodetection ?

•Service,

•Business Functionalities

•eg. vhosts etc

•Single Source of Truth

I hate CheckMK

Monitoring a service vs

Monitoring a Service

definition of done:

monitored and in production

A software project is not done until your last end user is dead

Culture,

Automation,

Measurement : measure all the things

Sharing

Deploy Statistics ● Time To Deploy

● Deploy Frequency

● Lifecycle frequency

● Map to other metrics
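As a rough sketch of the first two statistics above, time to deploy and deploy frequency can be derived from a log of deploy start and end times. The log below is hypothetical sample data, and the function names are chosen for this example only.

```python
from datetime import datetime, timedelta

# Hypothetical deploy log: (started, finished) pairs.
deploys = [
    (datetime(2014, 11, 3, 10, 0), datetime(2014, 11, 3, 10, 12)),
    (datetime(2014, 11, 4, 15, 30), datetime(2014, 11, 4, 15, 38)),
    (datetime(2014, 11, 7, 9, 0), datetime(2014, 11, 7, 9, 10)),
]


def time_to_deploy(log):
    """Mean duration of a deploy."""
    total = sum((end - start for start, end in log), timedelta())
    return total / len(log)


def deploy_frequency(log):
    """Deploys per day, measured from the first to the last deploy start."""
    starts = sorted(start for start, _ in log)
    span_days = (starts[-1] - starts[0]).total_seconds() / 86400
    return len(log) / span_days if span_days else float(len(log))


print(time_to_deploy(deploys))
print(round(deploy_frequency(deploys), 2))
```

Once these numbers exist as metrics, they can be graphed next to everything else, which is what "map to other metrics" needs.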

CollectD all the metrics, at high intervals

Oldschool graphite
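For reference, a minimal sketch of what ends up on the wire: Graphite's plaintext protocol is one `path value timestamp` line per sample, by default on TCP port 2003. The metric path and the host name `graphite.example.org` are placeholders, and in practice collectd's Graphite output would do this for you.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample as a newline-terminated 'path value timestamp' line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.org", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (host is a placeholder)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


line = graphite_line("servers.web1.load.shortterm", 0.42, 1416900000)
print(line, end="")
```

`send_to_graphite([line])` would ship the sample; it is not called here because it needs a reachable Carbon listener.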

Self Service Gdash based pipelines

Puppetized Templates (wip)

Grafana

Graphite++ ● Dashboards

• Grafana

● Engines :

• InfluxDB

• Cyanite

Triggers on Graphs ● Export Java Metrics

● JMXTrans

● Export JMXConfigs

● Configure NRPE Check

● Export NagiosCheck

● Collect JMX Exports on JMXTransNode

● Graph Em

● Collect Icinga Configs on Icinga

Aggregation ● Alert on streams

● Alert on aggregated metrics
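A minimal sketch of alerting on a stream instead of on single events: keep a sliding window of recent results and fire only when the aggregate error ratio crosses a limit. The class name, window size and ratio are assumptions made for this example; they do not come from any specific tool.

```python
from collections import deque


class StreamAlert:
    """Track the last `size` events of a stream and alert on the aggregate error ratio."""

    def __init__(self, size=5, max_error_ratio=0.5):
        self.window = deque(maxlen=size)
        self.max_error_ratio = max_error_ratio

    def observe(self, ok):
        """Record one event; return True if the window's error ratio exceeds the limit."""
        self.window.append(ok)
        errors = sum(1 for result in self.window if not result)
        return errors / len(self.window) > self.max_error_ratio


alert = StreamAlert(size=4, max_error_ratio=0.5)
states = [alert.observe(ok) for ok in [True, False, True, False, False, False]]
print(states)
```

A single failed event never pages anyone here; only a sustained run of failures does, which is one way to cut down on alert fatigue.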

Riemann ● I still don't get it ?

● Distributed Top

● Do you like Clojure ?

● Riemann Health plugin ?

● s/riemann-health/collectd/g;

● Output to graphite

Graphs to Knowledge

Skyline

•Oculus

•Creating Information out of this data

•Big data

•Machine Learning

But I have log files..

Logs and Metrics ● Graylog2

● ELSA (Enterprise Log Search and Archive)

● ELK Stack

● Collect from anywhere

● Filter

● Send anywhere

● Queueing

Black on White ?

APM But what about my apps ?

Half the world cheers about SAAS tools :(

Packetbeat ● Traffic Flow through network

● Transactions causing errors

● SQL per HTTP

● API call usage

PacketBeat

This new “D” hype

Containers are the new black

● 1 process per container

● Metric collection ?

● Service health ?

So you want service registration of your healthy (containerized) applications ?

Enter Consul.io ● Service discovery

● Failure detection

● Using Gossip built on top of Serf

● Random node 2 node communication

● A HashiCorp project

Consul ● Uses monitoring_plugins for health

● Creates unhealthy dns setups

● Sensu alike

● Key-Value store

● Consul_template => fills your templates
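Consul can reuse monitoring plugins because its script checks read the same exit codes: 0 is passing, 1 is warning, anything else is critical. The sketch below shows that mapping; `check_queue_depth` and its thresholds are a hypothetical check invented for illustration.

```python
# Exit codes shared by Nagios/Icinga-style monitoring plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_queue_depth(depth, warn=100, crit=500):
    """Hypothetical check: classify a queue depth and return (exit_code, message)."""
    if depth is None:
        return UNKNOWN, "UNKNOWN - no data"
    if depth >= crit:
        return CRITICAL, f"CRITICAL - queue depth {depth}"
    if depth >= warn:
        return WARNING, f"WARNING - queue depth {depth}"
    return OK, f"OK - queue depth {depth}"


def consul_status(exit_code):
    """Map a plugin exit code to the state Consul assigns to a script check."""
    return {OK: "passing", WARNING: "warning"}.get(exit_code, "critical")


code, message = check_queue_depth(700)
print(message, "->", consul_status(code))
```

Because the contract is only an exit code and a line of text, the same check script can feed both Icinga and Consul.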

Everything is a freaking dns problem

Self Healing ● Pacemaker Corosync (ocf resource that monitors your service)

● Mesos

● Kubernetes

● Scale changes, Consensus Models change

So your DC fails

Whom to alert when ?

'New' kids on the block ● Flapjack

● flapjack.io

● monitoring notification routing + event processing system

● OpenDuty

● github.com/szechuen/OpenDuty

● Duty management

My Alerting Strategy

Is still in beta

And back :(

In 2014 I'm still running the same check for

- service registration (consul)

- high availability (pacemaker/corosync)

- monitoring (icinga)

But I love where Monitoring is heading

We have far fewer false positives

And we have a Maintainable Monitoring Infra

Your next trip to Gent !

CfgMgmtcamp.eu February 2 and 3, 2015

CFP is Open !

Contact: Kris.Buytaert@inuits.eu

Further Reading: @krisbuytaert, http://www.krisbuytaert.be/blog/, http://www.inuits.eu/

Inuits, Duboistraat 50, 2060 Antwerpen, Belgium, 891.514.231, +32 475 961221
