OSMC 2014: Time to say goodbye to your Nagios setup | Olivier Jan

So you want to switch off ?

Time to say goodbye to your Nagios based setup!

© 2014 - Olivier Jan - Check my Website - @olivjan - [email protected]

https://twitter.com/olivjan

https://checkmy.ws

http://monitoring-fr.org


About me

❖ System admin and architect

❖ Co-founder of « Communauté Francophone de la Supervision Libre »

❖ Writer of the book « Nagios 3 au cœur de la supervision Open Source »

❖ Co-founder of Check my Website, a SaaS service for remote monitoring of websites and applications (current)

Content

❖ Why switch off? The good and maybe not so good reasons to do so!

❖ Which way to take ?

❖ Building a monitoring solution without Nagios :

❖ Tools available

❖ A personal work in progress

❖ Migrating from Nagios to this kind of solution

Some reasons to switch off…

❖ The godfather of OSS monitoring is dead as an Open Source project?

❖ Can’t do better with it

❖ Cool new kids out there

❖ Better « cloud » support

❖ Clear distinction between monitoring states, metrics and messages

❖ Better charting solution

❖ Near realtime monitoring

❖ Routing, aggregation, correlation…

❖ YOUR reasons ;)

http://blog.dataloop.io/2014/01/30/what-we-learnt-talking-to-60-companies-about-monitoring/

Which way to take ?

❖ The « 4 musketeers »

❖ Naemon

❖ Icinga 2

❖ Shinken

❖ Centreon

❖ Reboot from building blocks

❖ Collect

❖ Store

❖ Visualize

❖ Alert

http://www.naemon.org/

https://www.icinga.org/icinga/icinga-2/

http://www.shinken-monitoring.org/

http://www.centreon.com/

Tools : Collecting metrics and messages

❖ Packetbeat (metrics & messages)

❖ Rsyslog, NXLog, Syslog-ng (messages)

❖ sFlow Toolkit, Host sFlow

❖ Logstash-forwarder (messages)

❖ Collectd (metrics)

❖ Diamond (metrics)

❖ OSquery, WMI (metrics)

❖ Network level (sFlow)

❖ System Level

❖ Application Level

http://packetbeat.com/

http://www.rsyslog.com

http://nxlog-ce.sourceforge.net

http://www.syslog-ng.org

http://www.inmon.com/technology/sflowTools.php

http://host-sflow.sourceforge.net/documentation.php

https://github.com/elasticsearch/logstash-forwarder

http://collectd.org

https://github.com/BrightcoveOS/Diamond

http://osquery.io

Tools : External collecting

❖ End user perspective

❖ Controls done closest to the end-user

❖ Application behavior

❖ Real User Monitoring

❖ Webpagetest

❖ Selenium

❖ Phantomas

❖ Boomerang

❖ Bucky

http://www.webpagetest.org/about

http://www.seleniumhq.org

https://github.com/macbre/phantomas

http://www.lognormal.com/boomerang/doc/

https://github.com/HubSpot/BuckyClient/blob/master/README.md

Tools : Routing metrics and messages

❖ Messages : Logstash, Flume, Fluentd

❖ Metrics : StatsD

❖ Metrics : Carbon Relay NG

One or more messages can fire an event

http://logstash.net/

http://flume.apache.org/

http://www.fluentd.org/

https://github.com/etsy/statsd/

https://github.com/graphite-ng/carbon-relay-ng

Tools : Databases

❖ Graphite : The most used.

❖ OpenTSDB : HBase

❖ KairosDB : Cassandra

❖ InfluxDB : The most promising ?

❖ Elasticsearch : Index database

http://graphite.wikidot.com/

http://opentsdb.net

https://github.com/kairosdb/kairosdb

http://influxdb.com/

http://www.elasticsearch.org/

Tools : Visualizing metrics and messages

❖ Kibana

❖ Grafana

❖ Dashboards collection

https://github.com/obazoud/awesome-dashboard

Tools : Alerting

❖ Seyren : Alerting dashboard for Graphite.

❖ Cabot : Get alerted when services go down or metrics go crazy

❖ Bosun : An advanced, open-source monitoring and alerting system

❖ Skyline : Real-time anomaly detection system

❖ Oculus : Anomaly correlation component of Etsy's Kale system

❖ Esper : Complex Event Processing

https://github.com/scobal/seyren

http://cabotapp.com

http://blog.serverfault.com/2014/11/10/announcing-bosun/

https://github.com/etsy/skyline

https://github.com/etsy/oculus

http://esper.codehaus.org

The French Monitoring Community Xperience

❖ Reboot from building blocks

❖ Collect

❖ Store

❖ Visualize

❖ Alert

The French Monitoring Community Xperience

Is it working ? What is not working ?

Collecting metrics : Collectd

❖ InfluxDB Collectd proxy

❖ In Golang like InfluxDB

❖ Temporary solution

❖ Native Collectd plugin

LoadPlugin network

<Plugin network>
  # proxy address
  Server "127.0.0.1" "8096"
</Plugin>

❖ PHP5-FPM metrics

❖ Nginx metrics

❖ MariaDB metrics

❖ System metrics

❖ <metricname>:<value>|<type>

https://github.com/hoonmin/influxdb-collectd-proxy

Collecting messages : Rsyslog

❖ Nearly ready log consumption

❖ Native distribution package

❖ Nginx Log, MySQL slow query log

template(name="ls_json" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"@timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"@version\":\"1")
  constant(value="\",\"message\":\"")
  property(name="msg")
  constant(value="\",\"host\":\"")
  property(name="hostname")
  constant(value="\",\"severity\":\"")
  property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"")
  property(name="syslogfacility-text")
  constant(value="\",\"programname\":\"")
  property(name="programname")
  constant(value="\",\"procid\":\"")
  property(name="procid")
  constant(value="\"}\n")
}
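To make the intent of the template above easier to check, here is a minimal Python sketch of the JSON line it is meant to emit for one syslog record. Only the field names come from the template; the function name build_ls_json and the sample values are hypothetical.

```python
import json


def build_ls_json(timereported, msg, hostname, severity, facility,
                  programname, procid):
    """Return one JSON line with the same fields as the ls_json template."""
    event = {
        "@timestamp": timereported,   # RFC 3339 timestamp of the record
        "@version": "1",
        "message": msg,
        "host": hostname,
        "severity": severity,         # e.g. "info", "err"
        "facility": facility,         # e.g. "daemon", "auth"
        "programname": programname,
        "procid": procid,
    }
    # json.dumps escapes quotes and control characters in the values,
    # which is the job option.json="on" does on the rsyslog side.
    return json.dumps(event) + "\n"


# Hypothetical sample record (values are made up for illustration).
line = build_ls_json("2014-11-19T10:15:30+01:00", 'GET "/" 200',
                     "web01", "info", "daemon", "nginx", "1234")
```

Parsing the result with json.loads is a quick way to confirm that a message containing quotes still yields valid JSON, which is the usual failure mode of hand-built templates.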

Collecting @ network level : Packetbeat

❖ Specific agent

❖ Collect traffic for

❖ HTTP

❖ MySQL

❖ PostgreSQL

❖ Redis

Routing messages : Logstash

❖ Inputs

❖ Codecs/filters

❖ Outputs

input {
  udp {
    port => 10514
    codec => "json"
    type => "syslog"
  }
}

filter {
  # This replaces the host field with the host that generated the message (sysloghost)
  if [sysloghost] {
    mutate {
      replace => [ "host", "%{sysloghost}" ]
      remove_field => "sysloghost"
    }
  }
}

output {
  elasticsearch {
    host => localhost
  }
}
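The filter stage above can be read as a small function applied to each event. The sketch below mirrors that behaviour in Python purely as an illustration; it is not Logstash code, and the function name apply_sysloghost_filter and the sample events are hypothetical.

```python
def apply_sysloghost_filter(event):
    """Mimic the mutate filter: copy sysloghost into host, then drop it."""
    result = dict(event)  # leave the caller's event untouched
    # Assumption: the conditional is skipped when the field is missing,
    # null or false, which is how `if [sysloghost]` is modelled here.
    if result.get("sysloghost") not in (None, False):
        result["host"] = result["sysloghost"]
        del result["sysloghost"]
    return result


# An event relayed through a central syslog server: host is the relay.
relayed = apply_sysloghost_filter(
    {"host": "10.0.0.1", "sysloghost": "web01", "message": "started"})

# An event without sysloghost passes through unchanged.
direct = apply_sysloghost_filter({"host": "db01", "message": "started"})
```

After the filter, relayed carries the originating host name in host and no longer has a sysloghost field.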

Routing metrics : StatsD

❖ Is now a protocol implemented in all languages

❖ InfluxDB plugin

❖ Collectd can behave as a StatsD daemon (plugin)

❖ Very easy to push metrics

echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
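The same push can be done from application code. Below is a minimal Python sketch of the nc one-liner above, using the <metricname>:<value>|<type> payload over UDP. The helper names are hypothetical, and the host and port simply reuse 127.0.0.1 and 8125 from the command.

```python
import socket


def format_statsd_metric(name, value, metric_type):
    """Build a StatsD datagram payload: <metricname>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_statsd_metric(name, value, metric_type,
                       host="127.0.0.1", port=8125):
    """Send one metric over UDP; fire-and-forget, no reply is expected."""
    payload = format_statsd_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Same counter increment as the nc one-liner.
    send_statsd_metric("foo", 1, "c")
```

Because UDP is connectionless, the call does not fail when no daemon is listening, which is part of why pushing metrics this way is cheap for the application.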

Storing metrics : InfluxDB

❖ Make it behave like Graphite

❖ graphite-api

❖ carbon-relay-ng

❖ graphite-influxdb

❖ Cluster, cluster, cluster

❖ Designed for events and metrics

https://github.com/brutasse/graphite-api

https://github.com/graphite-ng/carbon-relay-ng

https://github.com/vimeo/graphite-influxdb
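Behaving like Graphite implies, among other things, accepting Graphite's plaintext protocol: one "path value timestamp" line per data point. A minimal Python sketch of that line format follows. The metric name, helper names and the port are assumptions (2003 is the usual Carbon plaintext port); which listener sits behind it depends on the setup.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_graphite_lines(lines, host="127.0.0.1", port=2003):
    """Send pre-formatted lines over TCP to a Graphite-compatible listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical data point; nothing is sent until send_graphite_lines is called.
sample = format_graphite_line("servers.web01.load.shortterm", 0.42, 1416391200)
```

Keeping formatting separate from sending makes it possible to point the same lines at a relay or at a test listener without touching the producer.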

Storing messages : Elasticsearch

❖ Index database

❖ Cluster, cluster, cluster

❖ Full text search

Visualizing @ network level : Packetbeat

❖ Kibana 3 modified version

❖ Dashboards ready out of the box

Visualizing metrics : Grafana

❖ Compatible

❖ Graphite

❖ InfluxDB

❖ OpenTSDB

❖ Built on Kibana 3

Visualizing messages : Kibana 4

❖ Easy install

❖ Interactive dashboards

❖ Multiple indices

What's missing ? Wishes

❖ Alerting

❖ External monitoring

❖ Repository for dashboards…

❖ Giving sense to metrics and messages

Alerting reboot

❖ Alert only on end user problems from an end user perspective

❖ IRC, Chat channel…

❖ Alert thresholds based on history vs static thresholds

❖ Statistics functions

❖ Boolean conditions

❖ Dynamic thresholds

❖ Anomaly detection

❖ Standard deviation
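One simple way to read "thresholds based on history" together with "standard deviation" is a bound computed from recent values rather than a fixed number. The Python sketch below is an illustration under that assumption, not a description of any particular tool; the function names, the factor k and the sample response times are hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold(history, k=3.0):
    """Upper alert bound: mean of past values plus k standard deviations."""
    return mean(history) + k * stdev(history)


def is_anomalous(history, current, k=3.0):
    """True when the current value exceeds the history-based bound."""
    return current > dynamic_threshold(history, k)


# Hypothetical response times (ms) observed over the previous checks.
response_times_ms = [200, 210, 190, 205, 195]

slightly_slow = is_anomalous(response_times_ms, 220)
very_slow = is_anomalous(response_times_ms, 300)
```

With this history the bound sits a little under 224 ms, so 220 ms stays quiet while 300 ms would raise an alert. A static threshold would need retuning each time normal behaviour drifts, whereas this bound follows the history window.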

Coming from Nagios

❖ Graphios will inject perfdata into Graphite or InfluxDB

❖ check-graphite can query the Graphite API from Nagios to alert based on history

❖ Logstash will send events to NSCA

❖ Nagios log in Kibana with Grok %{NAGIOSLOGLINE}

❖ Keep Nagios for states ?

https://github.com/shawn-sterling/graphios

https://github.com/pyr/check-graphite
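Reusing Nagios perfdata as metrics starts with splitting the perfdata string (label=value[UOM];warn;crit;min;max) into label, value and unit. The Python sketch below shows that step only. It is not Graphios code; the function name parse_perfdata and the sample string are hypothetical.

```python
import re
import shlex

# A numeric value optionally followed by a unit of measurement.
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_perfdata(perfdata):
    """Parse a Nagios perfdata string into {label: (value, unit)}."""
    metrics = {}
    # shlex handles single-quoted labels that contain spaces.
    for token in shlex.split(perfdata):
        label, _, rest = token.partition("=")
        raw_value = rest.split(";")[0]  # ignore warn/crit/min/max here
        match = _VALUE_RE.match(raw_value)
        if match is None:
            continue  # skip non-numeric values such as "U" (unknown)
        metrics[label] = (float(match.group(1)), match.group(2))
    return metrics


# Hypothetical perfdata as a check plugin might print it after the "|".
sample = parse_perfdata("time=0.012s;1;2;0 'disk /'=42%;80;90")
```

Each parsed pair can then be given a metric path and forwarded to the time series store, while Nagios keeps handling the state of the check.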

Questions ?

@olivjan

[email protected]
