Hobbit – Monitoring that works

Hobbit – Monitoring that works Hobbit Monitor Linuxforum 2007 March 3 rd 2007 Henrik Storner <[email protected]>



TRANSCRIPT

Page 1: Hobbit – Monitoring that works

Hobbit – Monitoring that works

Hobbit Monitor
Linuxforum 2007
March 3rd 2007

Henrik Storner <[email protected]>

Page 2: Hobbit – Monitoring that works

Agenda

Hobbit history
What does “monitoring” mean?
Demo / Screen shots
Architecture: The Hobbit components
  Hobbit server
  Network server monitoring
  Server monitoring with the clients
Setting it up: Quick tour of the configuration
Custom checks
Hobbit at the CSC Copenhagen data center
Future

Page 3: Hobbit – Monitoring that works

Hobbit history

In 2001, CSC's Managed Web Services division had no monitoring of websites, only box-monitoring with Unicenter/TNG.

Enter Big Brother – the UI is great, but BB is written in Korn shell and is slooooow!

bbgen toolkit (2002) eliminated some slow parts of BB, but kept the BB daemon.

Hobbit (2005) is mostly compatible with BB but has a completely different architecture.

Hobbit contains no BB parts.

Page 4: Hobbit – Monitoring that works

What does “monitoring” mean?

Availability: Can I access the website?
Performance: ... without getting bored?
Capacity: ... when we triple the number of users?

It is vital that you know which of these questions you must answer.

The main focus for Hobbit is availability (but there's a bit of the others thrown in)

Page 5: Hobbit – Monitoring that works

Hobbit overview

A small demo

Pages 6 to 15: demo screenshots (images only, no text in the transcript)

Page 16: Hobbit – Monitoring that works

Hobbit architecture

1 Hobbit server holds all current data, i.e. the status of everything we monitor.

Usually, the Hobbit server also hosts the web interface, and runs tasks which take care of storing history- and trend-data.

One or more servers perform network tests and report the results to the Hobbit server (which may be the same machine).

Clients collect data from each monitored server, and send it to the Hobbit server for analysis.

Page 17: Hobbit – Monitoring that works

The Hobbit server

Stores current data in RAM only, and never does any slow disk I/O or process forking
Historical and trend data stored on disk
Core hobbitd daemon feeds server tasks via an IPC “channel”, using shared memory
Server tasks handle e.g. client data analysis, trend data updates, and alerting
Some tasks can be distributed on multiple servers (e.g. client data analysis)
Extensible – you can write your own tasks, e.g. a task that stores all measurements in a database.

Page 18: Hobbit – Monitoring that works

Hobbit web interface

Overview pages are static HTML, rebuilt once a minute with the current status data.

Detailed status pages are dynamically generated
“Critical Systems” view is dynamic
Will probably switch to an all-dynamic setup
The web UI is not particularly attractive or flexible, so web designers are welcome!
Some customization is possible by modifying header- and footer-files

Page 19: Hobbit – Monitoring that works

Network service monitoring

ping: Is the server alive?
Connect: Will it accept new connections?
Service: Is the network service running?
Application: Is it working?

Checking that the service is running is easy, because it uses a standard protocol (e.g. HTTP).

But the end-user only cares about the application!

Page 20: Hobbit – Monitoring that works

Webserver says it's “200 OK”

Check the actual data returned!
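A status code alone says little about the application behind it. The following is a minimal sketch of a content check, assuming the page body has already been fetched into a variable. content_ok is a hypothetical helper, not a Hobbit command, and both sample pages are invented. The pattern is the one used in the bb-hosts example later in the deck.

```shell
#!/bin/sh
# Sketch only: a "200 OK" is not enough, so also match the page body
# against a pattern. content_ok is a hypothetical helper, not part of Hobbit.
content_ok() {
    # $1 = page body, $2 = extended regular expression expected in the body
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Invented samples: a login page that rendered, and an error page
# that a web server might still deliver with status 200.
good='<html><body>Please enter your userid</body></html>'
bad='<html><body>Database connection failed</body></html>'

if content_ok "$good" 'Please.*userid'; then echo green; else echo red; fi   # prints "green"
if content_ok "$bad" 'Please.*userid'; then echo green; else echo red; fi    # prints "red"
```

In a real check the body would come from an HTTP request; the comparison step stays the same.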

Page 21: Hobbit – Monitoring that works

And the devil's in the detail...

Page 22: Hobbit – Monitoring that works

Standard network tests

ping
FTP
SSH
SMTP
POP
IMAP(S)
NNTP(S)
LDAP(S) query
HTTP(S) w/ content
SSL certificate
AJP13
VNC
clamd
spamd
cupsd
rsync
Oracle TNS listener
add your own service definition

Page 23: Hobbit – Monitoring that works

Server monitoring – Hobbit clients

Usually a “Hobbit client” runs on the server.
Clients are really dumb – they know how to collect some data, but they know nothing about interpreting the data they collect.
Runs uptime, free, df, ps, netstat, mount, who ...
Collects server-side statistics
Scans server log files for new entries
Collects data for directories and individual files
The raw data is sent to the Hobbit server for analysis.

Page 24: Hobbit – Monitoring that works

Standard server tests

CPU load average
System uptime
System clock
Memory usage
Swap usage
File system usage
Process counts
Network ports
File attributes
File data
Directory sizes
Log file data

Data can be graphed

Page 25: Hobbit – Monitoring that works

Setting it up

All configuration is kept on the Hobbit server
All configuration files are text based
Uses regular expressions a lot

bb-hosts (list hosts Hobbit knows about, defines network service tests and the web page layout)
hobbit-alerts.cfg (rules for sending alerts)
hobbit-clients.cfg (rules for analyzing data from clients)
client-local.cfg (instructions for client data collection)

Page 26: Hobbit – Monitoring that works

bb-hosts

#
# Master configuration file for Hobbit
#

group My hosts
127.0.0.1 localhost # bbd http://localhost/
192.168.1.1 demohost # pop3 http://127.0.0.1/ smtp \
    cont=Login;https://www/Login.php;Please.*userid

Page 27: Hobbit – Monitoring that works

hobbit-alerts.cfg

HOST=demohost SERVICE=http
    MAIL [email protected] TIME=W:0800:2200

SERVICE=http
    MAIL [email protected]
    SCRIPT /usr/local/bin/smsalert +4512345678
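The TIME=W:0800:2200 setting restricts when a recipient is alerted. As a rough sketch of that kind of test, the function below decides whether a given day and time fall inside such a window. in_weekday_window is a hypothetical helper; reading "W" as Monday to Friday, and treating the start as inclusive and the end as exclusive, are assumptions rather than statements from the slides.

```shell
#!/bin/sh
# Sketch only: decide whether a moment falls inside a window such as W:0800:2200.
# Assumptions, not taken from the slides: "W" covers Monday to Friday,
# the start time is inclusive and the end time is exclusive.
in_weekday_window() {
    # $1 = day of week, 1 (Monday) to 7 (Sunday), as printed by `date +%u`
    # $2 = time of day as HHMM
    # $3 = window start as HHMM, $4 = window end as HHMM
    if [ "$1" -lt 1 ] || [ "$1" -gt 5 ]; then
        return 1
    fi
    [ "$2" -ge "$3" ] && [ "$2" -lt "$4" ]
}

# Wednesday 09:30 is inside the window; Saturday 09:30 is not.
if in_weekday_window 3 0930 0800 2200; then echo "send mail"; else echo "hold"; fi   # prints "send mail"
if in_weekday_window 6 0930 0800 2200; then echo "send mail"; else echo "hold"; fi   # prints "hold"
```

The current moment could be passed in as `in_weekday_window "$(date +%u)" "$(date +%H%M)" 0800 2200`.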

Page 28: Hobbit – Monitoring that works

hobbit-clients.cfg

HOST=*
    DISK / 80 90 EXHOST=backup.foo.com

HOST=db.foo.com
    DISK %/data/ IGNORE

HOST=%web[1-9]
    PROC apache MIN=4 MAX=20
    DIR /var/log/apache SIZE<100000 TRACK yellow
    FILE /var/www/index.html \
        MD5=dd2cf7192db28919203eef126943b
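Rules like "DISK / 80 90" map a measured value to a status color on the server. A minimal sketch of that mapping follows, assuming the two numbers are the yellow and red thresholds in percent. disk_color is a hypothetical helper, and whether Hobbit compares with "at least" or "more than" is an assumption.

```shell
#!/bin/sh
# Sketch only: turn a disk-usage percentage into a status color,
# in the spirit of "DISK / 80 90" (first threshold yellow, second red).
# Whether the real comparison is >= or > is an assumption here.
disk_color() {
    # $1 = percent used, $2 = warning threshold, $3 = panic threshold
    if [ "$1" -ge "$3" ]; then
        echo red
    elif [ "$1" -ge "$2" ]; then
        echo yellow
    else
        echo green
    fi
}

disk_color 42 80 90   # prints "green"
disk_color 85 80 90   # prints "yellow"
disk_color 97 80 90   # prints "red"
```

Because the thresholds live in one place, changing "80 90" changes the result for every host the rule matches, without touching any client.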

Page 29: Hobbit – Monitoring that works

client-local.cfg

# This file tells clients what file/log data to report

[linux]
log:/var/log/messages

[web1]
dir:/var/log/apache
file:/var/www/default.html:md5

Page 30: Hobbit – Monitoring that works

Why server-side analysis?

Managing configuration files on each monitored server is impossible when you have 2000 clients.

Bulk configuration updates are much easier
Configuration settings can apply to groups of hosts
Adding new analysis tools only requires upgrading the Hobbit server – not all of the clients (provided they already collect the necessary data, of course)
Having RAW data available is USEFUL.
Only downside: Your Hobbit server must spend some CPU time analyzing the raw client data

Page 31: Hobbit – Monitoring that works

Custom checks

Custom checks normally check something, then send a red/yellow/green “status” message

A check can run locally on a host as part of the client installation

A check can run centrally and pull data from several hosts (e.g. grabbing data with SNMP)

A check can run on the Hobbit server, using data that has already been collected (“combo-tests” or extra client-data analysis)

Numeric data can be tracked in graphs

Page 32: Hobbit – Monitoring that works

Simple client-side check

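#!/bin/bash
# Report the temperature in a "weather" status column.
COLUMN=weather; COLOR=green

# Go red at 30 degrees or more.
DEGREES=`/usr/local/bin/getweather temperature`
if [ "$DEGREES" -ge 30 ]; then COLOR=red; fi

# $BB, $BBDISP and $MACHINE are provided by the Hobbit client environment.
$BB $BBDISP \
    "status $MACHINE.$COLUMN $COLOR `date` temperature=$DEGREES"

exit 0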

Page 33: Hobbit – Monitoring that works

Server-side checks

You can hook modules into all kinds of Hobbit data: Status messages, data collected from Hobbit clients and so on.

E.g. Hobbit clients run “who” to report who is logged on.

Monitoring for a root login on all servers takes only 62 lines of Perl (see hobbitd_rootlogin.pl in the source)
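The idea behind the root-login check can be sketched in a few lines of shell. This is not hobbitd_rootlogin.pl and it does not hook into Hobbit; it only shows the analysis step on `who`-style output. root_logins is a hypothetical helper and the sample sessions are invented.

```shell
#!/bin/sh
# Sketch only: scan `who`-style output for root sessions.
# This is not hobbitd_rootlogin.pl; it only mirrors the idea in shell.
root_logins() {
    # Reads who output on stdin and prints the terminal of every root session.
    awk '$1 == "root" { print $2 }'
}

# Invented sample of what a client might report.
sample='alice    pts/0        2007-03-03 10:15 (10.0.0.5)
root     tty1         2007-03-03 10:20'

if [ -n "$(printf '%s\n' "$sample" | root_logins)" ]; then
    echo "red: root is logged in"      # this branch runs for the sample
else
    echo "green: no root login"
fi
```

Since the clients already send the `who` output, a check like this needs no change on the monitored servers.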

Page 34: Hobbit – Monitoring that works

Windows, SNMP and other stuff

Windows client: BBWin
Note: Does not support central configuration
SNMP add-on: Devmon
Both are OSS, available on Sourceforge.net
Other add-ons available, e.g. for database monitoring.
Add-ons for Big Brother (available from deadcat.net) can be used – but check licensing

Page 35: Hobbit – Monitoring that works

Hobbit@CSC - Summary

The Copenhagen data center is the largest CSC data center in EMEA, globally in the top 5.

Hobbit/BBWin/BB clients on 90% of all servers. Hobbit is considered mission-critical.

Lots of network tests, especially for Web- and middleware systems (J2EE and LDAP)

Web application monitoring done through customer-built “monitoring” web pages

Page 36: Hobbit – Monitoring that works
Page 37: Hobbit – Monitoring that works

Hobbit@CSC – Multiple views

Multiple sets of web pages with Hobbit data:
  One set grouped by account manager, then by account: Lets the account manager quickly see if his customers are running OK
  One set grouped by sysadmin group, then by account: Lets the system administrators quickly see what servers need attention
  One set for customers who want access to Hobbit
The “Critical Systems” view is monitored 24x7

Page 38: Hobbit – Monitoring that works

Hobbit@CSC - reports

Availability reports pre-generated for daily, weekly and monthly availability

Reports and detailed history available on-line for 3 months

Monthly reports available for 12 months
Graphs clean up automatically and provide data for 1½ years (1-day average)

Page 39: Hobbit – Monitoring that works

Hobbit@CSC: Statistics

1 Web / 2 net servers
  Sun E220R server
  450 MHz UltraSPARC II
  1 GB RAM
  2x72 GB SCSI disk

1 RRD server
  HP DL380
  3 GHz Xeon
  1 GB RAM
  2x72 GB SCSI disk

3,800 hosts
28,000 statuses
9,500,000 updates/day = 111 updates/second
3,100 network tests
40,000 webpages/day
27,000 RRD files = ~160,000 RRD datasets
8,500 RRD graphs/day

Page 40: Hobbit – Monitoring that works

Hobbit@CSC: Load on main server

Page 41: Hobbit – Monitoring that works

Future work

Load balancing of Hobbit tasks: 4.3.0
  Graph updates and viewing
  History log storage
  Client data analysis
  Network checks
High availability? Maybe not ... can be handled externally
Re-design the web UI – any volunteers?
Automated web checking of a full user session, perhaps using Mozilla or Konqueror

Page 42: Hobbit – Monitoring that works

The End

Questions?