Nagios Conference 2012 - Mike Weber - Disaster
TRANSCRIPT
10 Quick Steps To Disaster
Mike Weber
Inheriting Aberrations
with Objects
Where are those settings coming from?
Object Inheritance
Object Priorities
Object Chaining
Incomplete Objects
Canceling Inheritance
Additive Inheritance
Object Inheritance
Object Inheritance: Templates
Object Inheritance: No Hostgroups?
Object Inheritance: From Hostgroup
Object Inheritance: Info Option
Object Priorities: Local then Inheritance
Object Priorities: Order in List (Chaining)
Incomplete Object: Only Lists One Image
Canceling Inheritance: Object Contains Parents
Canceling Inheritance: Wrong Parents
Canceling Inheritance: Cancel Parents
Canceling Inheritance: Canceled Parents
Additive Inheritance: Append Object Contents
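The inheritance rules named in the slide titles above can be seen together in one small object file. The sketch below is not from the talk: the template names, host name, address and output path are all hypothetical, and the file is a fragment rather than a complete configuration (the hostgroups and the parent host it mentions would have to exist elsewhere). It relies on documented Nagios Core behaviour: local values override inherited ones, the first template in a `use` list takes precedence, a leading `+` appends to an inherited value, and `null` cancels it.

```shell
#!/bin/bash
# Write a hypothetical object file that exercises the inheritance rules.
# generic-switch, site-defaults, sw-floor2 and core-router are made-up names.
write_inheritance_demo() {
    local out="$1"
    cat > "$out" <<'EOF'
# Template: register 0 keeps it from being treated as a real host.
define host {
    name                    generic-switch
    check_command           check-host-alive
    max_check_attempts      3
    hostgroups              switches
    parents                 core-router
    register                0
}

# Second template, present only to show ordering in a "use" list.
define host {
    name                    site-defaults
    max_check_attempts      5
    notification_interval   60
    register                0
}

# Chaining:  both templates set max_check_attempts; generic-switch is
#            listed first, so the host inherits 3.
# Local:     notification_interval 30 overrides the inherited 60.
# Additive:  "+floor2" appends to the inherited hostgroups (switches).
# Canceling: "null" drops the inherited parents value.
define host {
    host_name               sw-floor2
    address                 192.0.2.10
    use                     generic-switch,site-defaults
    notification_interval   30
    hostgroups              +floor2
    parents                 null
}
EOF
}
```

Calling `write_inheritance_demo /tmp/inheritance-demo.cfg` writes the file for reading. To answer "where are those settings coming from?" for a real host, the same four rules can be walked in order: local value, first template, later templates, then any `+` or `null` modifiers.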
Hoping BAD Things
Won't Happen
Real BAD Things Will Happen
Backups
Updates
Dependencies
XI: Automated Backup
/etc/cron.d/nagiosxi
0 7 * * * root /root/scripts/automysqlbackup
0 8 * * * root /root/scripts/autopostgresqlbackup
/store/backups/mysql
daily
weekly
monthly
/store/backups/postgresql
daily
weekly
monthly
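A cron entry only proves that a backup was scheduled, not that it ran. As a minimal sketch, assuming GNU `find` (for `-mmin`) and the directory layout listed above, a helper like the following could confirm that a backup directory holds a recently written file. The function name and the age threshold are illustrative, not part of Nagios XI.

```shell
#!/bin/bash
# Succeed (exit status 0) if DIR contains at least one regular file
# modified within the last MAX_AGE_MIN minutes; fail otherwise.
backup_is_fresh() {
    local dir="$1" max_age_min="$2"
    # A missing directory counts as "no fresh backup".
    [ -d "$dir" ] || return 1
    # -mmin -N matches files modified less than N minutes ago.
    [ -n "$(find "$dir" -type f -mmin "-$max_age_min" | head -n 1)" ]
}
```

For example, `backup_is_fresh /store/backups/mysql/daily 1500 || echo "MySQL daily backup is stale"` would flag a daily backup older than about 25 hours.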
XI: Upgrade Backup
#!/bin/bash
##### BackUp Of Nagios Before Upgrade #####
# Timestamp Backups
TIMESTAMP=$(date +%Y%m%d_%H%M); echo $TIMESTAMP
service nagiosxi stop
service npcd stop
service ndo2db stop
service nagios stop
mkdir /bk/upgrade_$TIMESTAMP
tar cjf /bk/upgrade_$TIMESTAMP/nagios_$TIMESTAMP.tar.bz2 /usr/local/nagios
tar cjf /bk/upgrade_$TIMESTAMP/nagiosxi_$TIMESTAMP.tar.bz2 /usr/local/nagiosxi
pg_dump -U nagiosxi -c -F p nagiosxi | bzip2 -c > /bk/upgrade_$TIMESTAMP/pg_nagiosxi_$TIMESTAMP.sql.bz2
mysqldump -u root -pnagiosxi nagios | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagios_$TIMESTAMP.sql.bz2
mysqldump -u root -pnagiosxi nagiosql | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagiosql_$TIMESTAMP.sql.bz2
service nagios start
service ndo2db start
service npcd start
service nagiosxi start
Core: Backup
#!/bin/sh
# Timestamped Back Up
TIMESTAMP=`date +%Y%m%d_%H%M%S`;echo $TIMESTAMP
tar czvf /bk/nagios_dir_$TIMESTAMP.tar.gz /usr/local/nagios
tar czvf /bk/pnp4nagios_dir_$TIMESTAMP.tar.gz /usr/local/pnp4nagios
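Both backup scripts write archives but never read them back. One possible verification step, sketched here under the assumption that `tar`, `gzip` and `bzip2` are installed, is to list or test each archive after it is written. The function name is hypothetical, and a passing test only shows the archive is readable, not that a restore would be complete.

```shell
#!/bin/bash
# Succeed if ARCHIVE exists, is non-empty, and can be read end to end.
verify_archive() {
    local archive="$1"
    # -s is true only for an existing file larger than zero bytes.
    [ -s "$archive" ] || return 1
    case "$archive" in
        *.tar.gz)  tar tzf "$archive" > /dev/null 2>&1 ;;  # list gzip tarball
        *.tar.bz2) tar tjf "$archive" > /dev/null 2>&1 ;;  # list bzip2 tarball
        *.bz2)     bzip2 -t "$archive" 2> /dev/null ;;     # test compressed dump
        *)         return 1 ;;                             # unknown type
    esac
}
```

A line such as `verify_archive /bk/nagios_dir_$TIMESTAMP.tar.gz || echo "backup unreadable"` could be appended to either script.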
Ignoring/Encouraging
System Warnings
Configuration Errors: Service Checks
Solution: Service Template Management
Service Template: Check Settings
Service Template: Alert Settings
Service Template: Add Hostgroup
Solution: Service Template Management
Max Concurrent Service Checks
Maximum Concurrent Checks
Edit nagios.cfg to avoid latency issues.
max_concurrent_checks=0
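In Nagios Core, a value of 0 places no limit on the number of checks run at once. Before editing, it can help to see what the running configuration actually contains. The helper below is a small sketch with a hypothetical name; it prints the value from the last uncommented `key=value` line it finds and makes no claim about how Nagios itself treats duplicate entries.

```shell
#!/bin/bash
# Print the value of KEY from a Nagios-style "key=value" config file.
# Commented lines are skipped; if KEY appears more than once, the last
# occurrence in the file is printed. Nothing is printed if KEY is absent.
get_nagios_setting() {
    local cfg="$1" key="$2"
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }
        { name = $1; gsub(/[[:space:]]/, "", name) }
        name == k { val = substr($0, index($0, "=") + 1) }
        END { if (val != "") print val }
    ' "$cfg"
}
```

For example, `get_nagios_setting /usr/local/nagios/etc/nagios.cfg max_concurrent_checks`, where the path assumes a default source install.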
Mangling Users and Contacts
Managing Users and Contacts
Users (access to the web interface)
Contacts (notifications)
Creating Users: Web Interface
Creating Users: Restricted
Managing Administrators: Full Access
Core: cgi.cfg
authorized_for_system_information=nagiosadmin,john,sue,mark,tom,mary,ralph
authorized_for_configuration_information=nagiosadmin,john,sue,mark,tom,mary,ralph
authorized_for_system_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
authorized_for_all_services=nagiosadmin,management,john,sue,mark,tom,mary,ralph
authorized_for_all_hosts=nagiosadmin,management,john,sue,mark,tom,mary,ralph
authorized_for_all_service_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
authorized_for_all_host_commands=nagiosadmin,john,sue,mark,tom,mary,ralph
authorized_for_read_only=management
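With lists this long it is easy to lose track of who can do what. As a rough sketch, the hypothetical helper below prints one user per line for a given `authorized_for_*` directive, which makes the lists easier to review or compare. It reads only the last matching line in the file.

```shell
#!/bin/bash
# Print the users named on a cgi.cfg directive, one per line.
list_authorized() {
    local cfg="$1" directive="$2"
    # Match "directive=" exactly so that, for example,
    # authorized_for_all_services does not also match
    # authorized_for_all_service_commands.
    grep -E "^[[:space:]]*${directive}[[:space:]]*=" "$cfg" \
        | tail -n 1 \
        | cut -d= -f2- \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

Running `list_authorized /usr/local/nagios/etc/cgi.cfg authorized_for_system_commands | wc -l` (path assumed) gives a quick count of how many accounts can issue system commands.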
Contacts
Monitoring Non-Existent Ports
on Switches
Save Resources
Use AdminDown on Ports
* Administratively set unused ports as AdminDown
* Modify check_ifoperstatus
Turn Off Monitoring on Unused Ports
Remove the Checks
Unused Switch Ports: Wasting Resources
* check port status
* check bandwidth
* send notifications
* ignore notifications
Modify check_ifoperstatus
Here is the code that affects the output. In the branch that begins
if ( not defined $adminWarn or $adminWarn eq "w" ) {
change
$state = 'WARNING';
to
$state = 'OK';
The example already includes the change, marked with a comment.
##
if ( not ($response->{$snmpIfAdminStatus} == 1) ) {
  $answer = "Interface $name (index $snmpkey) is administratively down.";
  if ( not defined $adminWarn or $adminWarn eq "w" ) {
    $state = 'OK';   # changed from 'WARNING'
  } elsif ( $adminWarn eq "i" ) {
    $state = 'OK';
  } elsif ( $adminWarn eq "c" ) {
    $state = 'CRITICAL';
  } else { # If wrong value for -a, say warning
    $state = 'WARNING';
  }
Administratively Down Ports
Disable Port Checks
* 790 port checks disabled
* 1.5 GB of RAM saved
* 18% reduction in max service check execution time
Encouraging Non-Accountability for Changes
Who Makes Changes on Your Nagios?
Limit Admin Access
Require Training
Create Policy for Changes
Use a Test Server
Audit Log
Abusing Nagios XI Wizards
Wizard or Manual Creation: Assessment
Installation
Which method provides the most efficient installation?
Example: Using a wizard for a switch is most efficient.
Example: Manually creating a service check to be used on 100
servers is most efficient.
Visibility
Will it provide access to view the grouping of devices?
Example: Can effective reports be created from visible devices?
Management
Does it make management easier in the long run?
Example: The use of templates is an efficient method to manage
multiple devices that are similar.
Template Management
Disregarding Network Relationships
Reachability
Host: Manage Parents
Network Relationships: Parents
Importing Infectious Diseases
GUI Infection: Lack of Command Line Skills
Backups
* cron jobs
* manual backups
* verification
Analysis
* disk space
* logs
Troubleshooting
* finding stuff
* processes
* permissions
Edit Files
* learning vi or nano
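The analysis and troubleshooting items above map onto a handful of standard commands. The two helpers below are a minimal sketch with hypothetical names, covering disk space and permissions only; they assume a POSIX `df` and a `find` that supports `-user`.

```shell
#!/bin/bash
# Print the percentage of space used on the filesystem holding PATH,
# as a bare number (no % sign).
disk_used_pct() {
    # -P forces one line per filesystem so column 5 is always Capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List everything under DIR that is not owned by OWNER
# (a user name or numeric uid).
not_owned_by() {
    find "$1" ! -user "$2" -print
}
```

For instance, `disk_used_pct /usr/local/nagios` shows how full the Nagios filesystem is, and `not_owned_by /usr/local/nagios/etc nagios` lists configuration files with unexpected ownership, assuming the service runs as a user named `nagios`.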
Short Cut Infection: Auto-Discovery
Overestimating Human Intelligence
Some of the Things We Do as Humans Defy Logic