
RedIRIS – Alberto Escolano Sá[email protected]

RedIRIS monitoring and operational procedures

Agenda

Part I: Monitoring
• Concepts
• SNMP
• Hardware
• Tools
• Active Monitoring

Concepts

• SNMP (Simple Network Management Protocol)
  – RFC 1157
  – Protocol developed to manage the nodes of an IP network
• UDP (User Datagram Protocol)
  – RFC 768
  – Most commonly used transport protocol for SNMP
• SMI (Structure of Management Information)
  – RFC 1155
  – RFC 2578 (version 2)
  – Contains the definitions for the structure and identification of management information for the Internet

Concepts

• MIB (Management Information Base)
  – RFC 1156
  – RFC 1213 (MIB-II)
  – Together with SNMP and SMI, provides the architecture for managing the Internet
• OID (Object Identifier)
  – Sequence of numbers separated by dots that identifies one exact parameter
• NMS (Network Management System)
  – Set of applications that monitor and control managed devices
  – Can be standard or vendor specific


SNMP

• Protocol used to manage network devices such as switches, routers and servers
• Components
  – NMS: software used to monitor and control managed devices
  – SNMP agent: management software running on the managed device
  – Network device: network node to be managed
• SNMP uses the information provided by MIBs
  – MIBs describe the structure of a network device's management data hierarchically, using OIDs
  – OIDs identify variables or elements that can be read or written via SNMP
• Network devices generate SNMP traps and send them to the management system

SNMP

SNMP versions
• SNMPv1: basic operations and features
  – Simplicity
  – Lack of security
  – RFC 1157
• SNMPv2: additional operations and features
  – Several variants (SNMPv2p, SNMPv2c, SNMPv2u, SNMPv2*)
  – Improved security
  – Difficult choice between variants
  – e.g. SNMPv2c (RFC 1901)
• SNMPv3: security enhancement
  – Uses features from several SNMPv2 variants
  – Flexible way to define security methods and parameters
  – RFC 2570

SNMP

SNMP architecture (diagram): an NMS running the SNMP manager, with its MIBs, and two managed devices (an L2 switch and an L3 router), each running an SNMP agent with its own MIBs.
• SNMP request and SNMP response: UDP port 161
• SNMP trap: UDP port 162

SNMP

MIB tree structure
• Each SNMP OID represents an individual object of the MIB
• The MIB can be broken down into a tree structure where OIDs are leaves on the tree

    root
      ccitt (0)
      iso (1)
        standard (0)
        identified organization (3)
          dod (6)
            internet (1)
              directory (1)
              mgmt (2)
                mib-II (1)
                  interfaces (2)
                  …
              experimental (3)
              private (4)
              security (5)
              snmpv2 (6)
          …
      joint-iso-ccitt (2)

SNMP

First approach: how do all these things work?
• Goal: query the inbound octets that have passed through an interface of a switch in the network
• Let’s assume SNMP is configured and running properly
• We need the MIB and the position of the OID in the OID tree for the SNMP query
  – 1.3.6.1.2.1.2 is the OID for interface-related data
  – 1.3.6.1.2.1.2.2.1.10 is the OID for the ifInOctets parameter value
• We also need the interface index to refer to the interface. Let’s assume it is 65.
• The full OID is 1.3.6.1.2.1.2.2.1.10.65
• OID translation: .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65
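The construction of the full OID and its translation can be sketched in a few lines of Python. This is only an illustration: the `ARC_NAMES` table, `instance_oid` and `translate` are hypothetical names, and the table is hard-coded with the arcs mentioned on the slides, whereas a real tool would load the names from IF-MIB.

```python
# Names of the arcs on the path to ifInOctets, keyed by numeric prefix.
# Hard-coded for illustration; a real tool would read them from the MIB files.
ARC_NAMES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 2): "interfaces",
    (1, 3, 6, 1, 2, 1, 2, 2): "ifTable",
    (1, 3, 6, 1, 2, 1, 2, 2, 1): "ifEntry",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10): "ifInOctets",
}

IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10"


def instance_oid(column_oid: str, if_index: int) -> str:
    """Append the interface index to a column OID."""
    return f"{column_oid}.{if_index}"


def translate(oid: str) -> str:
    """Replace every known numeric arc with its name; keep unknown arcs numeric."""
    arcs = tuple(int(arc) for arc in oid.strip(".").split("."))
    parts = []
    for depth in range(1, len(arcs) + 1):
        prefix = arcs[:depth]
        parts.append(ARC_NAMES.get(prefix, str(arcs[depth - 1])))
    return "." + ".".join(parts)


if __name__ == "__main__":
    full_oid = instance_oid(IF_IN_OCTETS, 65)
    print(full_oid)
    print(translate(full_oid))
```

The trailing 65 stays numeric because it is an instance index, not a named node of the tree.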

SNMP

Second approach: numeric OID conversion
• 1.3.6.1.2.1.2.2.1.10.65 is converted using IF-MIB
• IF-MIB, partially detailed:

    IF-MIB DEFINITIONS ::= BEGIN

    IMPORTS
        MODULE-IDENTITY, OBJECT-TYPE, Counter32, Gauge32, Counter64,
        Integer32, TimeTicks, mib-2,
        NOTIFICATION-TYPE FROM SNMPv2-SMI
    …

    ifMIB MODULE-IDENTITY
        LAST-UPDATED "200006140000Z"
        ORGANIZATION "IETF Interfaces MIB Working Group"
        CONTACT-INFO
    …

    ifEntry OBJECT-TYPE
        SYNTAX      IfEntry
        MAX-ACCESS  not-accessible
        STATUS      current
        DESCRIPTION
            "An entry containing management information applicable to a
            particular interface."
        INDEX   { ifIndex }
        ::= { ifTable 1 }

SNMP

IF-MIB, partially detailed (cont.):

    IfEntry ::=
        SEQUENCE {
            ifIndex         InterfaceIndex,
            ifDescr         DisplayString,
            ifType          IANAifType,
            ifMtu           Integer32,
            ifSpeed         Gauge32,
            ifPhysAddress   PhysAddress,
            ifAdminStatus   INTEGER,
            ifOperStatus    INTEGER,
            ifLastChange    TimeTicks,
            ifInOctets      Counter32,
            ifInUcastPkts   Counter32,
    …

    ifInOctets OBJECT-TYPE
        SYNTAX      Counter32
        MAX-ACCESS  read-only
        STATUS      current
        DESCRIPTION
            "The total number of octets received on the interface,
            including framing characters.
            Discontinuities in the value of this counter can occur at
            re-initialization of the management system, and at other
            times as indicated by the value of
            ifCounterDiscontinuityTime."
        ::= { ifEntry 10 }

SNMP

Result of the SNMP query
• The object is a Counter32 variable, so the result of the query is the 32-bit value stored in that variable
• e.g. a real query made to a Cisco switch:
    .1.3.6.1.2.1.2.2.1.10.65 = Counter32: 36307165
• That result translated into text using IF-MIB:
    .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65 = Counter32: 36307165

Conclusion of the results obtained
• At the time of the query, a total of 36307165 inbound octets had passed through interface index 65 of the queried device
• To obtain results in bps, the counter must be polled periodically; the delta between two consecutive samples, multiplied by 8 bits per octet and divided by the polling interval, gives the rate
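The delta calculation can be sketched as follows. This is a minimal illustration, not the method of any particular tool: `parse_counter32` and `rate_bps` are hypothetical helper names, the second sample value and the 300-second interval are invented for the example, and the wrap handling assumes the Counter32 wrapped at most once between two polls.

```python
import re

COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2^32 - 1

# Matches a result line such as ".1.3.6.1.2.1.2.2.1.10.65 = Counter32: 36307165"
RESULT_LINE = re.compile(r"^(?P<oid>\S+) = Counter32: (?P<value>\d+)$")


def parse_counter32(line: str) -> tuple[str, int]:
    """Split a Counter32 result line into (oid, value)."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Counter32 result line: {line!r}")
    return match.group("oid"), int(match.group("value"))


def rate_bps(prev_octets: int, curr_octets: int, interval_seconds: float) -> float:
    """Average bits per second between two samples of an octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo makes the delta correct when the counter wrapped once.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_seconds


if __name__ == "__main__":
    _, first = parse_counter32(".1.3.6.1.2.1.2.2.1.10.65 = Counter32: 36307165")
    second = first + 37500  # hypothetical second sample, 300 s later
    print(rate_bps(first, second, 300))
```

On fast links a 32-bit counter can wrap more than once within a polling interval, in which case the result is wrong; polling more often or reading a 64-bit counter avoids that.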


Hardware

The hardware involved in SNMP monitoring comprises all the network equipment and servers.
• RedIRIS core network
  – Layer 2 switches
    • Nortel MERS 8610
    • Cisco Catalyst 6500
  – Layer 3 routers
    • Juniper T-320, M-320
    • Juniper MX-480, MX-960
    • Juniper M120, M40e, M20, M10i
• RedIRIS access network
  – Layer 2 switches
    • Juniper EX-4200
    • Cisco Catalyst 2960
  – Layer 3 routers
    • Juniper M7i
• RedIRIS servers
  – Red Hat Enterprise Linux 4.x and 5.x
  – Solaris 8 and Solaris 10

Hardware

SNMP configuration: network equipment (L2, L3)
• General config parameters
  – SNMP version
  – SNMP communities (RO, RW)
  – SNMP clients
  – TRAPs to send to the SNMP manager
  – Source address to bind TRAP packets
  – Location and contact details
• TRAP details
  – Vendor specific
  – Vendor MIBs in SNMP manager
  – Categories
    • Authentication
    • Chassis
    • Link
    • VLANs
    • Configuration
    • Routing
    • STP

Hardware

SNMP configuration: Cisco IOS
• Parameters configured globally:

    snmp-server community public RO
    snmp-server community private RW
    snmp-server trap-source Vlan40
    snmp-server location RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
    snmp-server contact RedIRIS NOC; +34 91 2127620; <[email protected]>
    snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
    snmp-server enable traps vlancreate
    snmp-server enable traps vlandelete
    snmp-server enable traps config
    snmp-server enable traps bridge newroot topologychange
    snmp-server enable traps syslog
    snmp-server host 130.206.1.39 version 2c community
    snmp-server tftp-server-list 80
    snmp-server chassis-id number

Hardware

SNMP configuration: Juniper JUNOS
• Configured in the dedicated snmp section of the configuration:

    snmp {
        location "Centro de Gestion de RedIRIS, C/ Serrano 142 (28006-Madrid)";
        contact "RedIRIS NOC; +34 912127620; +34 629148201; <[email protected]>";
        community <community> {
            authorization read-only;
            clients {
                130.206.1.39/32;
                130.206.1.40/32;
            }
        }
        trap-options {
            source-address lo0;
        }
        /* Notifications */
        trap-group <trap-group-name> {
            version v2;
            categories {
                authentication;
                chassis;
                link;
                remote-operations;
                routing;
                startup;
                rmon-alarm;
            }
            targets {
                130.206.1.39;
            }
        }
    }

Hardware

SNMP configuration: servers (Solaris, Linux)
• SNMP manager used in RedIRIS: NET-SNMP
  – Both client and server features
  – Used for Solaris and Linux systems
  – Available for free (http://www.net-snmp.org/)
• SNMP config files
  – /etc/snmp/snmpd.conf
    • SNMP daemon config file
    • Listening on UDP port 161

    # ACL
    com2sec local 127.0.0.1/32 <community>
    com2sec myLAN 192.168.1.0/24 <community>

    # ACL assignment for RW and RO groups
    group MyRWGroup v1 local
    group MyRWGroup v2c local
    group MyROGroup v1 myLAN
    group MyROGroup v2c myLAN

    # MIB tree to be queried
    ## name incl/excl subtree mask(optional)
    view all included .1 80

    # group context sec.model sec.level prefix read write notif
    access MyROGroup "" any noauth exact all none none
    access MyRWGroup "" any noauth exact all all all

    # Contact Information
    syslocation RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
    syscontact RedIRIS NOC; +34 91 2127620; [email protected]

Hardware

SNMP configuration: servers (Solaris, Linux)
• SNMP config files
  – /etc/snmp/snmptrapd.conf
    • TRAP receiver daemon config file
    • Listening on UDP port 162

    # --== SONET/SDH Alarms ==--
    traphandle JUNIPER-SONET-MIB::jnxSonetAlarmSet /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-[email protected] [email protected]
    traphandle JUNIPER-SONET-MIB::jnxSonetAlarmCleared /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-[email protected] [email protected]
    # --== Links ==--
    traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    traphandle IF-MIB::linkDown /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    # --== BGP ==--
    traphandle BGP4-MIB::bgpEstablished /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    traphandle BGP4-MIB::bgpBackwardTransition /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-[email protected] [email protected]

• traphandle is used to execute a script (traptoemail) when a matching trap arrives
• traptoemail is a script that processes traps and sends them in a user-friendly format via e-mail to the RedIRIS NOC
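To give an idea of what a traphandle script receives, here is a minimal sketch of a handler's parsing step. snmptrapd passes the trap to the program on standard input: a host name line, an address line, and then one "OID value" line per variable binding. `parse_trap` is a hypothetical name, not part of NET-SNMP, and the address line is treated as an opaque string because its exact form depends on the NET-SNMP version. The sample data is taken from the linkUp e-mail shown on the trap2email slide.

```python
import io
import sys


def parse_trap(stream):
    """Parse the text snmptrapd feeds to a traphandle program.

    Returns (hostname, address, varbinds), where varbinds maps each OID
    to its value as a string.
    """
    lines = [line.rstrip("\n") for line in stream]
    if len(lines) < 2:
        raise ValueError("expected at least a hostname line and an address line")
    hostname, address = lines[0], lines[1]
    varbinds = {}
    for line in lines[2:]:
        if not line.strip():
            continue  # ignore blank lines
        oid, _, value = line.partition(" ")
        varbinds[oid] = value.strip()
    return hostname, address, varbinds


SAMPLE = """EB-Santiago0
130.206.204.254
SNMPv2-MIB::sysUpTime.0 112:4:13:18.95
SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp
IF-MIB::ifIndex.121 121
IF-MIB::ifName.121 so-3/0/0
"""

if __name__ == "__main__":
    # A real handler would read sys.stdin; the sample is used when run by hand.
    source = sys.stdin if not sys.stdin.isatty() else io.StringIO(SAMPLE)
    host, addr, binds = parse_trap(source)
    print(host, addr, binds.get("SNMPv2-MIB::snmpTrapOID.0"))
```

A script like traptoemail then formats these fields into the body of the message it sends.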

Hardware

SNMP configuration: servers (Solaris, Linux)
• SNMP daemons
  – /etc/init.d/snmpd
  – /etc/init.d/snmptrapd
• Launching options
  – start
  – status (for snmpd)
  – stop
  – restart
  – reload
• Options in daemon:

    OPTIONS="-c /etc/snmp/snmptrapd.conf -o /var/log/snmptrap.log -u /var/run/snmptrapd.pid -M /usr/local/share/snmp/mibs/ -m ALL"

• With these options the daemon takes snmptrapd.conf as its config file, writes its log to /var/log/snmptrap.log and its PID to /var/run/snmptrapd.pid, and loads ALL the MIBs found in the defined path


Tools

traptoemail
• Perl script that, combined with the SNMP trap handler, converts SNMP traps to e-mail messages
• Should be launched as an extension of snmptrapd, not as a regular user
• Options
  – -s smtpserver
  – -f fromaddress
  – toaddress
• Line in the /etc/snmp/snmptrapd.conf file:

    traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]

• Results:

    Host: EB-Santiago0 (130.206.204.254)
    SNMPv2-MIB::sysUpTime.0 112:4:13:18.95
    SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp
    IF-MIB::ifIndex.121 121
    IF-MIB::ifAdminStatus.121 up
    IF-MIB::ifOperStatus.121 up
    IF-MIB::ifName.121 so-3/0/0
    SNMPv2-MIB::snmpTrapEnterprise.0 JUNIPER-CHASSIS-DEFINES-MIB::jnxProductNameM40e
    Interfaz: so-3/0/0
    Descripcion del interfaz: -- Conexion RedIRIS-FCCN I - Num. Adm. 1530000-1022512

  (The last two lines read, in English: "Interface: so-3/0/0" and "Interface description: RedIRIS-FCCN I connection, admin. no. 1530000-1022512".)

Tools

MRTG (The Multi Router Traffic Grapher)
• Tool written in Perl and licensed under GPL, downloadable for free from the MRTG main web site (http://oss.oetiker.ch/mrtg/)
• The tool uses SNMP to query network devices and get information from them
• The results of the queries are stored (log or RRD)
• Those files are processed and included in an HTML file with PNG graphs
• RedIRIS uses the RRD (Round Robin Database) format to store the collected data
• Example of a graph generated with MRTG and RRD data (figure)

Tools

MRTG basic components
• mrtg: main program
• cfgmaker: script used to generate the .cfg files that the main program needs to generate graphs
• RRDtool: only if required. RedIRIS stores the information in RRD database format, so RRDtool is needed
  – RRDtool is a free open source tool licensed under GPL
  – Downloadable (http://oss.oetiker.ch/rrdtool/)

MRTG configuration
• MRTG needs .cfg files to generate the HTML web pages where the information is displayed
• cfgmaker [options] [community@]router [[options] [community@]router ...]
• Some options available:

    --ifref=nr       interface references by Interface Number (default)
    --ifref=ip       ... by Ip Address
    --ifref=eth      ... by Ethernet Number
    --ifref=descr    ... by Interface Description
    --ifref=name     ... by Interface Name
    --ifref=type     ... by Interface Type
    --ifdesc=nr      interface description uses Interface Number (default)
    --ifdesc=ip      ... uses Ip Address
    --ifdesc=descr   ... uses Interface Description
    --ifdesc=name    ... uses Interface Name
    --ifdesc=alias   ... uses Interface Alias
    --ifdesc=type    ... uses Interface Type

Tools

MRTG configuration
• Command used in RedIRIS:

    ./cfgmaker --global "HtmlDir: /home/mrtg/datos/GAL/html" \
        --global "ImageDir: /home/mrtg/datos/GAL/html/image" \
        --global "LogDir: /home/mrtg/datos/GAL/html/log" \
        --global "LogFormat: rrdtool" \
        --global "PathAdd: /usr/bin/" \
        --global "Options[_]: growright, bits" \
        --snmp-options=:::::2 <community>@eb-santiago0

• Configuration file:

    HtmlDir: /home/mrtg/datos/GAL/html
    ImageDir: /home/mrtg/datos/GAL/html/images
    LogDir: /home/mrtg/datos/GAL/html/log
    LogFormat: rrdtool
    PathAdd: /usr/bin/
    #WorkDir: /home/noc/mrtg/html/GAL
    Refresh: 300
    Language: Spanish
    Forks: 4
    RunAsDaemon: Yes
    Interval: 5
    Background[_]: #e8e7dc
    #---------------------------------------------------------------
    YLegend[cesga]: Bits por segundo
    Options[cesga]: growright, bits
    Target[cesga]: /130.206.204.21:<community>@eb-santiago0.rediris.es:::::2
    MaxBytes[cesga]: 312500000
    Title[cesga]: Línea de acceso CESGA
    PageTop[cesga]: <TABLE>
      <TR><TD>Línea:</TD><TD>GigabitEthernet 1000 Mbps</TD></TR>
      <TR><TD>Sistema:</TD><TD>EB-Santiago0</TD></TR>
      <TR><TD>Administrador:</TD><TD>NOC de RedIRIS; +34-91 212 76 20/25; <[email protected]></TD></TR>
      </TABLE>
    #---------------------------------------------------------------


Tools

MRTG results

Tools

MRTG organization in RedIRIS
• Each RedIRIS node has a unique cfg file
• MRTG statistics are divided into several groups
  – RedIRIS10 links
  – External links
  – Multicast statistics
  – BGP peerings
  – Monthly statistics
  – Yearly statistics
  – RedIRIS Central Services
  – Special Projects links
  – Access statistics, alphabetically ordered by institution

Tools

Weathermap
• Combination of several files to generate the map
  – SVG map for output
  – XML file with the status of the network
  – PNG files to display in a web page

Tools

Nagios
• Open source monitoring tool licensed under GPL
• Downloadable for free (http://www.nagios.org/)
• Prerequisites needed to install the tool
  – HTTP server (Apache)
  – GCC compiler to build the binaries from source
  – GD development libraries
  – On Fedora Linux, for example, all the packages can be installed with yum:

    yum install httpd
    yum install gcc
    yum install glibc glibc-common
    yum install gd gd-devel

• Download and install Nagios and Nagios Plugins
  – Nagios Plugins are needed to check the status of hosts and services
    • HTTP, POP3, FTP, SSH, NTP…
    • CPU load, disk usage, memory usage, users…
    • Servers and hosts (Unix/Linux, Windows)
    • Routers, switches

Tools

Nagios configuration
• Main configuration file
  – /usr/local/nagios/etc/nagios.cfg
  – File read by the daemon and the CGIs
  – The default file is OK for starting
• Resource files
  – Used to store user-defined macros
  – Referenced in nagios.cfg
• Object definition files
  – Used to define hosts, services and everything to be monitored
  – Used to define HOW hosts are monitored
  – Referenced in nagios.cfg
• CGI configuration file
  – Used to define directives that affect the operation of the CGIs
  – Contains a reference to the main configuration file (nagios.cfg)

Tools

Nagios configuration examples
• Main configuration file
  – nagios.cfg
  – The default file after installing is OK for starting with the tool
• Resource files
  – Optional, and useful to store usernames, passwords or paths
  – See the resource.cfg file in the sample-config directory of the Nagios installation package
• Object definition files
  – Defined in nagios.cfg: cfg_file=<file_name>

    cfg_file=/usr/local/nagios/etc/hosts.cfg
    cfg_file=/usr/local/nagios/etc/services.cfg
    cfg_file=/usr/local/nagios/etc/commands.cfg

  – Example hosts.cfg file:

    define host{
        use                     generic-host
        host_name               chico.rediris.es
        alias                   Chico
        address                 130.206.1.3
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
    }

• CGI configuration file
  – cgi.cfg file located in the config directory:

    authorized_for_system_information=nagiosadmin
    authorized_for_configuration_information=nagiosadmin
    authorized_for_system_commands=nagiosadmin
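When many hosts share the same template, host definitions like the one in hosts.cfg are often generated rather than typed by hand. The following is a small sketch of that idea; `render_host` is a hypothetical helper, not part of Nagios, and the default values simply repeat those of the example host.

```python
def render_host(host_name: str, alias: str, address: str,
                template: str = "generic-host") -> str:
    """Render one Nagios host object definition as text."""
    fields = [
        ("use", template),
        ("host_name", host_name),
        ("alias", alias),
        ("address", address),
        ("check_command", "check-host-alive"),
        ("max_check_attempts", "10"),
        ("notification_interval", "120"),
        ("notification_period", "24x7"),
        ("notification_options", "d,u,r"),
    ]
    lines = ["define host{"]
    # Left-align directive names in a 24-character column for readability.
    lines += [f"    {key:<24}{value}" for key, value in fields]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_host("chico.rediris.es", "Chico", "130.206.1.3"))
```

The generated text can be written to a file that nagios.cfg includes through a cfg_file= line.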

Tools

Nagios running (figures, five slides)

Tools

NagVis
• NagVis is a visualization addon for Nagios
• Free GPL software (http://www.nagvis.org/)
• Objects placed in maps are updated periodically
• Maps can be organized:
  – Geographically
  – Physically
  – Logically
  – By processes
• NagVis collects the information from backends
  – Default backend delivered with NagVis: NDO (Nagios Data Out) MySQL backend
• All objects from Nagios can be added to NagVis
• Each map has its own configuration file

Tools

NagVis deployment in RedIRIS (figures, four slides)


Active Monitoring

• So far, everything covered has been passive monitoring
• Passive monitoring means that devices are periodically polled to collect data
• Active monitoring: what is it?
  – Active requires “action”
  – Active monitoring means injecting packets into the network to run tests and obtain results
    • Throughput
    • Delay
• Active monitoring: how to do it?
  – In RedIRIS we are currently deploying perfSONAR (PERFormance Service Oriented Network monitoring ARchitecture)
  – Information and download: http://www.perfsonar.net/
  – DANTE vs Internet2 version
  – JAVA vs Perl
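The idea of injecting test packets and timing them can be illustrated with a toy probe. The sketch below is not perfSONAR and is not how its tools are implemented: it sends small UDP packets to an echo responder started on the loopback interface and measures the round-trip time of each. All names (`run_echo_server`, `measure_rtts`, `demo`) are hypothetical.

```python
import socket
import threading
import time


def run_echo_server(sock: socket.socket, count: int) -> None:
    """Echo `count` datagrams back to whoever sent them."""
    for _ in range(count):
        data, peer = sock.recvfrom(2048)
        sock.sendto(data, peer)


def measure_rtts(target, probes: int = 5, timeout: float = 1.0) -> list[float]:
    """Send numbered probes to `target` and return the round-trip times in seconds."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(probes):
            payload = seq.to_bytes(4, "big")
            sent_at = time.perf_counter()
            client.sendto(payload, target)
            try:
                data, _ = client.recvfrom(2048)
            except socket.timeout:
                continue  # a lost probe is simply not counted
            if data == payload:
                rtts.append(time.perf_counter() - sent_at)
    return rtts


def demo(probes: int = 5) -> list[float]:
    """Run the probe against an echo responder on 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=run_echo_server, args=(server, probes), daemon=True)
    worker.start()
    try:
        return measure_rtts(server.getsockname(), probes=probes)
    finally:
        worker.join(timeout=2)
        server.close()


if __name__ == "__main__":
    for rtt in demo():
        print(f"{rtt * 1000:.3f} ms")
```

This measures round-trip delay only. Measuring one-way delay additionally requires synchronized clocks at both ends, which is the problem that dedicated tools address.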

Active Monitoring

perfSONAR components
• Client / server application
  – Client side
    • perfSONAR UI (User Interface)
  – Server side
    • 1 Linux box for throughput measurements (BWCTL)
    • 1 Linux box for delay measurements (OWAMP)
• Server installation
  – Red Hat Enterprise Linux 5.3 recommended
  – May run on any Linux distribution
  – RedIRIS tested it on CentOS Linux 5.3
  – Set of tools available as RPM binaries and TGZ sources
  – Some dependencies are not resolved
  – It’s not expensive, but it is hard to deploy
• Client installation
  – Multi-platform JAVA graphical client available


Active Monitoring

perfSONAR UI in action

Active Monitoring

perfSONAR services
• Measurement Point Service
  – Creates and/or publishes monitoring information related to active or passive measurements
• Measurement Archive Service
  – Stores and publishes the information received from Measurement Point Services
• Transformation Service
  – Provides the capability to manipulate the stored data of the measurements performed
• Lookup Service
  – Used to discover services and other LS
• Topology Service
  – Makes network topology information available to other services
  – Finds the closest MP
  – Provides network topology information to the visualization tools
• Authentication Service
  – Controls access to services


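Active Monitoring

perfSONAR client interaction (diagram): a client, a gLS, and two networks, Network A and Network B, each with its own LS and MA (LS A, MA A, LS B, MA B) and with points labelled a to f. Exchanges shown in the diagram:
• Where to get info from Networks A and B? Answer: LS A, LS B
• Link utilization for IPs a, b, c? Answer: a, b, c: Net A, MA A
• Get link abc utilization. Answer: response
• Graph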

Active Monitoring

perfSONAR tools
• OWAMP (One Way Active Measurement Protocol)
  – Daemon that runs one-way latency tests
  – Provides:
    • A more accurate picture of performance degradation (it shows the direction of the degradation and is more sensitive to jitter)
    • A view of the routing (hops, one-way latency)
    • Availability information
    • A time reference for problems
• BWCTL (BandWidth test ConTroLler)
  – Daemon that runs iperf tests, with support for multiple instances
  – Provides:
    • A troubleshooting tool, because it uses the network the same way a user would
    • An archive of the tests performed, with the traffic limit reached
• More tools


Active Monitoring

Spanish LHC architecture


Active Monitoring

perfSONAR web-services (LS web admin interface)


Active Monitoring

perfSONAR web-services (LS Basic Configuration)

Agenda

Part II: Operational Procedures
• Organization
• Incidents
• Maintenance works
• 24x7
• SLAs
• Procedure

Organization

RedIRIS NOC is structured in levels
• Level 1
  – Initial response team
  – Monitoring network devices in real time
  – Answering the ops mailbox and the level 1 queue
  – Answering customer phone calls
  – First approach to solving problems
  – Dealing with carriers directly
  – External company support
• Level 2
  – Second level response team
  – Answering the noc mailbox and the level 2 queue
  – Supporting more complex network problems
  – Dealing with vendors
  – RedIRIS people
  – External company support


Incidents

Incidents are reported in several ways
• Ticket tool
  – Web interface tool where all incidents are queued
  – Main support tool of the level 1 and level 2 teams
• E-mail
  – RedIRIS ops and noc mailboxes
  – Customer support mailboxes
  – Problem reports from network devices
• Telephone
  – Customers also contact level 1 by phone
• Monitoring tools
  – The whole monitoring platform reports incidents in the network
  – Level 1 continuously checks the monitoring tools
• Logs
  – The logs of all machines are stored, and processed when problems are detected

Page 59

Maintenance works

• Different possibilities
– Network operator scheduled work
  – Notification 15 days in advance
  – RedIRIS acceptance
– RedIRIS scheduled work
  – Engineering tasks
  – Maintenance tasks
  – New service configuration
– Unscheduled works
  – Due to unexpected problems
  – Network links (fiber cuts, etc.)
  – Network equipment (hardware problems)
• Ticket system notification for all institutions connected to RedIRIS
– Web-based tool used to notify and update information about network problems
– Notifications via e-mail

Page 61

24x7

• External company
– 24x7x365 monitoring
– Support when RedIRIS staff are not in the office
– Procedures to monitor all RedIRIS equipment
– Procedures to open/close RMAs
– Hardware replacement procedures established
– Network operator and hardware vendor interaction
• What they can do on the equipment
– Execute “show” commands for monitoring
– Receive SNMP trap notifications
– Console login for hardware replacements
• What they can NOT do on the equipment
– Execute “config” commands
– Modify the running configuration
– Configure new services
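
The split between what the external company can and cannot do on the equipment can be pictured as a default-deny command filter. The sketch below is illustrative only: the function name `is_allowed_for_external_noc` and the `ALLOWED_FIRST_WORDS` tuple are assumptions, not part of the RedIRIS procedures, and on real devices this would normally be enforced by the vendor's own authorization features rather than by code like this.

```python
# Hypothetical read-only command policy for the external 24x7 team.
# Only monitoring ("show") commands pass; everything else is denied by default,
# which also covers "config" commands and any other change to the device.
ALLOWED_FIRST_WORDS = ("show",)


def is_allowed_for_external_noc(command: str) -> bool:
    """Return True only if the command starts with an allowed keyword."""
    words = command.strip().lower().split()
    if not words:
        return False  # an empty line is not a valid command
    return words[0] in ALLOWED_FIRST_WORDS


if __name__ == "__main__":
    for cmd in ("show interfaces", "configure terminal", "reload"):
        verdict = "allowed" if is_allowed_for_external_noc(cmd) else "denied"
        print(f"{cmd!r}: {verdict}")
```

A default-deny list is used here because the slide names only one permitted family of commands; anything not explicitly listed is treated as forbidden.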

Page 63

SLAs

• Network operators SLA
– Maintenance works MUST be notified 15 days in advance
  – If this is not done, a penalty is applied
– Link stability and quality must be guaranteed
  – No degradation
  – No outages
– There is a penalty for link failures longer than 10 seconds
– There is a maximum incident response time established
– Incremental penalty for repeated failures of the same link
• External company SLA
– Dedicated staff guaranteed
– Maximum incident response time
– Hardware stock available
• Hardware vendor SLA
– 4-hour hardware replacement guaranteed
– Engineering support
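
The network operator rules above (15 days' notice, penalties for failures longer than 10 seconds, incremental penalties for repeated failures on one link) lend themselves to an automated check. The following is a minimal sketch under stated assumptions: the class names, function names and link labels are hypothetical, and the actual penalty amounts are not given in the slides, so the code only flags violations and counts repeats.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(days=15)     # slide: works must be notified 15 days in advance
MAX_OUTAGE = timedelta(seconds=10)  # slide: failures longer than 10 s are penalized


@dataclass
class Maintenance:
    link: str               # hypothetical link label
    notified_at: datetime
    starts_at: datetime


@dataclass
class Outage:
    link: str
    duration: timedelta


def late_notifications(works):
    """Maintenance works announced less than 15 days before they start."""
    return [w for w in works if w.starts_at - w.notified_at < MIN_NOTICE]


def penalized_outages(outages):
    """Outages that lasted strictly longer than 10 seconds."""
    return [o for o in outages if o.duration > MAX_OUTAGE]


def repeat_counts(outages):
    """Penalized outages per link; the input for an incremental penalty."""
    return Counter(o.link for o in penalized_outages(outages))


if __name__ == "__main__":
    works = [Maintenance("link-A", datetime(2018, 3, 1), datetime(2018, 3, 10))]
    outages = [Outage("link-A", timedelta(seconds=30)),
               Outage("link-A", timedelta(seconds=11))]
    print([w.link for w in late_notifications(works)])
    print(dict(repeat_counts(outages)))
```

How the per-link count translates into a penalty is a contractual detail not covered here; the counter simply provides the figure such a rule would need.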

Page 65

Procedure

Incidents reported via Trouble Ticket tool

Page 66

Procedure

Web or e-mail managed incidents

Page 67

Procedure

New ticket creation

Can also be done by e-mail

Page 68

Procedure

• All new incidents are included in the Trouble Ticket system
– E-mail notifications
– Phone calls
– Incidents reported by monitoring tools
– New service deployment
• All incidents are stored in a MySQL database
– Reports
– Statistics
– Tracing
• Level 1 to Level 2 escalation
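
The flow above (every incident becomes a ticket, tickets live in a database, statistics are drawn from it, and Level 1 can escalate to Level 2) can be sketched in a few lines. This is not the RedIRIS tool: the table layout and function names are assumptions, and Python's built-in `sqlite3` is used as a stand-in for the MySQL database so the example runs without a server.

```python
import sqlite3


def create_store():
    """Create an in-memory ticket store (sqlite3 stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: source is how the incident arrived,
    # level is 1 for Level 1 and 2 once escalated to Level 2.
    conn.execute(
        "CREATE TABLE tickets ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " summary TEXT NOT NULL,"
        " level INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def open_ticket(conn, source, summary):
    """Insert a new ticket at Level 1 and return its id."""
    cur = conn.execute(
        "INSERT INTO tickets (source, summary) VALUES (?, ?)",
        (source, summary),
    )
    return cur.lastrowid


def escalate(conn, ticket_id):
    """Hand a ticket over from Level 1 to Level 2."""
    conn.execute("UPDATE tickets SET level = 2 WHERE id = ?", (ticket_id,))


def tickets_per_source(conn):
    """Simple statistic: number of tickets per reporting channel."""
    rows = conn.execute(
        "SELECT source, COUNT(*) FROM tickets GROUP BY source"
    )
    return dict(rows.fetchall())
```

A typical use would be to call `create_store()` once, `open_ticket()` for each e-mail, phone call or monitoring alarm, and `tickets_per_source()` when preparing reports.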

Page 69

Procedure

• Network outage notifications
– Same tool used

Page 70

Procedure

Results

Network tickets opened

Page 71

Procedure

Results

Network ticket tracing

Page 72

Questions?