NOC Services and Applications, AFNOG 2001, Brian Longwe
*some slides based on the netmgt talks in NTW T2-99 by Abha Ahuja and NTW T4-98 by Scott Bradner



NOC Services and Applications

AFNOG 2001

Brian Longwe

*some slides based on the netmgt talks in NTW T2-99 by Abha Ahuja and NTW T4-98 by Scott Bradner


What is a NOC?

Network Operations Centre
Monitors and manages a service provider’s network
• Fault monitoring and management
• Network status and operational statistics
• Information about current, historical and planned availability of systems
• Engineers can coordinate their work through the NOC


Network Management - What is it?

“In order to operate a reliable service, the network must be managed according to a determined discipline, using a coherent structure of information management.”

Geoff Huston, ISP Survival Guide


Network Management - Components

Parts of Network Management

• Fault management
• Configuration/Change management
• Performance management
• Security management
• Accounting management


Fault Management

Identify the fault
• Regular polling of network elements

Isolate the fault
• Diagnosis of the network components

Respond to the fault
• Allocate resources to resolve the fault
• Priority scheduling
• Technical/management escalation

Resolve the fault
• Notification


Fault Management - systems

reporting mechanism
• link to NOC
• notify on-call personnel

setup & control alarm procedures
repair/recovery procedures
ticket system


Fault Management - Fault Detection

Who notices a problem with the network?
• Network Operations Center w/ 24x7 operations staff
  – open trouble ticket to track problem
  – preliminary troubleshooting
  – assign engineer to problem or escalate ticket status
• Customer call
• Other ISPs


Fault Management - Fault Detection (cont.)

How can you tell if there is a problem with the network?
• Network Monitoring Tools
  – common utilities: ping, traceroute, SNMP
  – monitoring systems: NOCol, Big Brother, NetSaint, NMIS, HP OpenView, etc.
• Report state or unreachability
  – detect node down
  – routing problems
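The "report state or unreachability" idea can be sketched as a small polling loop. This is only an illustration, not how any of the tools listed above are implemented: the function names, the three state labels and the rule "declare a node down after three consecutive failed polls" are assumptions made for the example, and the probe is passed in as a function so that a real ping or SNMP check could be substituted.

```python
from typing import Callable, Dict, List


def poll_nodes(nodes: List[str],
               probe: Callable[[str], bool],
               failures: Dict[str, int],
               down_after: int = 3) -> Dict[str, str]:
    """Poll every node once and return its state: 'up', 'degraded' or 'down'.

    `failures` keeps the count of consecutive failed polls between calls.
    `down_after` is a hypothetical threshold chosen for this sketch.
    """
    states = {}
    for node in nodes:
        if probe(node):
            failures[node] = 0
            states[node] = "up"
        else:
            failures[node] = failures.get(node, 0) + 1
            states[node] = "down" if failures[node] >= down_after else "degraded"
    return states


def fake_probe(node: str) -> bool:
    """Stand-in for ping: pretends that only 'router2' is unreachable."""
    return node != "router2"
```

Calling `poll_nodes(["router1", "router2"], fake_probe, {})` repeatedly with the same `failures` dictionary moves `router2` from "degraded" to "down" on the third poll, which is the point at which a ticket would be opened.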


Fault Management - Ticket System

Very Important!
Need mechanism to track:
• failures
• current status of outage
• carrier tickets


Fault Management - Ticket System

system provides for:
• short term memory & communication
• scheduling and work assignment
• referrals and dispatching
• oversight
• statistical analysis
• long term accountability


Fault Management - Ticket Usage

create a ticket on ALL calls
create a ticket on ALL problems
create a ticket for ALL scheduled events
copy of ticket mailed to reporter and mailing list(s)
all milestones in resolution of problem maintain the same ticket #
ticket stays "open" until problem resolved according to problem reporter
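Two of the rules above (one ticket number for every milestone, and the ticket stays open until the reporter agrees it is resolved) can be sketched as a small data structure. This is a minimal illustration only; the class, field and method names are hypothetical and do not correspond to any particular ticket system.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_serial = count(1)  # hypothetical source of ticket numbers


@dataclass
class Ticket:
    subject: str
    reporter: str
    number: int = field(default_factory=lambda: next(_serial))
    status: str = "open"
    log: List[str] = field(default_factory=list)

    def add_milestone(self, note: str) -> None:
        """Record progress under the same ticket number."""
        self.log.append(note)

    def resolve(self, confirmed_by: str) -> bool:
        """Mark the ticket resolved only when the original reporter confirms."""
        if confirmed_by == self.reporter:
            self.status = "resolved"
            return True
        self.log.append(f"resolution proposed by {confirmed_by}, awaiting reporter")
        return False
```

A ticket created for a scheduled event would use the same class; only the subject and reporter differ.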


Fault Management - Ticket Example

Sample opening ticket

Subject:            Fix sshd on T1 instructor machines
Serial Number:      6
Area:               none
Queue:              afnog-noc
Requestors:         [email protected]
Owner:              inst
Status:             resolved
Last User Contact:  Mon May 7 17:02:21 2001 (30 hr ago)
Current Priority:   1
Final Priority:     1
Due:                No date assigned
Last Action:        Mon May 7 17:02:21 2001 (30 hr ago)
Created:            Sat May 5 17:08:08 2001 (3 day ago)


Fault Management - Ticket Example

Sample progress ticket

TT0000033975 has been MODIFIED. Here are the fields that have been changed:

CopyOfTime : 5
TTC Temp : 0
Ticket information log : [email protected] said ...

While I was investigating this, Debbie from UUNet called (via Merit main number) to tell us they were seeing it down. She can be reached at xxx-xxxx. The UUNet ticket is xxxxx..


Fault Management - Ticket Example

Sample closing ticket
• includes previous ticket contents plus resolution

Users on the laptop station minihub are not getting correct DHCP responses. No gateway or DNS entries are returned.
Thanks,
- Hervey

-- CUSTOMER INFORMATION ---------------------
'inst' (AFNOG Instructors) –
-------------------------------------
There have been several issues. First, the Cisco config-switch was set so the box would forget it's config on a power cycle (and we've had a few). Second, I made a typo when I cleaned up a DNS file. Things *should* be working now (famous last words). Resolving this till I hear otherwise.
GJ
----------------------------------------------------------------
>otherwise.
>GJ
Many thanks!
- Hervey


Fault Management - typical failures

• Node unpingable
• no IP connectivity to router
• possible reasons:
  – serial link down: call telco
  – router down/hardware problem: call engineer
  – routing problem: troubleshoot with traceroute, routeviews machine
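The cause-to-action list above can be written down as a small triage function. This is a sketch under stated assumptions: the three boolean inputs and the order in which they are checked are hypothetical, chosen for the example; the slide only gives the causes and the matching actions.

```python
# Actions taken from the list above, keyed by suspected cause.
ACTIONS = {
    "serial link down": "call telco",
    "router down/hardware problem": "call engineer",
    "routing problem": "troubleshoot with traceroute and a routeviews machine",
}


def triage(pingable: bool, link_up: bool, router_alive: bool) -> str:
    """Suggest a next step for a node that fails its ping check.

    link_up and router_alive are assumed to come from other checks
    (for example interface status or out-of-band access).
    """
    if pingable:
        return "no fault"
    if not link_up:
        cause = "serial link down"
    elif not router_alive:
        cause = "router down/hardware problem"
    else:
        cause = "routing problem"
    return f"{cause}: {ACTIONS[cause]}"
```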


Performance Management

A consistent level of network performance
Data collection
  – interface stats
  – throughput
  – error rates
  – usage
  – percent availability
Data analysis for performance metrics and trends
Establishment of performance thresholds
Capacity planning and deployment
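Two of the items above, percent availability and performance thresholds, reduce to simple arithmetic. The sketch below assumes availability is derived from poll results (successful polls divided by total polls) and that thresholds are upper limits per metric; both are assumptions for illustration, and the function and metric names are hypothetical.

```python
from typing import Dict, List


def percent_availability(polls_ok: int, polls_total: int) -> float:
    """Share of polls that succeeded, as a percentage."""
    if polls_total <= 0:
        raise ValueError("polls_total must be positive")
    return 100.0 * polls_ok / polls_total


def threshold_breaches(sample: Dict[str, float],
                       limits: Dict[str, float]) -> List[str]:
    """Names of metrics whose sampled value exceeds its configured limit."""
    return sorted(name for name, value in sample.items()
                  if name in limits and value > limits[name])
```

For example, a link polled every five minutes produces 288 polls per day, so a single missed poll already lowers that day's availability below 100 percent.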


Importance of Network Statistics

Accounting
Troubleshooting
Long-term trend analysis
Capacity Planning
Two different types
• active measurement
• passive measurement
Management Tools have statistical functionality


Performance Management Tools

netflow
• cflowd (http://www.caida.org/Tools/Cflowd)
• collects flow information from Cisco routers
• AS to AS information
• src and destination IP and port information
• useful for accounting and statistics
• how much of my traffic is port 80?
• how much of my traffic goes to AS237?


Netflow examples

Top ten lists (or top five)

##### Top 5 AS's based on number of bytes #######
srcAS   dstAS   pkts       bytes
6461    237     4473872    3808572766
237     237     22977795   3180337999
3549    237     6457673    2816009078
2548    237     5215912    2457515319

##### Top 5 Nets based on number of bytes ######
Net Matrix
----------
number of net entries: 931777
SRCNET/MASK        DSTNET/MASK        PKTS     BYTES
165.123.0.0/16     35.8.0.0/13        745858   1036296098
207.126.96.0/19    198.108.98.0/24    708205   907577874
206.183.224.0/19   198.108.16.0/22    740218   861538792
35.8.0.0/13        128.32.0.0/16      671980   467274801

##### Top 10 Ports #######
        input                   output
port    packets    bytes        packets    bytes
119     10863322   2808194019   5712783    427304556
80      36073210   862839291    17312202   1387817094
20      1079075    1100961902   614910     62754268
7648    1146864    419882753    1147081    414663212
25      1532439    97294492     2158042    722584770
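A "top N" list like the ones above is a sum of bytes per key followed by a sort. The sketch below is not cflowd's code; it only illustrates the aggregation, using the AS rows from the example as sample data. The record layout and function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed record layout for this sketch: (src_as, dst_as, packets, bytes)
Flow = Tuple[int, int, int, int]

SAMPLE_FLOWS: List[Flow] = [
    (3549, 237, 6457673, 2816009078),
    (6461, 237, 4473872, 3808572766),
    (2548, 237, 5215912, 2457515319),
    (237, 237, 22977795, 3180337999),
]


def top_as_pairs(flows: Iterable[Flow], n: int = 5) -> List[Tuple[Tuple[int, int], int]]:
    """Sum bytes per (source AS, destination AS) pair and return the n largest."""
    totals: Counter = Counter()
    for src_as, dst_as, _packets, nbytes in flows:
        totals[(src_as, dst_as)] += nbytes
    return totals.most_common(n)
```

The same pattern answers "how much of my traffic goes to AS237?" by keying on the destination AS alone, or "how much is port 80?" by keying on the port.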


Security Management

Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
Plug the holes that mice are using to get into the house
Don’t provide places within the house for mice to build nests
Set traps along walls where you often see mice out of the corner of your eye
Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
Avoid using commercial bait-and-kill poisons. Traditional snap traps are best.
Get a cat!


Security Management - Tools

security tools
• cops - host configuration checker (www.cert.org)
• swatch - email reports of activity on machine
• Tcpwrappers – log connections, restrict access
• ssh/skey – crypto authentication and communications
• Tripwire – monitor changes to system files

Keep up to date with security information
• bug reports
  – CERT advisories mailing list: http://www.cert.org./contact_cert/certmaillist.html
• bug fixes
• intruder alerts


Security Management – Good Practice

reporting procedure for security events
• e.g. break-ins
• abuse email address for customers to report complaints ([email protected])

control internal and external gateways
• control firewalls (external and internal)

security log management
• centralised logging host


Configuration Management

Maintaining information relating to the design of the network and its current configuration

Monitor Network State
• Record of network topology
  – Static
      what is deployed
      where it is deployed
      how it is attached
  – Dynamic
      operational status of the network elements


Configuration Management

[Figure: SNMP driven display of a network map. Node labels: nnhvd, husc6, harvard, geo, oitgw1, mghgw, sphgw1, wjhgw1, wjh12, generali, talcott, harvisr, huelings, pitirium, nngw, lmagw1, dfch, tch]


Configuration Management

Operational control of network
Start/stop individual components
Alter configuration of devices
Load and save config versions
Hardware/software upgrades
Methods of access
• SNMPGet / SNMPSet
• Out-of-band access


Configuration Management

inventory management
• database of network elements
• history of changes & problems

directory maintenance
• all hosts & applications
• nameserver database

host and service naming coordination
• "Information is not information if you can't find it"
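As an illustration of "database of network elements" plus "history of changes & problems", here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names and helper functions are all hypothetical; a real inventory would carry far more detail (serial numbers, software versions, contacts).

```python
import sqlite3
from typing import List, Tuple


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a tiny inventory database with two assumed tables."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS element (
            name TEXT PRIMARY KEY, kind TEXT, location TEXT);
        CREATE TABLE IF NOT EXISTS history (
            element TEXT REFERENCES element(name), at TEXT, note TEXT);
    """)
    return db


def add_element(db: sqlite3.Connection, name: str, kind: str, location: str) -> None:
    db.execute("INSERT INTO element VALUES (?, ?, ?)", (name, kind, location))


def record_change(db: sqlite3.Connection, element: str, at: str, note: str) -> None:
    """Append one change or problem entry; `at` is an ISO date string."""
    db.execute("INSERT INTO history VALUES (?, ?, ?)", (element, at, note))


def history_for(db: sqlite3.Connection, element: str) -> List[Tuple[str, str]]:
    """All recorded changes for one element, oldest first."""
    rows = db.execute(
        "SELECT at, note FROM history WHERE element = ? ORDER BY at", (element,))
    return list(rows)
```

Keeping the history keyed by element name is what makes the naming coordination above matter: if names are inconsistent, the history cannot be found.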


What is SNMP?

Simple Network Management Protocol
query - response system
• can obtain status from a device
• standard queries
• enterprise specific

uses database defined in MIB
• management information base
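The split between "standard" and "enterprise specific" queries shows up in the object identifier (OID) itself: standard MIB-II objects live under 1.3.6.1.2.1, and vendor objects live under 1.3.6.1.4.1 followed by the vendor's enterprise number. The sketch below only classifies OID strings; it sends no SNMP traffic, and the function names are chosen for this example.

```python
from typing import Tuple

MIB2_PREFIX = (1, 3, 6, 1, 2, 1)        # iso.org.dod.internet.mgmt.mib-2
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)  # iso.org.dod.internet.private.enterprises

# A few well-known MIB-II objects, for orientation.
WELL_KNOWN = {
    "1.3.6.1.2.1.1.3.0": "sysUpTime",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets (per-interface column)",
    "1.3.6.1.2.1.2.2.1.16": "ifOutOctets (per-interface column)",
}


def parse_oid(text: str) -> Tuple[int, ...]:
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def classify(oid_text: str) -> str:
    """Return 'standard', 'enterprise specific' or 'other' for an OID."""
    oid = parse_oid(oid_text)
    if oid[:6] == MIB2_PREFIX:
        return "standard"
    if oid[:6] == ENTERPRISES_PREFIX:
        return "enterprise specific"
    return "other"
```

For instance, any OID beginning 1.3.6.1.4.1.9 belongs to Cisco's enterprise subtree and is classified as enterprise specific.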


What do we use SNMP for?

query routers for:
• in and out bytes per second
• CPU load
• uptime
• BGP peer session status

query hosts for:
• network status
• Message queues
• Web traffic
• Squid proxy load
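"In and out bytes per second" is not read directly from the router: SNMP interface counters such as ifInOctets only ever count upward, so the rate is the difference between two samples divided by the time between them. The sketch below shows that calculation, including the wrap-around of a 32-bit counter; the function name is hypothetical, and it assumes the counter wrapped at most once between samples.

```python
def octets_per_second(previous: int, current: int, seconds: float,
                      counter_bits: int = 32) -> float:
    """Rate between two counter samples, allowing for one counter wrap."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Modular subtraction gives the right delta even if the counter
    # rolled over from its maximum value back to zero.
    delta = (current - previous) % (2 ** counter_bits)
    return delta / seconds
```

With a five-minute polling interval, samples of 1000 and 301000 octets give 1000 bytes per second. On fast links a 32-bit counter can wrap more than once within a polling interval, in which case the result is silently too low; polling more often or using 64-bit counters avoids that.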


SNMP Network Management Tools

MRTG http://www.ee-staff.ethz.ch/~oetiker/webtools/mrtg/
RRDtool http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool/
Cricket http://cricket.sourceforge.net/
HP OpenView
Benefits
  – simple to use and configure
  – quickly determine spikes/drops in traffic
  – can display almost any data that can be collected via SNMP


MRTG

Traffic Analysis for Hssi1/0/0

System: msu.mich.net in
Maintainer:
Interface: Hssi1/0/0 (2)
IP: hssi1-0-0.msu.mich.net (198.108.22.102)
Max Speed: 5630.6 kBytes/s (propPointToPointSerial)
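The "Max Speed" figure is what a traffic graph is scaled against. As a rough worked example, the sketch below converts the 5630.6 kBytes/s shown above into megabits per second and expresses a measured rate as a percentage of it. It assumes 1 kByte = 1000 bytes; the function names are hypothetical.

```python
MAX_SPEED_KBYTES_PER_S = 5630.6  # value from the MRTG header above


def kbytes_to_mbits(kbytes_per_s: float) -> float:
    """Convert kBytes/s to Mbit/s, assuming 1 kByte = 1000 bytes."""
    return kbytes_per_s * 8 / 1000


def utilisation_percent(rate_kbytes_per_s: float,
                        max_kbytes_per_s: float = MAX_SPEED_KBYTES_PER_S) -> float:
    """Measured rate as a percentage of the interface's maximum speed."""
    return 100.0 * rate_kbytes_per_s / max_kbytes_per_s
```

Here `kbytes_to_mbits(5630.6)` comes to roughly 45 Mbit/s, which is in line with a HSSI interface carrying a DS3-class circuit.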


Accounting Management

What do you account for?
• Use of the network and the services it provides

Types of accounting data
• RADIUS/TACACS accounting data from access servers
• Interface statistics
• Protocol statistics

Accounting data affects business models
• Bill on usage?
• Flat-rate billing?


NOC Practical

network monitor - NOCOL
Observe network status
• Create a “problem”
• Observe change in status
• “Resolve” the problem
Statistics?


NOC Practical

Ticket System - WebRT
• Overview
• Create tickets
  – As customer
  – As engineer
• Review tickets as engineer
• Take/Assign tickets


Exercises

Rows A, C, E, G and I become the NOC
Rows B, D, F, H and J become the customers
Customers send in fault notifications, automatically creating tickets
Engineers take/give tickets and resolve or escalate
Changeover … repeat
<during this, there are network failures that must be detected and fixed>


Exercise

Ticket Flow

Customers: rows B D F H J
• Create tickets by sending in email to [email protected]
• Receive updates on progress of ticket status
• Receive notice that ticket has been closed when resolution is complete

NOC: rows A C E G I
• Use Ticket System web interface http://noc.ws.afnog.org/cgi-bin/webrt.cgi
• Assign tickets
• Update tickets
• Escalate tickets
• Resolve tickets

[Diagram: ticket flow between customers and NOC; remaining labels: "First Level", "2nd Tier: Monitoring,"]


How do I manage my network?

Which tools should I use?
What do I really need?
• Keep it simple!
• Need to consider engineers working remotely
• Don’t want to spend too much time maintaining the tool (it should be helping you!)
• Different tools for NOC and engineers
• Different tools for statistics
• RELIABILITY!


References

http://www.merit.edu/ipma/docs/isp.html
http://www.nanog.org
http://www.caida.org
http://www.nlanr.net
http://www.cisco.com
http://www.amazing.com/internet/
http://www.isp-resource.com/
http://www.merit.edu/ipma
http://www.ripe.net


More Tools!

http://www.caida.org/Tools/
• OC3Mon/Coral

http://www.merit.edu/~ipma
• RouteTracker
• IRRj
• ASExplorer

http://www.geektools.com/
http://www.merit.edu/ipma/tools/other.html


ASexplorer


Route Flap Stats


Looking Glass Tools

route-views.oregon-ix.net>show ip bgp 35.0.0.0
BGP routing table entry for 35.0.0.0/8, version 56135569
Paths: (17 available, best #12)
  11537 237
    198.32.8.252 from 198.32.8.252
      Origin incomplete, localpref 100, valid, external
      Community: 11537:900 11537:950
  2914 5696 237
    129.250.0.3 (inaccessible) from 129.250.0.3
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 2914:420
  2914 5696 237
    129.250.0.1 (inaccessible) from 129.250.0.1
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 2914:420
  3561 237 237 237
    204.70.4.89 from 204.70.4.89
      Origin IGP, localpref 100, valid, external
  267 1225 237
    204.42.253.253 from 204.42.253.253
      Origin IGP, localpref 100, valid, external
      Community: 267:1225 1225:237

http://www.merit.edu/~ipma/tools/lookingglass.html
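When reading looking glass output, two things are usually checked first in each AS path: which AS originates the route (the last AS number in the path) and whether the path has been lengthened by prepending (the same AS repeated). The helper names below are hypothetical, and the sample paths are the ones in the output above.

```python
from typing import List

SAMPLE_PATHS = [
    "11537 237",
    "2914 5696 237",
    "3561 237 237 237",
    "267 1225 237",
]


def origin_as(as_path: str) -> int:
    """The originating AS is the right-most AS number in the path."""
    return int(as_path.split()[-1])


def collapse_prepends(as_path: str) -> List[int]:
    """Remove consecutive repeats so prepended paths can be compared."""
    collapsed: List[int] = []
    for token in as_path.split():
        asn = int(token)
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed
```

All of the sample paths end in AS 237, and the path through AS 3561 collapses from four hops to two once the prepending is removed.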


More Looking Glass Tools

Traceroute servers
http://www.merit.edu/ipma/tools/trace.html

Query: trace
Addr: www.isoc.org

Translating "www.isoc.org"...domain server (206.205.242.132) [OK]

Type escape sequence to abort.
Tracing the route to info.isoc.org (198.6.250.9)

 1 iad1-core2-fa5-0-0.atlas.digex.net (165.117.129.2) 0 msec 0 msec 4 msec
 2 dca5-core2-s5-0-0.atlas.digex.net (165.117.53.41) 0 msec 4 msec 0 msec
 3 dca5-core1-fa5-1-0.atlas.digex.net (165.117.56.117) 4 msec 0 msec 4 msec
 4 Hssi3-1-0.BR1.DCA1.ALTER.NET (209.116.159.98) 0 msec 0 msec 4 msec
 5 101.ATM2-0.XR1.DCA1.ALTER.NET (146.188.160.226) [AS 701] 4 msec 0 msec 4 msec
 6 195.ATM7-0.XR1.TCO1.ALTER.NET (146.188.160.102) [AS 701] 4 msec 0 msec 0 msec
 7 193.ATM8-0-0.GW1.TCO1.ALTER.NET (146.188.160.33) [AS 701] 4 msec 4 msec 4 msec
 8 charlie.isoc.org (198.6.250.1) [AS 701] 8 msec 8 msec 8 msec
 9 info.isoc.org (198.6.250.9) [AS 701] 8 msec * 12 msec
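Traceroute output in this router-style format can be broken into fields for logging or comparison between runs. The sketch below is an assumption-laden parser written only for the line layout shown above (hop number, name, address in parentheses, optional "[AS n]", then three probes given as "n msec" or "*"); other traceroute implementations print different layouts and would need a different pattern.

```python
import re
from typing import Optional

HOP_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)(?:\s+\[AS (\d+)\])?\s+(.*)$")


def parse_hop(line: str) -> Optional[dict]:
    """Parse one hop line; return None for lines that are not hops."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    hop, name, addr, asn, probes = match.groups()
    return {
        "hop": int(hop),
        "name": name,
        "addr": addr,
        "asn": int(asn) if asn else None,
        # Round-trip times of the probes that were answered, in ms.
        "rtts_ms": [int(ms) for ms in re.findall(r"(\d+) msec", probes)],
        # A '*' marks a probe that received no reply.
        "lost": probes.count("*"),
    }
```

Applied to the last hop above, this yields hop 9 in AS 701 with round-trip times of 8 and 12 ms and one unanswered probe.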


SNMP Tool references

• MON - http://www.kernel.org/software/mon/
• NOCol - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
• Sysmon - ftp://puck.nether.net/pub/jared
• Rover - http://www.merit.edu/~rover
• Concord - http://www.concord.com
• http://www.merit.net/~netscarf