
IBM Tivoli Network Manager IP Edition V3.9

Best Practices v1.0

Licensed Materials - Property of IBM


Note: Before using this information and the product it supports, read the information in “Notices” located at the end of this document.

© Copyright IBM Corporation 2011, 2012.

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


Contents

About this publication  5
  Intended audience  5
  What this publication contains  5
  Conventions used in this publication  5
  Related publications  6

Chapter 1 Introduction  8

Chapter 2 Discovery  10
  Overview  10
  How the discovery process works  11
    The discovery phases  11
    Interrogating devices  11
    Resolving addresses  11
    Downloading connections  12
    Correlating connectivity  12
  Discovery Best Practice tips  12
    Use the standard documentation as your reference point  13
    Plan your discovery in a phased approach  13
    Define what you plan to monitor  14
    Questions to ask before discovery  14
  Scoping the discovery  15
    Sparsely populated class A or class B networks  15
    Initially only discover a limited number of layer 2 and layer 3 devices  15
    Break up large networks  15
    IPv4/IPv6  15
  Seeds  15
    Noisy routers  16
    Use file and ping finders to intelligently seed your network  16
    Ping finders  16
    File finders  17
  Agents  17
    Full discovery agents  17
    Partial discovery agents  18
    Increase the number of threads on heavily used agents  19
  Filters  19
    Prediscovery filters  19
    Sensitive nodes  20
    Post discovery  20
  Helpers  20
  Additional discovery configuration  21
    Passwords  21
    DNS  21
    NAT  21
    Advanced  21
  Running a discovery  22
    Can anything go wrong?  22
    To restart with a clean database  23
  Working with OQL  23
  Verifying the topology and resolving issues  24
    Simple checklist if things go wrong  24
    Determine agent state  24
    Run DEBUG  25
    If discoveries take too long  25
    Will discovery remove devices no longer in the network?  25
    Partial rediscovery will not work with file finder only  25
    Why has the discovery process started again immediately after discovery completion?  26

Chapter 3 Polling  27
  Overview  27
  Typical customer configurations  28
  Poller basics  28
    What devices should I be polling  29
    Ping (ICMP) versus SNMP impact on polling  29
    Thresholding  30
    MIB graphing  30
    Polled data storage  30
    Built-in device and interface polling capabilities  31
    Multiple polling considerations  31
    Don't fall behind in polling  31
    Time out and retry  32
    Strategy for LAN connected (fast responders) versus WAN connected (slower responders) polling  32
    Tuning poller threads  32
    Polling intervals  32
  When to add another poller  33
    Enhanced procedure for the creation of a new poller instance  33

Chapter 4 Event Enrichment and Root-Cause Analysis  35
  Event management overview  35
  Event maps  36
  Precedence  37
  Event gateway plugins  38
  Troubleshooting  40

Chapter 5 General Performance Considerations  43
  Platform considerations  43
    Processor overview  43
    Processor speed  43
    Multi-core processors  43
    Processors summary  43
  Health check  43
    CPU usage  43
    Tracing routes  44
    Specialist scripts  44
    Monitoring process status messages  44
    Routine performance data sampling  44
    Component failure information  44
    Performance reports  45
    Log file sizes  45

Notices  46
  Trademarks  48


About this publication

This document details the best practices to employ when deploying IBM Tivoli Network Manager IP Edition 3.9. It is designed to help an engineer quickly and efficiently install a working system, and to ensure that an engineer who later has to work on a deployment installed by someone else can immediately understand what has been configured and how.

Intended audience

This publication is intended as essential reading for all technical staff who are responsible for:

Developing IBM Tivoli Network Manager IP Edition

Installing and administering IBM Tivoli Network Manager IP Edition

Supporting IBM Tivoli Network Manager IP Edition

What this publication contains

This publication contains the following sections:

Chapter 1 Introduction on page 8

This chapter provides an overview of the IBM Tivoli Network Manager IP Edition functional areas to set in context which areas will be covered by this document.

Chapter 2 Discovery on page 10

This chapter provides best practices for configuring and troubleshooting discovery.

Chapter 3 Polling on page 27

This chapter provides best practices for configuring and troubleshooting polling.

Chapter 4 Event Enrichment and Root-Cause Analysis on page 35

This chapter provides best practices for configuring and troubleshooting event enrichment and root-cause analysis.

Chapter 5 General Performance Considerations on page 43

This chapter provides best practices for general performance considerations.

Conventions used in this publication

This publication uses several conventions for special terms and actions, and for operating system-dependent commands and paths.

Typeface conventions

This publication uses the following typeface conventions:

Bold
- Lowercase commands and mixed case commands that are otherwise difficult to distinguish from surrounding text
- Interface controls (check boxes, push buttons, radio buttons, spin buttons, fields, folders, icons, list boxes, items inside list boxes, multicolumn lists, containers, menu choices, menu names, tabs, property sheets), and labels (such as Tip: and Operating system considerations:)
- Keywords and parameters in text

Italic
- Citations (examples: titles of publications, diskettes, and CDs)
- Words defined in text (example: a nonswitched line is called a point-to-point line)
- Emphasis of words and letters (words as words example: "Use the word that to introduce a restrictive clause."; letters as letters example: "The LUN address must start with the letter L.")
- New terms in text (except in a definition list): a view is a frame in a workspace that contains data
- Variables and values you must provide: ... where myname represents....

Monospace
- Examples and code examples
- File names, programming keywords, and other elements that are difficult to distinguish from surrounding text
- Message text and prompts addressed to the user
- Text that the user must type
- Values for arguments or command options

Operating system-dependent variables and paths

This publication uses the UNIX convention for specifying environment variables and for directory notation.

When using the Windows command line, replace $variable with %variable% for environment variables, and replace each forward slash (/) with a backslash (\) in directory paths. For example, on UNIX systems, the $NCHOME environment variable specifies the directory where the Tivoli Netcool/OMNIbus core components are installed. On Windows systems, the same environment variable is %NCHOME%. The names of environment variables are not always the same in the Windows and UNIX environments. For example, %TEMP% in Windows environments is equivalent to $TMPDIR in UNIX environments.

If you are using the bash shell on a Windows system, you can use the UNIX conventions.
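The conversion described above is mechanical for paths, although the variable names themselves can differ between platforms (as with $TMPDIR and %TEMP%). As a sketch, here is a small shell helper — hypothetical, not part of the product — that rewrites a UNIX-style path into the Windows command-line form:

```shell
# Rewrite $VAR references as %VAR% and forward slashes as backslashes.
# Illustrative helper only; remember that some variable names still
# differ between UNIX and Windows environments.
to_windows_path() {
  printf '%s' "$1" \
    | sed -E 's/\$([A-Za-z_][A-Za-z0-9_]*)/%\1%/g' \
    | tr '/' '\\'
}

to_windows_path '$NCHOME/precision/disco'   # → %NCHOME%\precision\disco
```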

Related publications

IBM Tivoli Network Manager IP Edition Administration Guide

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_adm_pdf_39.pdf

Describes administration tasks for IBM Tivoli Network Manager IP Edition, such as how to administer processes, query databases and start and stop the product. This publication is for administrators who are responsible for the maintenance and availability of IBM Tivoli Network Manager IP Edition.


IBM Tivoli Network Manager IP Edition Discovery Guide

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_dsc_pdf_39.pdf

Describes how to use IBM Tivoli Network Manager IP Edition to discover your network. This publication is for administrators who are responsible for configuring and running network discovery.

IBM Tivoli Network Manager IP Edition Event Management Guide

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_poll_pdf_39.pdf

Describes how to use IBM Tivoli Network Manager IP Edition to poll network devices, to configure the enrichment of events from network devices, and to manage plug-ins to the Tivoli Netcool/OMNIbus Event Gateway, including configuration of the RCA plug-in for root-cause analysis purposes. This publication is for administrators who are responsible for configuring and running network polling, event enrichment, root-cause analysis, and Event Gateway plug-ins.

IBM Tivoli Network Manager IP Edition Getting Started Guide

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_gs_pdf_39.pdf

Describes how to set up IBM Tivoli Network Manager IP Edition after you have installed the product: how to start the product, make sure it is running correctly, configure and monitor a first discovery, verify the results of the discovery, configure a production discovery, and keep the network topology up to date. Once you have an up-to-date network topology, this guide describes how to make it available to network operators, and how to monitor the network. The essential tasks are covered in this short guide, with references to the more detailed, optional, or advanced tasks and reference material in the rest of the documentation set.

IBM Tivoli Network Manager IP Edition Product Overview

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_ovr_pdf_39.pdf

Gives an overview of IBM Tivoli Network Manager IP Edition. It describes the product architecture, components and functionality.


Chapter 1 Introduction

The IBM Tivoli Network Manager IP Edition 3.9 architecture consists of the following functional areas:

Network discovery
Network discovery involves discovering your network devices, determining how they are connected (network connectivity), and determining which components each device contains (containment). The complete set of discovered devices, connectivity, and containment is known as a network topology. You build your network topology by performing a discovery, and you keep it up to date by means of regular rediscoveries.

Network polling
Network polling determines whether a network device is up or down, whether it has exceeded key performance parameters, or whether links between devices are faulty. If a poll fails, Network Manager generates a device alert, which operators can view in the Active Event List.

Topology storage
Network topology data is stored in the Network Connectivity and Inventory Model (NCIM) database, a relational database that consolidates topology data discovered by Network Manager. The NCIM database can be implemented on any one of the following relational database management systems: DB2®, IDS, MySQL, or Oracle.

Event enrichment
Event enrichment is the process by which Network Manager adds topology data to events, thereby enriching the event and making it easier for the network operator to analyze. Examples of topology data that can be used to enrich events include system location and contact information.

Root-cause analysis
Root-cause analysis is the process of determining the root cause of one or more device alerts. Network Manager performs root-cause analysis by correlating event information with topology information, determining cause and symptom events based on the discovered network device and topology data.

Event storage
Event data is generated by Network Manager polls and also by Tivoli Netcool/OMNIbus probes installed on network devices. A probe is a protocol- or vendor-specific piece of software that resides on a device, detects and acquires event data from that device, and forwards the data to the ObjectServer as alerts. Event data can also be received from other event sources. Event data from all of these sources is stored in the Tivoli Netcool/OMNIbus ObjectServer. Note: Tivoli Netcool/OMNIbus is a separate product; if you do not already have it, you must obtain and install it. For more information, see the Network Manager installation documentation.

Polled data storage


At any time a network administrator can set up polling of specific SNMP and ICMP data on one or more network devices. This data is stored in the NCPOLLDATA historical polled data database. By default, Network Manager implements the NCPOLLDATA database using a database schema within the NCIM database. You can optionally integrate Network Manager with IBM® Tivoli Monitoring 6.2, with the integrated Tivoli Data Warehouse, to provide extra reporting capabilities, including better report response times, capacity, and isolation of the operational database (NCIM) from unpredictable reporting traffic.

Topology visualization
Network operators can use several topology visualization GUIs to view the network and to examine network devices. Using these GUIs, operators can switch between topology views to explore connectivity or associations, and see alert details in context. Operators also have access to diagnostic tools such as the SNMP MIB Browser, which obtains MIB data for devices.

Event visualization
Operators can view event lists and use alert severity ratings to quickly identify high-priority device alerts. Operators can switch from event lists to topology views to see which devices are affected by specific alerts. They can also identify root-cause alerts and list the symptom alerts that contribute to the root cause.

Reporting
Network Manager provides a wide range of reports, including performance reports, troubleshooting reports, asset reports, and device monitoring reports. Right-click tools provide immediate access to reports from topology maps.

This Best Practices Guide covers the areas of network discovery, network polling, event enrichment, and root-cause analysis.


Chapter 2 Discovery

Overview

Discovery is the first and most important task after Network Manager has been installed. The more complete and accurate the topology, the more value you will gain from root cause analysis (RCA) and the faster you can troubleshoot reported network problems. It provides a solid base on which to build out your network management solution for proactive and reactive monitoring and asset reporting. This section helps you set up your discovery efficiently and effectively, start and monitor the discovery, and then verify the results and fix issues afterwards. The illustration below provides an overview of the discovery process and how assets, finders, filters, agents, stitchers, and so on fit together in the process.

For a complete description of the discovery process, see the section About discovery in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/overview/concept/nmip_ovr_disco.html


How the discovery process works

This section takes a closer look at how the discovery process works. By understanding what is involved in each phase and how the data is structured, you will be able to track the effects of your configuration choices and see how discovery interprets the data learned from the network devices.

The discovery phases

The discovery process passes through a set of distinct, sequential phases. Data is manipulated within an in-memory database for the duration of the discovery. Note: You can access this data using OQL to understand how the discovery process is handling a particular device, or to investigate other issues. Refer to the OQL section for more information.

For standard definitions of the discovery phases, see the section Understanding discovery phases in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_understandingdscphases.html

Interrogating devices

This phase consists of finding as many IP addresses as possible via the ping finder and file finder. Once an IP address has been found, an SNMP query for the sysObjectId is made and the result is placed in the Details.returns table. Other IP addresses on the device are found from the SNMP ipAddress table and recorded in the translations.ipToBaseName table. Entries that match the pre-discovery filter are then distributed to the various agent tables. Each agent has a filter in its <agent>.agnt file that determines which devices to query, and the results from each agent are accumulated in the <agent>.returns tables. When no new IP addresses have been discovered for a set period (90 seconds by default), this phase ends and what is called a blackout period begins. During the blackout period the rest of the discovery progresses normally, but any new IP addresses that are found are held until the end of discovery. The pingers may still be working, especially if ping-sweeping sparse class B sized subnets. Any new IP addresses are placed in finders.pending until the rest of the phases are complete, at which point the discovery process restarts for these addresses.
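Because the phase data lives in the discovery's in-memory database, you can watch it with OQL while a discovery runs. A hedged sketch using the tables named above (the queries assume the standard OQL select syntax; attach to the discovery service with the ncp_oql client as described in the Administration Guide):

```sql
select * from finders.pending;
select * from translations.ipToBaseName;
```

For example, addresses accumulating in finders.pending during the blackout period indicate devices that will trigger a follow-up pass once the current phases complete.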

Resolving addresses

This phase is responsible for gathering the MAC/IP information using the ArpCache agent.

Downloading connections

Now that the MAC information is available, agents query the layer 2 switches for connectivity data. Once these agents have worked through their queues and populated the <agent>.returns tables, this phase ends.


Correlating connectivity

At this point, discovery has finished querying the network and begins analyzing the data to build the connectivity layers and containment relationships. This data moves through a set of tables and is finally consolidated in ncp_model's master.entityByName, after which ncp_model moves the data into the NCIM topology database. The connection information is built up as layers from the agent information in the following tables (among others):

IPLayer.entityByNeighbor
switchTopology.entityByNeighbor
CDPLayer.entityByNeighbor

and consolidated in fullTopology.entityByNeighbor. This is useful for Support engineers when working out why a connection error occurred.
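The layering described above can be pictured as a union of per-layer neighbour tables. The following toy sketch in Python (with invented entity names and data, not real discovery output) shows the idea behind consolidating the layers into a single full-topology view:

```python
# Toy model: per-layer neighbour tables, each a set of (entity, neighbour)
# pairs. The names and contents are invented for illustration only.
ip_layer = {("router1", "router2")}
switch_topology = {("switchA", "switchB"), ("switchB", "router2")}
cdp_layer = {("switchA", "router1")}

# Conceptually, fullTopology.entityByNeighbor holds the consolidated union
full_topology = ip_layer | switch_topology | cdp_layer

for a, b in sorted(full_topology):
    print(f"{a} <-> {b}")
```

When a connection looks wrong in the final topology, comparing the per-layer tables against the consolidated view narrows down which agent contributed the bad link.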

Discovery Best Practice tips

The following list gives a high-level overview of many of the key best practice tips to be aware of during the discovery process. Many of the points are expanded in subsequent sections of this document.

General Discovery
- Use the standard Network Manager 3.9 documentation as a reference point
- Plan your network discovery in a phased approach: get a strong baseline of your network first
- Define the parts of the network that you want to discover

Scopes
- If your subnet is sparsely populated, including individual routers is likely to result in a faster discovery
- Initially only discover a limited number of layer 2 and layer 3 devices, and do not run in DEBUG mode
- Break up large networks into smaller, manageable groups

Seeds
- Noisy routers make good initial seeds
- Use file and ping finders to intelligently seed your network
- To restrict discovery, seed with a list of devices using the File finder or the Ping finder, and disable feedback in the Advanced tab

Agents
- Focus on an initial, reduced set of agents that meets your requirements
- Add further agents for specific requirements (for example, TraceRoute)

Filters
- Use pre-discovery filters to filter out end nodes, printers, and similar devices
- Use pre-discovery filters to filter out sensitive nodes that you do not want to monitor
- Use post-discovery filters to prevent instantiation of devices

Helpers
- Only modify the standard helpers if you are an experienced user
- To speed up the discovery process, you can reduce the helper timeouts and number of retries
- If you have a very reliable network in which devices respond quickly, you can specify a small default timeout
- To reduce the amount of network traffic caused by a discovery, you can increase the timeout and disable broadcast and multicast pinging

Resolving issues
- Use OQL to determine where in the process discovery has halted
- Determine what state the agents are in
- Run DEBUG on suspected failing components

Additional pointers
- Regularly generate a list of discovered IP addresses to make subsequent discoveries more efficient
- Increase the number of threads on heavily used agents

Use the standard documentation as your reference point The standard Network Manager 3.9 documentation is the reference point for the correct usage of the product and should always be consulted first: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/index.jsp?topic=/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/common/welcome.htm This includes complete material on:

- Getting started
- Installing
- Administering
- Networks
- Reference

This best practices document is intended to highlight key points from the documentation that will greatly assist your use of Network Manager, and to augment them with additional best practice material gathered from a large base of customer engagements.

Plan your discovery in a phased approach Network Manager 3.9 is a powerful product which can be configured to discover a wide range of network solutions and topologies. A key component of network discovery is correct planning, to allow both complete discovery of the required devices within your network and completion in a timely manner. The following sections of this document highlight how to focus on specific parts of the network, how to speed up network discovery, and how to configure items such as finders, agents, and filters. Discovery is an iterative process that is constantly updated to reflect regular changes in your network infrastructure. Starting small, focusing on core parts of the network, and collecting key information provides a baseline for more extensive discovery of the network. Performing a network discovery in this way allows you to easily isolate and resolve problem areas within the network.

Define what you plan to monitor From a high level, have a clear indication of which elements you want to monitor in your network. Network management solutions can provide a huge amount of data, and it is important to focus on collecting and filtering the key information that is required for monitoring.

Questions to ask before discovery

- Network(s) to be discovered: list the subnets/netmasks or IP addresses that should be discovered. These help define the discovery scope and seeds.
- Excluded IPs/subnets: list the IP addresses that are in scope but should be excluded, e.g. interfaces being monitored for DoS. Use an exclude scope or a pre-discovery filter to exclude these IP addresses.
- Network technologies: briefly describe the networking technologies in place, e.g. MPLS, ATM/FR, metro Ethernet. Specific technologies may require a specific discovery approach.
- Number of main nodes and interfaces: an approximate number, used for server dimensioning.
- List of network devices: list the device types to be monitored, e.g. Cisco 2601, Juniper M5. Some devices may have specific discovery requirements.
- NAT description: if it is a NAT environment, is it static or dynamic? Which vendor?
- Security: are there any ACLs / firewalls / other security measures?
- Community strings: list the community strings of all devices in the discovery scope, to allow authentication of all devices.
- Telnet passwords: RO (read-only) access is required for some Cisco Catalysts, and is sometimes required for MPLS discovery (Cisco, Juniper, Huawei), to ensure proper access to devices.
- Miscellaneous: are any devices in an out-of-band management network? How do you want to name the devices: /etc/hosts, DNS, or sysName (and ifName / ifDescr / ifIndex for interface naming)? Do you want the loopback address to be the main node IP address? Are there any firewalls that can never be discovered?

Scoping the discovery First you have to fence in the discovery and set its boundaries. Here you specify the IP addresses and subnets that you wish to manage. You need to consider not just the management addresses of the network devices, but also the IP subnets of the router interfaces you wish to poll for availability. IP interfaces on routers that are out of scope will be automatically tagged as unmanaged. For standard definitions of the scope settings, see the section Understanding scope settings in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_settingdscboundaries.html

Sparsely populated class A or class B networks First, do not configure the ping finder to ping all valid IP addresses; this will either take a long time or cause the discovery to enter rediscovery mode multiple times. In this situation it is best to sit down with the network administrator and find out the addresses of the core routers, or the address ranges that contain routers. If this is not possible, take some time to 'traceroute' to known subnets and populate the ping finder with these addresses.

Initially only discover a limited number of layer 2 and layer 3 devices When presented with a new system, try running a limited network discovery first. This will give an idea of the scope of the network before interrogating each device more extensively. It will also give you a chance to check that all the object IDs are handled correctly and that there are working passwords for all devices you want to interrogate using SNMP.

Break up large networks Break the network out into smaller network portions in order to get quick visibility and verify the correct functioning of the discovery processes.

IPv4/IPv6 Network Manager does not support the IPv4-mapped IPv6 format and expects all IPv6 addresses to be in standard colon-separated IPv6 format. For example, Network Manager does not support an IPv4-mapped IPv6 address such as ::ffff:192.0.2.128. Instead, enter this address as ::ffff:c000:280 (standard colon-separated IPv6 format).
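If you need to convert IPv4-mapped addresses into the colon-separated form in bulk, Python's standard ipaddress module performs this normalization; this is a convenience sketch outside the product, not a Network Manager tool:

```python
import ipaddress

# Parse the IPv4-mapped form and print the plain colon-separated
# hexadecimal form that Network Manager expects.
mapped = ipaddress.IPv6Address("::ffff:192.0.2.128")
print(mapped)  # ::ffff:c000:280
```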

Seeds The pingFinder should be enabled if you want the discovery engine to find devices as a result of querying other devices - referred to as "feedback". The pingFinder will ping sweep subnets, while the fileFinder provides a convenient way to seed the discovery with one or more lists of IP addresses. There are options that let you limit discovery strictly to a list of IP addresses (either in a file or as pingFinder entries) if required. Keep in mind that seeds only work within the confines of the scope. Ping sweeping large class B subnets is less effort to configure but also less efficient than fileFinder lists. After successfully discovering your managed network, consider generating a list of discovered IP addresses to make regular discoveries in production more efficient. New devices will still be discovered, since feedback is enabled by default.

Noisy routers Noisy routers make good initial seeds.


Use file and ping finders to intelligently seed your network Here is an example of good seeding versus bad seeding. The objective is to ping sweep a sparsely populated 192.168.0.0/16 network containing 5000 devices.

Case 1 – simple seed (same as scope)
Objective: sweep 192.168.0.0/16
Threads = 1
Number of devices = 5000
Addresses to ping = 256 * 256 = 65536
Pings per second = 10
Number of retries = 1
Time taken (seconds)
= (Addresses to ping / Number of threads / Pings per second) + ((Addresses to ping - Number of devices) * Number of retries / Number of threads / Pings per second)
= (65536 / 1 / 10) + ((65536 - 5000) * 1 / 1 / 10)
= 12607 (~3.5 hours)

Case 2 – complex seed (manipulated scope)
Objective: sweep 192.168.0.0/16, with the scope split into smaller seed networks (e.g. /19)
Threads = 8
Number of devices = 5000
Addresses to ping = 256 * 256 = 65536
Pings per second = 10
Number of retries = 1
Time taken (seconds)
= (Addresses to ping / Number of threads / Pings per second) + ((Addresses to ping - Number of devices) * Number of retries / Number of threads / Pings per second)
= (65536 / 8 / 10) + ((65536 - 5000) * 1 / 8 / 10)
= 1576 (< 0.5 hours)
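The arithmetic above can be sketched in a few lines of Python (an estimate only; real discovery times also depend on helper timeouts and network latency):

```python
def sweep_time(addresses, devices, threads, pings_per_sec, retries):
    """Estimated ping-sweep time in seconds, per the formula above."""
    first_pass = addresses / threads / pings_per_sec
    # Addresses with no device behind them are retried per configured retry.
    retry_pass = (addresses - devices) * retries / threads / pings_per_sec
    return round(first_pass + retry_pass)

print(sweep_time(65536, 5000, threads=1, pings_per_sec=10, retries=1))  # 12607
print(sweep_time(65536, 5000, threads=8, pings_per_sec=10, retries=1))  # 1576
```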

Ping finders Any successful ping response will result in an object being created and in further SNMP queries, which may result in other IP addresses being found from the routing table or ipNetToMedia table. This is known as "feedback" and can be controlled from the Advanced tab. For example-based tutorial steps on how to configure the Ping finder, see the section Enabling the Ping finder and the feedback mechanism in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_enablingpingfinder.html You can specify the seeds as "pingFinders", which define IP addresses to ping or subnet addresses to ping sweep. Noisy routers make good initial seeds. Note: Be conscious of the size of the subnet you specify, especially with IPv6 subnets. Class B networks can take 30-40 minutes to ping, and class A networks a day or more. If these subnets are sparse, this can lead to long periods of silence: the discovery will think it is done and continue with the final stages to completion, but the pinger continues, and as more ping responses come in they will cause the discovery to begin a new cycle. It is designed this way to ensure the fullest possible discovery, but it does make it tricky to know when the discovery is really complete. IPv6 subnets should have a mask of 96 bits or greater.

File finders You can also create "fileFinder" entries which specify files containing lists of IP addresses. Thanks to the formatting flexibility, you can use existing files that you might maintain outside of Network Manager, and thus provide a basis for provisioning control. You must specify the delimiter and the column number for the IP address. Network Manager can extract just the IP address, or both the IP address and the corresponding name, which will be used as the display name for that device. Note that the delimiter is a regular expression, so the default '[ ]+' indicates one or more spaces. For example-based tutorial steps on how to configure the File finder for efficient production seed settings, see the section Configuring production discovery settings in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_settingupdscconfigurationforproduction.html

Note: Be careful formatting your Finder files. The IP address must be cleanly defined with no leading characters or spaces, otherwise it will not be used. Check ncp_df_file.<DOMAIN>.log for syntax errors. Since fileFinders are more efficient for discovery, many users tend to use them in production. Continue to keep the Ping finder box checked, even if you have no pingFinder entries, so that new devices added to the network will be discovered.
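To illustrate how the delimiter regular expression and column number interact, here is a rough Python sketch of fileFinder-style parsing. The function name, comment handling, and defaults are illustrative assumptions, not the product's actual parser:

```python
import re

def parse_seed_file(lines, delimiter=r"[ ]+", ip_column=1):
    """Extract the IP address column from finder-style lines.

    delimiter is a regular expression (the default '[ ]+' means one or
    more spaces); ip_column is 1-based, as in the Finder configuration.
    """
    seeds = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (an assumption here)
        fields = re.split(delimiter, line)
        if len(fields) >= ip_column:
            seeds.append(fields[ip_column - 1])
    return seeds

print(parse_seed_file(["192.0.2.1   router1", "192.0.2.2 router2"]))
# ['192.0.2.1', '192.0.2.2']
```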

Agents

Full discovery agents The default set of agents is typically a good starting point for assessing your specific needs. Click on each agent to see an explanation that will give you an idea of whether you will benefit from it. For example-based tutorial steps on how to configure agents for a discovery, see the section Ensuring all network technologies are covered in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_ensuringnwtechnologiesarecovered.html Focusing on a core set of agents allows you to collect the essential data for your network and speeds up the overall discovery time. When running a layer 3 discovery, the Details and AssocAddress agents are run along with a combination of the following IP layer agents:

- IpRoutingTable
- IpBackupRoutes
- IPForwardingTable
- HSRP
- VRRP

The TraceRoute agent can be used if there is a firewall on the network, because SNMP calls cannot always be made through firewalls. If you use the TraceRoute agent, you must specify, as a discovery seed, the subnet node for the subnet on the other side of the firewall.

If you have IPv6 in your network, consider running the IPv4/6 InetRouting agent to discover the connectivity, particularly the IPv6 connectivity.


Some routers support layer 2 technologies. For example, when an ATM card is located in a router chassis, layer 3 discovery agents, such as the IpRoutingTable agent, only discover interfaces with an IP address. Therefore, to fully discover all the interfaces on routers that support layer 2 technologies, you must run the appropriate agents.

The ArpCache agent retrieves the physical address of a device, so it is only required (in conjunction with the Switch agents) when performing layer 2 discoveries. Frame Relay agents should be run in conjunction with the IP layer agents if you need to add DLCI information to the interfaces of Frame Relay devices. Switch agents must be run for a layer 2 discovery.

The Entity agent provides physical containment information from the Entity MIB, used for asset reporting and root cause analysis. It is resource intensive, will extend the discovery time, and can create additional entity objects in the database. Some of the asset reports require information provided by the Entity and OSinfo agents, as does the Structure Browser, which shows containment information such as modules, cards, and slots. These two agents are disabled by default.

By default the IpRoutingTable agent is enabled. It learns about other IP addresses and subnets to feed back into the discovery. Alternatively you can enable the IpBackupRoutes agent, which learns about IP addresses from the ipNetToMedia (ARP) table instead.

Partial discovery agents The agents enabled on this tab are only used during partial discoveries. Partial discoveries can be triggered from the Discovery Status page or from the context menu of a device, and one or more subnets or IP addresses can be used. See the Advanced tab if you wish to re-stitch the layered connections of the entire topology; otherwise a partial discovery just refreshes existing device information or discovers new objects, but does not attempt to create the connections.

Note: If you plan on using the Partial Discovery feature, you will need to enable the checkbox “Enable Caching of Discovery Tables” on the Advanced tab. This stores the OQL caches to disk and enables you to continue doing partial discoveries after a restart of ncp_disco. The Ping finder on the Seeds tab also needs to be enabled (even if there are no ping finder entries). Note: You can trigger a partial discovery by creating entries in Disco OQL table finders.rediscovery. This can be done using a script and scheduled with cron to update a volatile region of your network more frequently.

Increase the number of threads on heavily used agents Increasing the number of threads can help for heavily used components, such as:

Disco – DiscoAgent.cfg
SNMP Helper – DiscoSnmpHelperSchema.cfg
Ping Finder – DiscoPingFinderSchema.cfg


Filters

Prediscovery filters Prediscovery filters prevent the discovery from retrieving detailed data or connectivity data from the device, and prevent discovered devices from being polled for connectivity information. Only devices matching the prediscovery filter are fully discovered. If no prediscovery filter is defined, then all devices within the scope are discovered. For example-based tutorial steps on how to tune a scope zone using a prediscovery filter, see the section Fine tuning a subnet scope zone using prediscovery filters in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_creatingcomplexscopes.html

Prediscovery filters provide a mechanism to base discovery on complex IP ranges that cannot be easily defined in the Scope tab. They can also be used to filter out devices based on their sysObjectId value. Default filters exist to filter out end nodes, printers, and similar devices. You can create quite complex multiple filters, which makes this feature very powerful, but try to ensure that filters are designed so that they can be easily maintained if you need to add new scopes. The filter acts on the fields of the Details.returns OQL table in the discovery (Disco) service, so you can use fields other than IP addresses, such as m_ObjectId (equivalent to sysObjectId). A device must pass all filters to be discovered. Search the InfoCenter for 'Using Regular Expressions' for detailed usage.

Note: A simple way to test your regular expression syntax is to use it in an OQL command against the Disco service:

select * from Details.returns where m_UniqueAddress LIKE '10\.30\..*\.[1-5]$';

This will result in a list of all the IP addresses discovered in that range. If you do an initial discovery without excluding these addresses, you can use this to test your syntax.
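You can also sanity-check the regular expression locally before putting it in a filter. The following Python snippet tests the same pattern, on the assumption that the LIKE clause is evaluated as a regular expression as described above:

```python
import re

# The same pattern used in the OQL LIKE clause above.
pattern = re.compile(r'10\.30\..*\.[1-5]$')

print(bool(pattern.search("10.30.7.3")))   # True: last octet is in 1-5
print(bool(pattern.search("10.30.7.9")))   # False: last octet out of range
```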

Sensitive nodes You might want to apply a filter to sensitive devices that you do not want to poll. A device might be considered sensitive because there is a security risk involved in polling the device, or because polling might cause the device to overload.

Post discovery You might want to apply this filter to devices that you do not want to poll, such as workstations and printers. A post-discovery filter restricts device instantiation. If a post-discovery filter is defined, only devices that pass the criteria are instantiated, that is, sent to ncp_model. If no post-discovery filter is defined, then all discovered devices are sent to ncp_model.

Helpers The helpers are specialized applications that retrieve information from the network on demand. The default helper configuration is sufficient for most networks. However, you might decide to alter the configuration for several reasons. Configuring the Helper System can speed up network discovery, but is recommended for experienced users. Although the discovery agents retrieve connectivity information, they do not have any direct interaction with the network. Instead, they retrieve connectivity information through the Helper System, which consists of a Helper Server and various helpers. Reasons to configure the helpers include:

To speed up the discovery process, you could reduce the helper timeouts and number of retries.

If you have a very reliable network in which devices respond quickly, you can specify a small default timeout.

You might want to change the default timeouts for the SNMP and Telnet helpers if you have many devices that either do not respond to SNMP and Telnet or that are set up not to respond to Telnet or SNMP access. A large default timeout would therefore mean that the helpers wait for a long time for responses they never receive.

To reduce the amount of network traffic caused by a discovery, you could increase the timeout and disable broadcast and multicast pinging

Additional discovery configuration Configuration can be performed using the Discovery Configuration Wizard, the Discovery Configuration GUI, and the command line. This chapter focuses on discovery configuration using the Discovery Configuration GUI.

Passwords Use the Passwords tab in the Discovery Configuration GUI to specify the SNMP community strings, including SNMPv3 credentials, used in your network. For example-based tutorial steps on how to configure SNMP community strings, see the section Configuring device and subnet access using SNMP community strings in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_confdeviceaccessusingsnmpcommstrings.html Typically you will not need Telnet access to start with. Once you have a good discovery, consider the Telnet-based agents to see whether they will provide deeper modelling, such as MPLS VRF/VPNs or NAT, or fill any gaps in information from the SNMP-based agents for BGP, layer 2, or OSPF (e.g. Cisco SRP).


DNS Configure the DNS service to use either the system DNS setup or to specify a particular DNS server.

NAT Once you have successfully completed your initial discoveries, refer to the documentation under Configuring NAT translation if you have NAT gateways. Refer to the section Configuring NAT discoveries in the IBM Tivoli Network Manager IP Edition Discovery Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/disco/task/nmip_dsc_confnatdiscoveries.html Note: If you do not define any NAT gateways on this page, make sure the NAT checkbox is DISABLED. Otherwise discovery can hang and eventually complete with nothing discovered.

Advanced Typically you do not need to change anything here for the initial discoveries. These options provide control over discovery that can help work around known network device eccentricities. For a deeper explanation of these items, click the context help icon on this page. However, a few items are worth drawing to your attention:

Enable Feedback Control - select this to discover additional devices learned from other devices. Disabling this will assist in minimizing the number of IP addresses.

Enable Ping Verification - select this to force discovery to only create objects for devices that respond to a ping. "Detect best setting" (the default) sets it depending on the state of Feedback Control, enabling ping verification only if feedback control is enabled.

Enable Caching of Discovery Tables - enable this to store a full set of discovery cache files. Partial rediscovery relies on the OQL data from the last discovery being available. When ncp_disco is stopped, the OQL in-memory data is lost, preventing partial rediscoveries. To ensure partial discoveries are always available, enable this option. It is also useful if you need to report a discovery problem to IBM Support. Be aware that it does affect discovery performance, especially for very large networks.

Enable ifName/ifDescr Interface Naming - this option provides a more useful interface display name.

Enable sysName Naming - enable this if you have a certain discipline with sysName on the network devices and want to use it as the device display name.

Enable VLAN Modelling - if you do not need VLAN modelling (useful for RCA), then disable this to speed up discovery.

Note: If you need to edit the discovery configuration files themselves, make sure the Discovery Configuration is not open in the GUI, otherwise when changes are saved in the GUI they could overwrite your file edits.

Running a discovery Now that you have completed the configuration, move to the Discovery Status page to actually start discovery. For example-based tutorial steps on how to run and monitor a discovery, see the section Launch the discovery and monitor discovery progress in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_startingandmonitoringdiscovery.html

Can anything go wrong? Discovery is a complex process handling input from a huge variety of devices on the network, not all of it consistent and occasionally malformed. Sometimes you will see an agent get stuck: the discovery appears to be hung and the agent queue counts are not moving. Before calling Support, please refer to the Getting Started Guide section Troubleshooting discovery issues.

To restart with a clean database Whilst experimenting with your initial discoveries it is sometimes useful to start again with a clean deck, particularly if you are changing the scopes, as old devices, now out of scope, can cause confusion in the topology.

1. Stop all ncp processes:
   itnm_stop ncp
2. Remove all the cache files (replace <domain> with the domain name):
   rm -f $NCHOME/var/precision/*<domain>
3. Restart the ncp processes:
   itnm_start ncp

Alternatively, use OQL to delete all the entities in the topology database, by executing this OQL command on the ncp_model service:

delete from master.entityByName;

Working with OQL The discovery process follows a fairly linear path: finders to disco, disco to Details, Details to disco, disco to AssocAddress, AssocAddress to disco, disco to agents, agents to disco, disco runs the processing stitchers, and disco sends the topology to ncp_model. By using OQL you can find out where along this path the discovery has stopped - for example, what devices a particular agent is still working on.

Object Query Language (OQL) is a SQL-like language used to manipulate memory-resident databases in both Network Manager and Tivoli Netcool/OMNIbus. It is covered fully in the documentation under Reference->Reference Languages. OQL is used in the context of a service, which identifies the process containing the in-memory database. This allows you to see the real-time data from services such as:

amos - Event processing within the root cause analysis engine

ctrl - Information to control the automatic start and shutdown of ncp processes

disco - Discovery data throughout the cycle

ncim - Access to the NCIM topology database in any of the supported relational databases. In this case, you would use standard SQL rather than OQL syntax to query the tables.

ncoGate - Event processing

ncp_model - Topology data output from the discovery process prior to transfer to the relational database

ncp_poller - Query the set of devices being monitored by each policy. It permits filtering against a specific policy (e.g. show all devices currently monitored by the Interface Ping policy) and/or against a single entity (e.g. show all policies that are monitoring a particular device). Note that this is not a query of the database (what you think you should be polling), but a query of what is currently being processed by the poller.

SnmpHelper - Information used by the ncp_poller and the SNMP stack

To see a full list of available services, type: ncp_oql -options

You can execute OQL commands interactively from either the command line or from the GUI: select Administration->Network->Management Database Access. The warning about advanced users is a reminder that this is live data that can corrupt processes if you change it. Typically you will only be viewing data, not changing it.

Verifying the topology and resolving issues After the discovery has completed successfully, you need to check how well it is modelling your network. For example-based tutorial steps on how to verify the topology, see the section Verifying the topology in the IBM Tivoli Network Manager IP Edition Getting Started Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/start/task/nmip_dsc_verifyingthetopology.html

Simple checklist if things go wrong There are a number of simple explanations if the discovery process goes wrong, and the following checklist should be worked through before a detailed analysis:

- Wrong read-only community string for the seed router
- Access list filtering traffic through a router
- Seed router on a subnet other than the server's
- SNMP timeout set too high
- Specific devices may crash, or an agent may hang, during discovery

Determine agent state Determining what state agents are in is very useful when checking for issues with the agent process. The disco.agents table shows you the state of all the agents. A value of 0 means the agent is not being run, so a useful query is:

select * from disco.agents where ( m_State <> 0 );

This will tell you all the agents that are in use. The possible state values are:

0 - Undefined (not being used)
1 - Not running
2 - Starting up
3 - Running
4 - Finished
5 - Died
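If you collect m_State values in a monitoring script, the codes above can be kept as a simple lookup table (a convenience sketch, not part of the product):

```python
# Agent state codes from disco.agents, as documented above.
AGENT_STATES = {
    0: "Undefined (not being used)",
    1: "Not running",
    2: "Starting up",
    3: "Running",
    4: "Finished",
    5: "Died",
}

print(AGENT_STATES[3])  # Running
```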

Run DEBUG Generally run discoveries without debug, but if a problem develops then run debug on the suspected failing components. Remember that by altering the DiscoSchema.cfg and DiscoHelperServerSchema.cfg files you can get debug for every helper, finder and agent.

If discoveries take too long If the discovery seems to be taking a long time, first find out which agents are running, then find out what those agents are currently working on. If you find an agent that appears to be blocked, the next step is to find out why. It is most likely to be waiting for some data from the helper server, so turn on debugging on the helper server (debug level 2 should be enough at this stage) and find out what data it is waiting for. Once you know what data the process is waiting for, enable debug on the helper that is retrieving that data and hopefully find out what is holding it up. At this point you should know which device is causing the problem and roughly where the problem resides; if you cannot fix it yourself, this will make it a lot easier for an engineer to solve should an escalation to Engineering / Support be required - particularly after giving them the SNMP or Telnet dump that should be taken from the offending device. Always use the SNMP walk Perl script that is part of the Precision distribution for walks if possible. The output is easier to manipulate without resorting to additional scripts, and the tool delivers a mimic file that can sometimes be used to mimic the device.

Will discovery remove devices no longer in the network?

Typically you want to balance removing devices from the topology that are no longer in the network against devices that are simply down at the moment. You do not want the latter to disappear and stop being monitored once they come back online.

Page 25: IBM Tivoli Network Manager IP Edition 3.9 Best Practices v1.0

Licensed Materials - Property of IBM

IBM Tivoli Network Manager IP Edition 3.9 Best Practices © Copyright IBM Corporation 2011, 2012.

25

Network Manager maintains a count for each device, called the linger time. By default each device starts at 3; on each discovery the count is reset to 3 if the device responds to ICMP or SNMP, and decremented if not. On reaching -1 the device is removed from the database. You can change the default by editing the $NCHOME/etc/precision/ModelSchema.cfg file and changing the linger time field definition in master.entityByName. Use the RemoveNode.pl Perl script to remove specified devices from the network topology; it sets each device to an unmanaged state and marks it for removal during the next full discovery.
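The linger-time rule can be modelled in a few lines of Python. This is an illustrative sketch of the behaviour described above, not product code; only the default starting value of 3 is taken from the text.

```python
LINGER_START = 3  # default linger time; configurable in ModelSchema.cfg

def next_linger(current, responded):
    """Return the linger count after one discovery cycle.
    None means the device would be removed from the topology."""
    if responded:
        return LINGER_START  # any ICMP/SNMP response resets the count
    current -= 1
    return None if current < 0 else current

# A device that never responds survives three more discoveries,
# then is removed on the fourth:
count = LINGER_START
history = []
while count is not None:
    count = next_linger(count, responded=False)
    history.append(count)
print(history)  # [2, 1, 0, None]
```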

Partial rediscovery will not work with the file finder only

The partial rediscovery sends the IP addresses to be rediscovered to the ping finder. If the ping finder is not running, then the partial rediscovery functionality is not available out-of-the-box. However, it can work with the other finders by editing an existing stitcher, FinderRediscoveryToPingFinder.cfg. For example, it can be edited so that single IP addresses are inserted into the finders.returns table.

Why has the discovery process started again immediately after discovery completion?

This usually occurs when the ping finder is used during the discovery. During phase 1 you expect to find all the devices; however, this may not be the case. Network Manager exits phase 1 when the time since the last found IP address exceeds a user-defined limit, m_NothngFndPeriod, defined in DiscoSchema.cfg. By default this is 90 seconds. Once out of phase 1, any newly found devices are placed in the finders.pending table. Once the initial discovery has completed, the discovery examines the finders.pending table and then discovers these devices, thereby restarting the discovery. To stop this immediate rediscovery, either increase the m_NothngFndPeriod period or add more seed IP addresses or subnets to the ping finder.
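The phase 1 exit rule can be modelled as follows. This is an illustrative sketch; only the 90-second default is taken from the text, and real discovery timing is more involved.

```python
NOTHING_FND_PERIOD = 90  # seconds; the default period described above

def phase1_exit_time(find_times, nothing_fnd_period=NOTHING_FND_PERIOD):
    """Given the elapsed times (in seconds) at which IP addresses were
    found, return the time at which phase 1 exits: the point at which
    the period elapses with nothing new found."""
    last = 0
    for t in sorted(find_times):
        # A find resets the timer only if it arrives before expiry;
        # devices found after that land in finders.pending instead.
        if t <= last + nothing_fnd_period:
            last = t
    return last + nothing_fnd_period

# Finds at 0 s, 30 s, and 100 s: the find at 100 s arrives before the
# timer would expire (at 120 s), so phase 1 exits at 190 s.
print(phase1_exit_time([0, 30, 100]))  # 190
```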


Chapter 3 Polling

Overview

One of the most important functions of Network Manager is polling the managed network. While significant functionality exists to provide event display and to automate responses, successfully polling the target devices at an interval consistent with the customer’s management objectives is critical.

The Network Manager product overview documentation provides indicative configuration examples for typical network deployments. This section explains some of the key areas, such as typical numbers of network domains and polling engines. These examples are only indicative, and a more detailed sizing and dimensioning activity should be carried out to ensure that you meet your required network performance.

Note: It is not possible in this document to share actual Network Manager performance and capacity results for the many platforms examined in the test lab or in customer engagements. This material is confidential and might be taken as a performance guarantee. Production environments can differ greatly from a controlled test lab.

The material presented in this document is based on the collective experience of our laboratory testing and of customer engagements. A successful deployment of Network Manager's polling function results from an incremental process of deployment and evaluation in the customer’s unique environment. Read this material, start slowly, gain experience, grow the workloads, and monitor. Network Manager provides the ability to activate workloads that can quickly overwhelm the single default poller instance. Most polling concerns result from unintentionally large workloads and from devices that do not respond.

Key areas that influence individual poller capacity and scalability are:

- Frequency of polling
- Single versus multiple core processors
- Data storage options (including MIB graphing, storage to Network Manager's database, and storage to Tivoli Data Warehouse)
- Database considerations for data storage (the actual database selected among those supported, OS kernel tuning, database memory and disk tuning options, and database server capacity)
- Real-time event processing versus post processing
- Timeouts
- Retries
- Tuning poller threads
- Network response time for the target devices – LAN or WAN connected
- Average rate of network targets' failure to respond; poor network connections
- Type of polling (specific policies); the number of polling packets sent for individual policies (interface polls send many packets per device versus a single chassis ping)
- Complexity of polling (single policy mapped to a poller instance or multiple policies)
- Integration with other active Tivoli products
- Automation in place
- Unrealistic expectations


As a part of optimizing specific polling variables, you will need to monitor the health of your poller without impacting overall performance. Setting up a light DEBUG process with low overhead to identify performance issues will play an important part in this process.

Typical customer configurations

Deployment scenarios range from a few devices for a demo system to a telecommunications network managing many thousands of devices. For a comprehensive description of deployment scenarios, see the section Deployment of Network Manager in the IBM Tivoli Network Manager IP Edition Product Overview: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/nmip_ovr_pdf_39.pdf

For more information on typical network deployments, see the section Customer networks in the IBM Tivoli Network Manager IP Edition Product Overview, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/overview/reference/nmip_ovr_customernwscompared.html

The tables in that section provide a comprehensive overview not only of typical polling scenarios but also of the configurations and options that you are most likely to encounter.

Poller basics

To poll the network, Network Manager periodically sends queries to the devices on the network. These queries determine the behavior of the devices, for example operational status, or the data in the Management Information Base (MIB) variables of the devices. Network polling is controlled by poll policies. Poll policies consist of the following:

- Poll definitions, which define the data to retrieve.
- Poll scope, consisting of the devices to poll. The scope can also be modified at a poll definition level to filter based on device class and interface.
- Polling interval and other poll properties.

Note: The poll scope is often the cause of a device not being polled (particularly the class-based part of the poll scope). When defining poll policies, give extra attention to filtering devices correctly.

For more information on poll policies, see the section About polling the network in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/poll/concept/nmip_mon_pollnwovr.html

What devices should I be polling?

Defining which parts of the network to poll is critical to managing data load and performance. The poll policy scope defines the devices or device interfaces to be polled. A poll policy scope can be described as a series of filters; if at any stage a filter is not defined, then all devices pass through. The output of this set of filters is either a set of devices or, if the interface filter is defined, a set of device interfaces.
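The pass-through rule for undefined filters can be sketched in a few lines of Python. This is illustrative only; the device records and filter functions are invented for the example and do not reflect the product's internal filter engine.

```python
def apply_scope(devices, filters):
    """Apply a poll policy scope as a series of filters.
    A filter of None is treated as undefined: everything passes."""
    for f in filters:
        if f is None:
            continue  # undefined filter: all devices pass through
        devices = [d for d in devices if f(d)]
    return devices

devices = [
    {"name": "core-rtr1", "class": "Cisco", "managed": True},
    {"name": "edge-sw1", "class": "SMC", "managed": True},
    {"name": "printer1", "class": "EndNode", "managed": False},
]

scoped = apply_scope(devices, [
    lambda d: d["managed"],           # scope filter
    None,                             # class filter undefined: passes all
    lambda d: d["class"] == "Cisco",  # device class filter
])
print([d["name"] for d in scoped])  # ['core-rtr1']
```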


Assigning correct poll policies is critical to effectively monitoring essential devices in your network. There is no ‘one solution fits all’ and each network needs to be assessed based on your needs:

- What devices do you want to constantly monitor?
- What type of information do you want to report?
- What polling frequency do you require?
- What type of polling do you require?
- What thresholds do you want to use?
- Do you want to store data for historical reporting?

For more information on poll policy scope, see the section Poll policy scope in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/poll/concept/nmip_poll_policyscope.html

Ping (ICMP) versus SNMP impact on polling

Ping polling determines the availability of a network device or interface by using an ICMP echo request. These are generally one-for-one packets; an ICMP chassis ping sends a single packet and seeks the return of a single packet. Ping polling can be performed on either a chassis or an interface of a device. In the case of a chassis, the ICMP packets are sent to the IP address of a main node device. The main node IP address is also associated with an interface. In the case of interfaces, the ICMP packets are sent to the IP address of each interface. Consequently, if you enable ping polling for both chassis and interfaces, the traffic on main-node IP addresses doubles.

On many network devices, pings are typically handled at a low priority and often time out. For this reason, ping polling may be configured to send one or more pings to a device before generating an event indicating loss of connectivity. The number of retries is configurable by the user.

Note: By default, only the chassis ping poll is enabled on all devices within the discovered network topology, with the exception of end-node devices, such as desktops and printers.

SNMP polling involves retrieving Management Information Base (MIB) variables from devices in order to determine faulty behavior or connection problems. Faulty devices or faulty connections are then diagnosed by applying predefined formulas to the extracted MIB variables. Unlike ICMP polling policies, SNMP polling policies do not generally yield a one-for-one packet workload; for a network device, a given policy results in multiple SNMP queries. For example, a capacity of 50 SNMP packets per second does not mean you are polling 50 devices, but more likely just a few. Capacity planners need to consider not only the rate of packets but the number of devices that can be supported.

Note: By default, Network Manager provides a single poller instance, and only the chassis ping (ICMP) poll is active. No SNMP polling is active by default.
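A rough packet-budget sketch illustrates the difference between the polling styles. This is illustrative Python only; the assumption that each SNMP MIB variable costs one request is a simplification, since real agents may batch or split queries.

```python
def packets_per_cycle(devices, interfaces_per_device,
                      chassis_ping=True, interface_ping=False,
                      snmp_vars_per_device=0):
    """Rough per-cycle packet estimate. Chassis ping costs one packet
    per device; interface ping costs one per interface; SNMP is
    assumed (simplistically) to cost one request per MIB variable."""
    total = 0
    if chassis_ping:
        total += devices
    if interface_ping:
        total += devices * interfaces_per_device
    total += devices * snmp_vars_per_device
    return total

# 100 devices with 24 interfaces each: the default chassis-ping-only
# workload versus chassis + interface ping + 4 SNMP variables.
print(packets_per_cycle(100, 24))                 # 100
print(packets_per_cycle(100, 24, interface_ping=True,
                        snmp_vars_per_device=4))  # 2900
```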


Thresholding

Use basic threshold polling to apply simple formulas to the MIB variables, or to filter the scope at device and interface level. To filter at interface level, the poll definition must be set up for interface filtering. Use generic threshold polling for complex formulas, or to filter the scope at device level only.

MIB graphing Graphing a MIB variable is useful for fault analysis and resolution of network problems. By graphing a MIB, operators and administrators can see a real-time graph of specific MIB variables for a network device. The MIB variable is polled at a user-defined interval and displayed in a graph over time. As a first step, test the performance of MIB graphing using the default poller. If capacity challenges suggest a need, create another poller instance and map it to one of the primary active policies. This provides a more significant redistribution of the polling workload across the pollers.

Polled data storage At any time a network administrator can set up polling of specific SNMP and ICMP data on one or more network devices. This data is stored in the NCPOLLDATA historical polled data database. By default, Network Manager implements the NCPOLLDATA database using a database schema within the NCIM database. You can optionally integrate Network Manager with IBM Tivoli Monitoring and with the integrated Tivoli Data Warehouse, to provide extra reporting capabilities, including better report response times, capacity, and isolation of the operational database (NCIM) from unpredictable reporting traffic. The Tivoli Data Warehouse option also provides data summarization capability. The capacity for polling that inserts performance data into an Informix or Tivoli Data Warehouse database is expected to be significant.

Built-in device and interface polling capabilities

Network Manager provides a set of ready-to-use device and interface polls, including ping polls and MIB variable threshold polls. The MIB variable threshold polls generate network events if thresholds are violated on specified MIB variables. You can customize network polling so that events are received when thresholds are violated on any MIB variable on your network devices.

The product default installation provides one default poller (ncp_poller). For small to medium networks using the default polling intervals, one poller is often adequate. For larger networks, those with special polling requirements, intervals faster than the product default of 120 seconds, or target populations with significantly different response times, multiple pollers may be needed.


For a list of built-in poll policies, see the section Default poll policies in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/poll/reference/nmip_mon_defaultpolldefs.html For a list of built-in poll definitions, see the section Default poll definitions in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/poll/reference/nmip_mon_defpolldef.html For more information on multiple pollers, see the section Administering multiple pollers in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/admin/task/nmip_adm_admindistpoll.html

Multiple polling considerations

For cases where the polling objectives (the number of targets, the polling intervals, the number of active polling policies, and possible storage of performance data) may exceed the capacity of the single default poller, additional instances of the poller process may be created and assigned to poll policies. Popular recommendations for multiple pollers:

- A poller dedicated to chassis ping
- A poller dedicated to link state
- One or more pollers dedicated to interface polling
- A poller dedicated to SNMP polling for MIB graph presentation by the end user interface

Don’t fall behind in polling

A key concept to be aware of is falling behind in polling. With large numbers of devices, varying polling periods, timeouts, and retries, you can quickly get into a situation where not all the devices in your network have been polled correctly. It is critical to ensure that these parameters are defined correctly so that you are monitoring all the devices that you intended to monitor.
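A deliberately naive feasibility check captures the idea of falling behind. This is illustrative Python, not a product formula; it assumes each thread handles one target at a time.

```python
def is_falling_behind(targets, poll_interval_s, avg_response_s, threads=1):
    """Return True if a poller cannot complete one pass over all
    targets within the polling interval, under a simple model where
    each thread waits for one target at a time."""
    time_per_pass = targets * avg_response_s / threads
    return time_per_pass > poll_interval_s

# 2000 targets on a 120 s interval: fine at 50 ms average response,
# but timeouts that push the average to 200 ms cause a 400 s pass.
print(is_falling_behind(2000, 120, 0.05))  # False
print(is_falling_behind(2000, 120, 0.2))   # True
```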

Time out and retry

When the target fails to respond, an alert or event is generated and processed by Tivoli Netcool/OMNIbus. Failure to respond to the ping/poll can result in additional ping attempts, as defined by the settings for retry and timeout. For most polls, a good starting value for the timeout is 250 ms with 2 retries. For good LAN connections, the response time might be expected to be in the 4-50 ms range, so this timeout value would be a good choice. For networks with slow link connections and known slow response times, select a timeout value such that devices can respond within it; that is, do not set it below the average response time, which would cause frequent retry attempts.
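The cost of an unresponsive target follows directly from these two settings. A minimal illustrative helper, with values in milliseconds:

```python
def worst_case_wait(timeout_ms, retries):
    """Time spent on one unresponsive target: the initial attempt plus
    every retry must each time out before a failure event is raised."""
    return timeout_ms * (1 + retries)

# The suggested starting point of 250 ms with 2 retries means a dead
# device costs up to 750 ms of polling time.
print(worst_case_wait(250, 2))  # 750
```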


Strategy for LAN connected (fast responders) versus WAN connected (slower responders) polling

Network response times can play an important role in polling capacity. LAN connections will typically be fast responders to a ping/poll. WAN connections may respond considerably more slowly due to the specifics of the network architecture.

Tuning poller threads

Note: Poller thread tuning should only be carried out by experienced users; otherwise the defaults should not be modified.

The polling process (ncp_poller) is multi-threaded. The number of threads can be configured by editing the file $NCHOME/etc/precision/NcPollerSchema.cfg. While the number of threads may be increased, it is generally not recommended to increase the thread count in an attempt to gain additional single-poller capacity. Tuning the threads this way is helpful in a slow-response-time setting, where you are waiting a long time (in networking terms) for a response: a given thread will wait for a slow-responding device (perhaps connected over a WAN) to finally respond, so more threads mean you can talk to more targets while waiting for slow responders. For typical LAN-speed connections, slow response is not a factor.
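A rule-of-thumb estimate for thread count follows Little's law: the number of in-flight polls equals the polling rate multiplied by the response time. This sketch is illustrative and assumes each thread blocks for the full response time.

```python
import math

def threads_needed(polls_per_second, avg_response_s):
    """Estimate poller threads needed to sustain a rate when each
    thread blocks for the full response time (Little's law:
    work in progress = arrival rate x time in system)."""
    return max(1, math.ceil(polls_per_second * avg_response_s))

# At 100 polls/second: fast LAN (20 ms) versus slow WAN (500 ms).
print(threads_needed(100, 0.02))  # 2
print(threads_needed(100, 0.5))   # 50
```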

Polling intervals

Poll type    Status checking    Data collection
Chassis      2 minutes          < 2 minutes
SNMP         5 minutes          10-15 minutes

When to add another poller

Considerations for when to add another poller:

- Your combined polling rate per second is about 75% of the ideal maximum scalability record for your platform and processor speed. Note that the rates achieved in the test lab are ideal, with 100% responders in 2-4 ms and no network contention; actual results will be somewhat lower than these ideal results.
- As you seek to use a processor core to its fullest for each poller before splitting the workload, typically plan for one processor core per poller instance in a post-discovery setting.
- When running under debug 1 with INFO, a poller shows messages about getting behind for one or more IP addresses, and a network trace does not show a response timeout or no-response retry (that is, the targets are responding and the poller itself is falling behind).

An effective technique is to split the polling workload assignments by subnet.
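The 75% rule is simple arithmetic, shown here for clarity. The ideal maximum of 400 polls per second below is an invented figure for the example, not a measured product limit.

```python
def near_capacity(current_polls_per_sec, ideal_max_polls_per_sec,
                  threshold=0.75):
    """Apply the 75% rule of thumb: plan another poller once the
    combined rate reaches 75% of the ideal lab-measured maximum."""
    return current_polls_per_sec >= threshold * ideal_max_polls_per_sec

# Against an illustrative ideal maximum of 400 polls/second:
print(near_capacity(250, 400))  # False (62.5% of the ideal maximum)
print(near_capacity(320, 400))  # True  (80% of the ideal maximum)
```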


Enhanced procedure for the creation of a new poller instance

On the system hosting the Network Manager component, in a command-line environment:

1. Register the new poller:
   ncp_poller -domain domain_name -register -name poller_name
2. Make a backup copy of your CtrlServicesDOMAIN.cfg file before changing it. The default install location for this file is /opt/IBM/tivoli/netcool/etc/precision.
3. Edit the CtrlServicesDOMAIN.cfg file to add the definition of the new poller.
4. Locate the entry for the ncp_poller process, copy this entry, and add the -name parameter to match the name used in the registration command above. Note that the -name option is not present for the default poller, and this option is best added at the end of the options line. An example is given below.
5. Save the file.
6. Restart Network Manager so that the ncp_ctrl process can restart all running processes and the new instance of the poller. Use itnm_stop and then itnm_start, or itnm_stop ncp and then itnm_start ncp for just Network Manager.
7. Assign the new poller instance to a poll using an Administrator GUI session:
   a) Go to Network Polling.
   b) Select a name from the list in Configure Poll Policies.
   c) Edit it and use the "Assign to poller instance" list to pick the new poller.
   d) Save it.

An example of a new ncp_poller instance in the CtrlServicesDOMAIN.cfg file is shown here:

insert into services.inTray
(
    serviceName, servicePath, domainName, argList, dependsOn, retryCount
)
values
(
    "ncp_poller",
    "$PRECISION_HOME/platform/$PLATFORM/bin",
    "$PRECISION_DOMAIN",
    [ "-domain", "$PRECISION_DOMAIN", "-latency", "100000", "-debug", "0",
      "-messagelevel", "warn", "-name", "dave_poller1" ],
    [ "nco_p_ncpmonitor", "ncp_ncogate" ],
    5
)


For more information on the ncp_poller command, see the section ncp_poller command-line options in the IBM Tivoli Network Manager IP Edition Administration Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/ref/reference/nmip_ref_startmonitor.html


Chapter 4 Event Enrichment and Root-Cause Analysis

Event management overview Network Manager holds topology data in the NCIM topology database. This data is used to perform event enrichment and correlation of events in the Tivoli Netcool/OMNIbus ObjectServer in a highly configurable manner. Additional NCIM topology database tables can be defined and populated to extend the default topology model, and additional Tivoli Netcool/OMNIbus event fields and tables can be created to hold custom data for events. The events are enriched with topology data using the Network Manager Event Gateway. This has a number of out-of-the-box plugins, shown below, which can be enabled and disabled independently. Further configurable plugins can be defined using the Network Manager stitcher language.

Network Manager performs a number of tasks:

- Matches an event to an entity
- Enriches the event in Tivoli Netcool/OMNIbus with information about that entity
- Passes the event on to plugins, including RCA

The principal task of the Network Manager Event Gateway is to match an event in the Tivoli Netcool/OMNIbus alerts.status table to an entity in the NCIM topology database. Once the match is made, event enrichment can be performed:

- Standard out-of-the-box event enrichment using fields from the NCIM topology database
- Bespoke event enrichment using fields from the NCIM topology database
- RCA event enrichment using the RCA plugin
- Customizable event enrichment via plugins (e.g. zNetView)


- Additional tasks beyond enrichment, for example the Disco plugin, which invokes dynamic discovery based upon receipt of certain events

RCA suppresses events with the aim of identifying the root-cause event so that it can be rapidly addressed by the customer. Some customers have emphasized that it is not really the event that is of interest: it is the entity that is key. Provided you can identify the entity that has the problem, resolution can in most cases be performed quickly. Specific probe customization can be created to de-duplicate events on the same entity rather than suppressing them. For a complete description of the Event Gateway plugins, see the section Plugin descriptions in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/event/reference/nmip_evnt_plugindescriptions.html

Event maps

Event enrichment is performed using event maps. The main function of an event map is to call a set of stitchers that perform topology lookup to determine the entity associated with the event and then enrich the event with topology data. The Event Gateway determines which event map to use based on the kind of event, as defined in the alerts.status EventId field. To utilize an event map:

- The event needs to be passed to an appropriate event map
- The gateway must successfully use that event map to match the event to an entity
- The plugin must have registered interest in that event map

If you choose to configure event map selection using the Event Gateway, then you must configure the Event Gateway config.precedence table. The config.precedence table is configured in the EventGatewaySchema.cfg configuration file, located at $NCHOME/etc/precision/EventGatewaySchema.cfg. All events not explicitly assigned an event map and precedence in the configuration are handled by the default Network Manager event map, if passed to the Network Manager gateway. This allows a basic level of event enrichment, but does not include the event in RCA (Root Cause Analysis) calculations. An event can be matched to an interface without performing RCA, but generally it is advised that only events directly involved in RCA be used in the calculations, due to the performance implications. Event maps should be selected based on the following criteria:

- What data is available to identify the entity associated with this event? This narrows down the available set of event maps. The event cannot be handled properly if the data expected by the event map is not available in the event in the expected format.

- Is this a Problem (Type 1) event? Only problem events are candidates for RCA.

- Does the event indicate that the entity can no longer receive or transmit network packets? Any event which identifies that network traffic is adversely affected is a candidate for RCA. It is worth considering an event map that will pass the event to the RCA engine.


- Can this event cause or be caused by another event? If the event indicates a standalone failure, it is not a candidate for RCA.

Selection is driven further by the data in the event that can be used to identify the entity. This could be any device-specific information such as:

- IP address
- SNMP sysName
- DNS name
- MAC address
- Interface identifiers (e.g. ifIndex, ifAlias, ifDescr)

The event fields containing such device-specific information are populated by the probe rules file. It may be possible to add limited additional data from the rules files when raising the event. Typical examples are:

- LocalNodeAlias
- Node
- LocalPriObj
- LocalSecObj

For a complete description of event fields and enrichment, see the section Event enrichment in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/event/concept/nmip_evnt_eventenrichment.html

For a complete description of event maps, see the section Event maps in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/event/concept/nmip_evnt_eventmaps.html
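Event map and precedence assignments are made with OQL inserts into the config.precedence table. The fragment below is a hedged example modelled on entries commonly found in the stock EventGatewaySchema.cfg; verify the column names and the event map name against your installed file before using it.

```
// Route SNMPTRAP-linkDown events to the LinkDownIfIndex event map
// with a high precedence (910, per the table in this chapter).
// Column names here are assumptions to be checked against your
// installed EventGatewaySchema.cfg.
insert into config.precedence
(
    Precedence,
    EventMapName,
    NcoEventId
)
values
(
    910,
    "LinkDownIfIndex",
    "SNMPTRAP-linkDown"
);
```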

Precedence

Precedence indicates the importance of an event: the higher the precedence, the more important the event. Currently, precedence is only relevant when considering events on exactly the same entity, and it is used only during RCA. Both the NmosEventMap event field and the config.precedence OQL table identify precedence. Higher-precedence events suppress lower-precedence events. Higher precedence should be used for events that are:

- Lower down the protocol stack. For example, confirmation that a physical port has failed would be assigned a higher precedence than an IP-layer problem on that interface.

- A higher-confidence indication of a specific problem. For example, failure to ping an interface may be because the ICMP packet could not reach the interface, which could be due to a network problem between the polling station and the interface. An SNMP trap that explicitly states that a link has gone down is a more positive confirmation of a problem on the interface itself, or on its directly connected neighbor.

The table below shows the recommended values to use. Note also that the limits have special significance with respect to RCA:


Value and meaning, with example events:

0 – This event cannot cause other issues. During RCA, it cannot suppress other events, but it can itself be suppressed.
    Examples: SYSLOG-cisco-ios-SYS-CPUHOG, SYSLOG-cisco-ios-BGP-NOTIFICATION00

300 – Reserved for non-authoritative events which suggest but do not necessarily indicate a failure on the device. For example, failure to reach a device does not necessarily indicate a problem on that device; it could be caused by a problem between the polling station and the device.
    Examples: probeping-icmptimeout, SNMPTRAP-IETF-OSPF-TRAP-MIB-ospfIfStateChange

600 – Intended for protocol failures. Failures identified lower down the protocol stack should take higher precedence. For example, as OSPF runs over IP, an OSPF failure would be expected to have a lower precedence than an IP failure.
    Examples: SNMPTRAP-IETF-OSPF-TRAP-MIB-ospfIfConfigError

900 – Confirmed physical failures that indirectly imply a Link Down or Ping Fail (and most other events).
    Examples: SNMPTRAP-cisco-CISCO-WIRELESS-IF-MIB-cwrTrapLinkQuality

910 – Confirmed physical failures that directly indicate a Link Down or Ping Fail.
    Examples: SNMPTRAP-linkDown, SYSLOG-smc-switch-linkDown

10000 – This event cannot be caused by other issues. During RCA, it cannot be suppressed by other events, but it can become root cause, suppressing other events.
    Examples: SYSLOG-cisco-ios-CI-SHUTDOWN, SNMPTRAP-riverstone-RIVERSTONE-NOTIFICATIONS-MIB-rsEnvirHotSwapOut

For a complete description of precedence, see the section Precedence value in the IBM Tivoli Network Manager IP Edition Event Management Guide, or online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/event/concept/nmip_evnt_precedencevalue.html
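The semantics above, including the special limits 0 and 10000, can be sketched as a same-entity suppression check. This is an illustrative model, not the RCA engine itself.

```python
def can_suppress(winner_precedence, loser_precedence):
    """Same-entity suppression sketch: a higher-precedence event
    suppresses a lower-precedence one, with the special limits noted
    above (0 never suppresses; 10000 is never suppressed)."""
    if winner_precedence == 0 or loser_precedence == 10000:
        return False
    return winner_precedence > loser_precedence

# A confirmed linkDown (910) suppresses an unconfirmed ping timeout (300):
print(can_suppress(910, 300))  # True
print(can_suppress(300, 910))  # False
print(can_suppress(500, 10000))  # False: 10000 cannot be suppressed
```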

Event gateway plugins Gateway plugins are used to perform the event enrichment. Typical ones to use are:

Name Description

RCA The RCA plugin performs Root Cause Analysis on events within a domain, suppressing events on an entity based on a number of predefined rules:

- Same entity suppression identifies the event of most interest on a given entity.
- Contained suppression suppresses failures on a contained entity, such as an interface, by failures on a parent entity, such as a card, using the contents of the contains table.
- Connected suppression suppresses events on one end of a link with events from the remote end, using the Layer 2 and VLAN topology connections of the connects table.
- Isolated suppression suppresses events that are downstream of a failure, with respect to a polling station.

SAE The SAE plugin and accompanying Tivoli Netcool/OMNIbus automation generate Service Affecting Events. These identify when an entity that participates in a collection of interest has been affected by an event.

Customized Customizable event enrichment can be implemented via custom gateway plugins (e.g. zNetView) created to handle selected events. These are stitcher-based plugins, allowing OQL tables to be defined, NCIM topology data to be queried, and ObjectServer tables to be modified. Configuration is required to enable a custom plugin: it must be listed in the gwPluginTypes table, enabled in the gwPlugins table, and have events of interest identified in the gwPluginEventMaps and gwPluginEventStates tables.

For a complete set of Plugin descriptions, see online at: http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.networkmanagerip.doc_3.9/itnm/ip/wip/event/reference/nmip_evnt_plugindescriptions.html


Troubleshooting

Problem Diagnosis Tip

How can I check if an event is handled?

Does the event match the nco2ncp EventFilter in EventGatewaySchema? If so, do the fields of the event contain the expected data? Has the entity been successfully matched against an entity in the topology?

If so, will a given plugin (e.g. RCA) see the event?

If so, does the gateway know how to process the event?
- Is the EventId of the event listed in the config.precedence inserts in the schema files?
- Is the NmosEventMap field of the event populated? The expected data format per event map is given in EventGatewaySchema.
- If the entity was matched, the NmosObjInst field will contain the main node entity ID; otherwise the event will not be passed to plugins.
- Use ncp_gwplugins.pl to list the event maps and states handled by a plugin, e.g. ncp_gwplugins.pl -domain NCOMS -plugin RCA
- At -messagelevel info, all serial numbers passed to all plugins are logged.

How can I create a new event map?

- If the probe rules file is accessible, set the NmosEventMap field of the alerts.status event when the event is raised.
- Add an entry to the config.precedence OQL table in the schema file, mapping the event's EventId field to an eventMap name.
- Basic event enrichment is performed by the Event Gateway, but additional enrichment (or other actions) can be performed using the optional stitcher named in the event map.
- Note that some events (e.g. Network Manager health check events) do not correspond to topology entities.
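In the probe rules file, the event map can be assigned directly to the alerts.status field. A minimal sketch follows; the event map name and precedence suffix shown are illustrative, and the expected "eventMapName.precedence" format should be verified against EventGatewaySchema for your release:

```text
# Netcool/OMNIbus probe rules fragment (sketch):
# NmosEventMap takes the form "eventMapName.precedence";
# LinkDownIfIndex and 910 are illustrative values.
@NmosEventMap = "LinkDownIfIndex.910"
```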


How can I test and debug my event enrichment customization?

Comment out ncp_g_event from the CtrlServices schema file and run it independently on the command line. This allows it to be started and stopped as desired, so that the event handling can be closely monitored (the ncp_model process must be running). It is recommended to start with a minimal topology, containing the entities that are to be related. Events can be fed into the ObjectServer from any source. It can be helpful to raise the events that are to be correlated before starting the gateway, as it will read them in and process them at startup.

What fields can I use to identify an entity?

The default event maps use a limited number of alerts.status fields to match an event to an entity. By default, the LocalNodeAlias is used to identify the main node: the chassis that is, or that contains, the affected entity. Events missing this field will not be handled by Network Manager.

Where is my update?

If a field in the ObjectServer alerts.status table has not been updated as expected:
- Was an entity found? Is NmosObjInst non-zero?
- What stitcher populates that field? Is that stitcher triggered by the event map for this event? Modify the stitcher if not.
- What type of entity was found? Check the entityType of the NmosEntityId found. Some fields are available only for chassis, some only for interfaces, etc.
- Does the NCIM topology entity have the expected value populated? Some values can be NULL.


Why was no entity found?

Did the event map expect to match an entity? If so, is the expected entity in the NCIM topology cache? Is the expected entity in the NCIM topology database?

- The event map stitcher field is required to do a topology lookup.
- Only network events are matched to entities; e.g. Network Manager Status events do not correspond to entities.
- Check the cache, e.g. ncp_oql -domain NCOMS -service EventGateway select * from ncimCache.entityData where <filter>;
- Check the database, e.g. ncp_oql -domain NCOMS -service ncim -username ncim select * from entityData where <filter>;
- Check there is no mismatch between the NCIM topology data and the cache (the data should always be consistent).
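The two checks above can be run from the command line. A sketch follows, assuming domain NCOMS; the entityName filter is illustrative only, and the -query option should be checked against the ncp_oql usage on your installation:

```shell
# Query the Event Gateway's NCIM topology cache (illustrative filter)
ncp_oql -domain NCOMS -service EventGateway \
  -query "select * from ncimCache.entityData where ENTITYNAME = '192.0.2.1';"

# Query the NCIM topology database for the same entity
ncp_oql -domain NCOMS -service ncim -username ncim \
  -query "select * from entityData where ENTITYNAME = '192.0.2.1';"
```

If the two results differ, the cache and database are out of step, which should never happen in normal operation.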


Chapter 5 General Performance Considerations

Platform considerations

Processor overview Network Manager can utilize a full range of processors (e.g. single, dual-core, quad-core). Provided the operating system can dispatch work to a processor, processor core, or hyper-threaded processor, it is available for use by Network Manager. An important consideration is to match processor performance to the requirements of the Network Manager system.

Processor speed Network Manager runs a relatively small number of processes, but they tend to be long-running and to require significant resources. As a result, the first key factor to consider when choosing processors for use with Network Manager is the core speed. This is especially the case for systems dedicated to Network Manager, with a single domain and a single poller.

Multi-core processors For heavy workloads like discovery, polling, reporting, data exchange with Network Manager component products such as Tivoli Netcool/OMNIbus, and data exchange with other Tivoli products, two or more processor cores are strongly recommended. In settings where multiple copies of key Network Manager processes are running (multiple domains, or multiple pollers within a single domain), it is helpful to have more than four processor cores. In these cases, the processor count is more important than the next processor speed upgrade.

Processors summary Select a system with at least two processor cores for single-server deployments. For most deployments with a single domain, a single poller, and a single Network Manager instance, choose faster clock speeds over additional cores. For multiple domains, pollers, or Network Manager instances, choose additional cores over a faster clock speed.

Health check Performing basic health checks is important in ensuring the smooth operation of the pollers.

CPU usage If the CPU consumption of a poller process (as measured by CPU measurement tools, e.g. top) suggests the process is overloaded, or polling intervals are missed (due to retry issues or poller capacity), it is time to consider additional pollers or even additional domains.


Tracing routes Tracing the route to devices in the network map to check network paths is an important step in examining problem targets. To perform this procedure, you must be in the Network Views or in the Network Hop View. From the Network Hop View or Network Views network map, select the device to which to trace the route. To select multiple devices, press Ctrl while clicking. Right-click one of the selected devices and choose WebTools > Advanced Traceroute. The results of the traceroute operation appear in one or more separate browser windows. It is also possible to perform a custom traceroute by customizing the traceroute settings.

Specialist scripts Specialist scripts can also be produced that perform more specific repetitive tasks that are applicable to your architecture and configuration.

Monitoring process status messages You can view status messages from Network Manager to understand the health and status of the product. The Network Manager processes send messages to IBM Tivoli Netcool/OMNIbus when they start and stop. You can view these messages to see which processes have started and stopped, and to see failover status. To view process status messages, complete the following tasks:

1. Add an Active Event List (AEL) portlet to a page.
2. Apply a filter to the AEL so that only events with an Alert Group of Network Manager Status are displayed.

Routine performance data sampling Once deployed and active, it is very important to establish a program of performance data collection. This allows you to view the state of the server(s) over time, consider the impact of changes to the managed network, and monitor the health of the system(s). The Linux command top or the Solaris command prstat provides a simple way to view system activity quickly.

Component failure information If a process fails (for example, the poller is shown as "Failed" by the itnm_status command), the recommended approach is to use the ncp_ffdc.sh script to collect the relevant files (core files, logs, etc.):

1. Source the env.sh script.
2. Change directory to $NCHOME/PD/precision.
3. To see the options, type ./ncp_ffdc.sh -h. For problems with ncp_poller, for example, run ncp_ffdc.sh -g MON.
4. The resultant files are placed in a directory under $NCHOME/PD/.
5. Debug level changes are made in the CtrlServices.cfg file (in the default install tree, /opt/IBM/tivoli/netcool/CtrlServices.cfg), for example Level 1 debug plus INFO message mode to confirm target counts and successful polling.
6. To implement debug level changes, restart Network Manager.
7. Observe the /opt/IBM/tivoli/netcool/log/precision/ncp_poller.NCOMS.log file for poller activation, target counts, and any issues.

Note that if multiple domains are active, the files are named with the domain, for example CtrlServices.NCOMS10.cfg and ncp_poller.NCOMS10.log. The default (first) domain does not have its name in the file names, but extra domains do.
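The collection steps above can be sketched as a shell session. The paths assume the default install tree, and the MON group argument is taken from the poller example:

```shell
# Source the Network Manager environment (default install tree assumed)
. /opt/IBM/tivoli/netcool/env.sh

# Change to the problem-determination directory
cd $NCHOME/PD/precision

# List the available options
./ncp_ffdc.sh -h

# Collect data for the poller, as in the example above
./ncp_ffdc.sh -g MON
```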

Performance reports Out-of-the-box performance reports allow you to view historical performance data collected by the monitoring system for diagnostic purposes. View trend and topN charts to gain insight into short-term behaviors. Use the trending report to see the average values collected for a list of selected devices, and drill down to see the trend over time for a data item. Trending is important for highlighting issues that develop progressively over time and are not temporary; issues identified may need to be addressed via capacity planning.

Log file sizes As part of the debug and troubleshooting process, it is important to have log file sizes large enough to store all the information that may be required for further review and investigation. This is especially the case if detailed tracing is carried out. Disk space availability can be an issue; however, a best practice recommendation is to set NDE_LOGFILE_MAXSIZE to 1 GB. In normal circumstances it is highly unlikely that all of the logs and traces would reach this size and inadvertently fill up your file system.
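As a sketch, the variable can be exported in the environment before starting Network Manager. The assumption that the value is expressed in bytes is the author's own and should be verified against the product documentation for your release:

```shell
# Set the maximum log file size to 1 GB before starting Network Manager.
# Assumption: the value is given in bytes; verify the expected units
# for NDE_LOGFILE_MAXSIZE in your release's documentation.
export NDE_LOGFILE_MAXSIZE=1073741824
echo "$NDE_LOGFILE_MAXSIZE"
```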


Notices This information was developed for products and services offered in the U.S.A.

IBM® may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing

IBM Corporation

North Castle Drive

Armonk, NY 10504-1785

U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation

Licensing

2-31 Roppongi 3-chome, Minato-ku

Tokyo 106-0032, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:


IBM Corporation

958/NH04

IBM Centre, St Leonards

601 Pacific Hwy

St Leonards, NSW, 2069

Australia

IBM Corporation

896471/H128B

76 Upper Ground

London

SE1 9PZ

United Kingdom

IBM Corporation

JBF1/SOM1 294

Route 100

Somers, NY, 10589-0100

United States of America

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary.

This information is for planning purposes only. The information herein is subject to change before the products described become available.


This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Adobe, Acrobat, Portable Document Format (PDF), PostScript, and all Adobe-based trademarks are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

