troubleshooting eigrp networksd2zmdbbm9feqrf.cloudfront.net/2015/usa/pdf/brkrst-2331.pdf · •the...

Troubleshooting EIGRP Networks

Steven Moore

BRKRST-2331

• Scope

• Questions

• Additional Materials

• CiscoLive.com

• Additional Courses

• BRKRST-2336: EIGRP Deployment in Modern Networks

• BRKRST-3336: WAN Virtualization using Over-the-Top (OTP)

• BRKARC-2002: Techniques of a Network Detective

Troubleshooting EIGRP Networks - Housekeeping

• Troubleshooting Methodology

• Troubleshooting the Essentials

• Neighbor Formation

• Route Computation & Propagation

• Advanced Issues

• Resource Depletion

• High Speed Links

• IPv6 Specifics

Troubleshooting EIGRP Networks

Troubleshooting Methodology


• Knowledge of the System• Expected Behaviors / Baseline

• Relationships

• Where to look for information!

• Logical Sequence of Events

• EIGRP: • Peers Form

• Routes Exchanged

• Path Computation (DUAL)

• Routing Table Updated (if necessary)

• Peers Updated (if necessary)

A

B


• Peer Issues

• Show ip eigrp neighbor

• Show log (with log-neighbor-changes enabled)

• Debug ip eigrp packet (with care)

• Path Computation and Propagation Issues

• Show ip eigrp event

• Show ip eigrp topology

• Info for TAC

• Show ip eigrp plugin detail

• Show eigrp tech detail

• Show run | s router eigrp

Where to Look for Information – Cheat Sheet

Show Commands

• The most useful command for checking neighbor status is show ipeigrp neighbors

• Some of the important information provided by this command are

• Hold time—time left that you’ll wait for an EIGRP packet from this peer before declaring him down

• Uptime—how long it’s been since the last time this peer was initialized

• SRTT (Smooth Round Trip Time)—average amount of time it takes to get an Ack for a reliable packet from this peer

• RTO (Retransmit Time Out)—how long to wait between retransmissions if Acks are not received from this peer

Neighbors

RtrA#show ip eigrp neighbors

IP-EIGRP neighbors for process 1

H Address Interface Hold Uptime SRTT RTO Q Seq

(sec) (ms) Cnt Num

2 10.1.1.1 Et0 12 6d16h 20 200 0 233

1 10.1.4.3 Et1 13 2w2d 87 522 0 452

0 10.1.4.2 Et1 10 2w2d 85 510 0 3

Seconds Remaining Before Declaring Neighbor Down

How Long Since the Last Time Neighbor Was Discovered

How Long It Takes for This Neighbor to Respond to Reliable Packets

How Long We’ll Wait Before Retransmitting if No Acknowledgement

NeighborsShow IP EIGRP Neighbors

rtr302-ce1#show ip eigrp neighbor detail


H Address Interface Hold Uptime SRTT RTO Q Seq Type

(sec) (ms) Cnt Num

1 17.17.17.2 Et1/0 14 00:00:03 394 2364 0 124

Version 12.0/1.2, Retrans: 0, Retries: 0

Stub Peer Advertising ( CONNECTED SUMMARY ) Routes

0 50.10.10.1 Et0/0 13 04:04:39 55 330 0 13


Neighbors

• The big brother of the show ip eigrp neighbor command; some of the additional information available via the detailed version of this command include

• Number of retransmissions and retries for each neighbor

• Version of Cisco IOS and EIGRP

• Stub information (if configured)

Show IP EIGRP Neighbor Detail

Show Commands

• Show eigrp plugin detail gives information about all of the EIGRP capabilities and what version each is running; EIGRP has gone to a plug-in model for packaging features (similar to Windows DLL, sort of) and a particular platform/branch may have different plugins or different version of those plugins

• EIGRP is also what’s known as a True Component is done in a portable way, so the first line is critical to know which version of the True Component you’re running

Show EIGRP Plugin Detail

Show Commands

• This command may or may not exist in the version you’re running, since it was recently added

• While this command may not be all that useful to you, the TAC will probably asking you for it to identify exactly the version and capabilities of EIGRP on your router

Show EIGRP Plugin Detail

r3#sh eigrp plugin detail

EIGRP feature plugins:::

eigrp-release : 5.01.00 : Portable EIGRP Release

: 2.00.03 : Source Component Release(Portable EIGRP Release(rel5_1))

igrp2 : 3.00.00 : Reliable Transport/Dual Database

external-client : 1.02.00 : Service Distribution Client Support

bfd : 1.01.00 : BFD Platform Support

ipv4-af : 2.01.01 : Routing Protocol Support

ipv4-sf : 1.01.00 : Service Distribution Support



snmp-agent : 1.01.01 : SNMP/SNMPv2 Agent Support

r3#

Show Commands

• This gives you the plugins again, with a lot of other internal info; TAC will probably be asking for this one, too

Show EIGRP Tech

r3#show eigrp tech



: 2.00.03 : Source Component Release(Portable EIGRP

Release(rel5_1))



bfd : 1.01.00 : BFD Platform Support





snmp-agent : 1.01.01 : SNMP/SNMPv2 Agent Support

EIGRP Internal Process States

procinfoQ: 2

deadQ:

ddbQ: 2

EIGRP-IPv4 Protocol for AS(1)

{vrid:1 afi:1 as:1 tableid:0 vrfid:0 tid:0 name: }

PIDs: Hello: 26 PDM: 25

Router-ID: 172.18.176.153

Threads: procinfo: 0x18DD1A0 ddb: 0x18DD380

workQ:

iidbQ:

temp_iidbQ:

passive_iidbQ:

peerQ:

static_peerQ:

suspendQ:

networkQ: 1

summaryQ:

Socket Queue: 0/2000/0/0 (current/max/highest/drops)

Input Queue: 0/2000/0/0 (current/max/highest/drops)

GRS/NSF: enabled hold-timer: 240

Active Timer: 3 min

Distance: internal 90 external 170

Max Path: 4

Max Hopcount: 100

Variance: 1

Show CommandsShow EIGRP Tech

-

EIGRP-IPv6 Protocol for AS(1)

{vrid:0 afi:2 as:1 tableid:0 vrfid:0 tid:0 name: }

PIDs: Hello: (no process) PDM: (no process)

Router-ID: 172.18.176.153

Threads: procinfo: 0x185ECA0 ddb: 0x7B45B940

workQ:

iidbQ:

temp_iidbQ:

passive_iidbQ:

peerQ:

static_peerQ:

suspendQ:

summaryQ:

Socket Queue: %EIGRP(ERROR): invalid socket


Active Timer: 3 min


Max Path: 16

Max Hopcount: 100

Variance: 1

r3#

Show CommandsShow EIGRP Tech

Show Commands

• Show eigrp tech has tons of internal information that may seem meaningless; there may be nuggets you can glean from it (summaryq entries, etc.), but mainly I wanted to expose you to it so that it won’t be completely foreign to you when TAC asks you for it

• Again, this is a new command and may not exist in the version you’re running—if it’s not available now, it will be available in a later upgrade

• Note that this is also a “show eigrp tech detail” with even more undecipherable info

Show EIGRP Tech

Show Commands

• To make commands consistent across address/service families, changing the form of the Show commands

• EIGRP will become the second argument followed by the address-family

Show IP EIGRP commands changing in the future

Router#show eigrp ?

address-family EIGRP address-family show commands

plugins EIGRP feature plugin installed

protocols Show EIGRP protocol info

service-family EIGRP service-family show commands

tech-support Show EIGRP internal tech support

information

Router#show eigrp address-family ?

ipv4 EIGRP IPv4 Address-Family

ipv6 EIGRP IPv6 Address-Family

Router#show eigrp address-family ipv4 ?

<1-65535> Autonomous System

accounting Prefix Accounting

events Events logged

interfaces interfaces

multicast Select a multicast instance

neighbors Neighbors

timers Timers

topology Select Topology

traffic Traffic Statistics

vrf Select a VPN Routing/Forwarding

instance

Show Commands

• We’ve had a bit of an issue over the years with inconsistencies in the command structure in EIGRP. Sometimes, the IPv4 commands and the IPv6 commands were slightly different

• With the addition of service-family command (SAF, not covered in this presentation), we decided to make things consistent. Unfortunately, this also meant that things needed to be different

• Both the old and new forms of the show commands are currently supported, but in the future as new features are rolled out, some commands may only be available in the new command format

• You might as well start getting used to them now, if you’re running a version that supports the new format!

Show IP EIGRP command changes

Show Commands

RtrB#show ip eigrp traffic

IP-EIGRP Traffic Statistics for AS 1

Hellos sent/received: 574/558

Updates sent/received: 5/7

Queries sent/received: 2/2

Replies sent/received: 2/2

Acks sent/received: 11/7

Input queue high water mark 2, 0 drops

SIA-Queries sent/received: 1/1

SIA-Replies sent/received: 1/1

Hello Process ID: 64

PDM Process ID: 63

Show IP EIGRP Traffic

Show Commands

• Show ip eigrp traffic can be very useful to see what kind of activity has been occurring on your network; Some of the most interesting information includes:

• Input queue high water mark—this shows how many packets have been queued inside of the router to be processed—when packets are received from the IP layer, EIGRP accepts the packets and queues them up for processing; if the router is so busy that the queue isn’t getting serviced, the queue could build up—unless there are drops, there is nothing to worry about, but it can give you an indication of how hard EIGRP is working

• SIA-queries sent/received—this is useful to determine how often the router has stayed active for at least one and one-half minutes (as mentioned in the earlier section on stuck-in-active routes; this number should be relatively low—if it’s not, it’s taking a bit of time for replies to be received for queries, and it might be worth exploring why


Show Commands

RtrA#show ip protocol

*** IP Routing is NSF aware ***

Routing Protocol is "eigrp 200"

Outgoing update filter list for all interfaces is not set

Incoming update filter list for all interfaces is not set

Default networks flagged in outgoing updates

Default networks accepted from incoming updates

EIGRP metric weight K1=1, K2=0, K3=1, K4=0, K5=0

EIGRP maximum hopcount 100

EIGRP maximum metric variance 1

Redistributing: eigrp 200

EIGRP NSF-aware route hold timer is 240s

EIGRP NSF enabled

NSF signal timer is 20s

NSF converge timer is 120s

Show IP Protocol

Show Commands

• There are many fields in the show ip protocol which are useful in the troubleshooting process

• Some of the most interesting include:

• Outgoing and incoming filter lists

• Variance setting

• Redistribution configured (note that if a router is not redistributing other protocols, it still shows that it is redistributing itself)

• NSF configuration and timers

Show IP Protocol

Show Commands

Automatic network summarization is not in effect

Address Summarization:

40.0.0.0/8 for Vlan301

Summarizing with metric 1536

Maximum path: 4

Routing for Networks:

40.80.0.0/16

192.168.107.0

Routing Information Sources:

Gateway Distance Last Update

(this router) 90 00:01:10

40.80.24.33 90 01:13:47

40.80.12.19 90 01:13:47

40.80.23.31 90 01:13:48

40.80.8.13 90 01:13:48


Show IP Protocol

Show Commands

• More show ip protocol information:

• Summarization defined (both auto and manual) along with the metric associated with each summary

• Max-path setting

• Network statements

• Distance settings

• Note that the routing information sources section is really useless for EIGRP, since it doesn’t use periodic update; we didn’t rip it out of the display, but there isn’t much useful information for EIGRP in this section

Show IP Protocol

Event Log

• The two primary weapons at your disposal are debugs and the event log; realize that the output of both debugs and the event log are cryptic and probably not tremendously useful to you (so why am I telling you about them?)

• There are times when the output of debugs or the event log is enough to lead you in a direction, even if you don’t really understand all that it is telling you; don’t expect to be an expert at EIGRP through the use of debugs or the event log, but they can help

• Don’t forget, debugs can kill your router—don’t do a debug if you don’t know how heavy the overhead is; I may tell you below about some debugs, but don’t consider this approval from Cisco to run them on your production network

• The event log is non-disruptive, so it is much safer; just display it and see what’s been happening lately

EIGRP Troubleshooting Tools

• On a busy, unstable network debugs can be hazardous to your network’s health

• Event log is non-disruptive—it’s already running

• (unless you turned it off)

• Defaults to 500 lines (configurable)

• EIGRP event-log-size <number of lines>

• Maximum event-log-size is half of available memory

• Most recent events at top of log by default

• Read from the bottom to top

• Later version support displaying the event log in reverse!

Event Log

• A separate event log is kept for each AS

• 500 lines are not very much; on a network where there is significant instability or activity, 500 lines may only be a second or two (or less) — you can change the size of the event log (if needed) by the command

• eigrp event-log-size <number of lines>• Recent IOS limits to half of available memory

• If number of lines set to 0, it disables the log

• You can clear the event log by typing

• clear ip eigrp event

• Most recent events are at the top of the log by default, so time flows from bottom to top

Event Log

• New parameters available for showing the event log

Event Log

Rtr2#show ip eigrp event ?

<1-4294967295> Starting event number

errmsg Show Events being logged

reverse Show most recent event last

sia Show Events being logged

type Show Events being logged

| Output modifiers

<cr>

• Three different event types can be logged

• EIGRP log-event-type [dual][xmit][transport]

• Default is dual—normally most useful

• Dual is FSM (decisions in finite state machine)

• xmit and transport are different aspects of actually sending packets to peers

• Any combination of the three can be on at the same time

• Work is in progress to add additional debug information to event log

Event Log

RtrA#show ip eigrp events

Event information for AS 1:

1 01:52:51.223 NDB delete: 30.1.1.0/24 1

2 01:52:51.223 RDB delete: 30.1.1.0/24 10.1.3.2

3 01:52:51.191 Metric set: 30.1.1.0/24 4294967295

4 01:52:51.191 Poison squashed: 30.1.1.0/24 lost if

5 01:52:51.191 Poison squashed: 30.1.1.0/24 metric chg

6 01:52:51.191 Send reply: 30.1.1.0/24 10.1.3.2

7 01:52:51.187 Not active net/1=SH: 30.1.1.0/24 1

8 01:52:51.187 FC not sat Dmin/met: 4294967295 46738176

9 01:52:51.187 Find FS: 30.1.1.0/24 46738176

10 01:52:51.187 Rcv query met/succ met: 4294967295 4294967295

11 01:52:51.187 Rcv query dest/nh: 30.1.1.0/24 10.1.3.2

12 01:52:36.771 Change queue emptied, entries: 1

13 01:52:36.771 Metric set: 30.1.1.0/24 46738176

Event Log

Debugs

• Remember—debugs can be dangerous!

• Use only in the lab or if advised by the TAC

• To make a little safer

• Logging buffered <size>

• No logging console

Debugs

• By enabling logging buffered and shutting off the console log, you improve your odds of not killing your router when you do a debug; still no guarantees

• We used to change the scheduler interval when we performed debugs in the TAC, as well; this command is version dependent, so I’m not going to give you syntax here

Debugs

• Use modifiers to limit the scope of route events or packet debugs

• Limit debugs to a particular neighbor

• debug ip eigrp neighbor AS address

• Limit debugs to a particular prefix

• debug ip eigrp AS network mask

Debugs

• Both packet debugs and route event debugs create so much output that you would have a hard time sorting through it for the pieces you care about; the two modifier commands above allow you to limit what the debug output will include

• “Debug ip eigrp neighbor AS address” will limit the output to those entries pertaining to a particular neighbor

• “Debug ip eigrp AS network mask” will limit the output to only those entries that pertain to the prefix identified

• Unfortunately, you have to enable the debug (packet or route events) prior to putting the modifier on, so you could kill your router before you are able to get the limits placed on the output; sorry, but that’s the way it is

Debugs

Debugs

RtrA#debug ip eigrp

IP-EIGRP Route Events debugging is on

RtrA#debug ip eigrp neighbor 1 10.1.6.2

IP Neighbor target enabled on AS 1 for 10.1.6.2

IP-EIGRP Neighbor Target Events debugging is on

RtrA#clear ip eigrp neighbor

P-EIGRP: 10.1.8.0/24 - do advertise out Serial1/2

IP-EIGRP: Int 10.1.8.0/24 metric 28160 - 256002560

IP-EIGRP: 10.1.7.0/24 - do advertise out Serial1/2


IP-EIGRP: Int 10.1.1.0/24 metric 28160 - 25600256

IP-EIGRP: Processing incoming UPDATE packet


Debug IP EIGRP (Route Events) – neighbor filtering

Debugs

• In this debug, we are looking at route events recorded when neighbors are cleared (in reality, the debugs produced were far, far more—this is only a snapshot of the debug); a modifier was included to limit the output to only the events related to a single EIGRP neighbor, 10.1.6.2

• Notice that the debug output doesn’t identify which neighbors are involved in any of the events; without knowing the address used in the modifier command, you really can’t tell which neighbors you are interacting with in the debug output

• This output is often useful when trying to determine what EIGRP thinks is happening when there are route changes in the network

Debug IP EIGRP (Route Events) – neighbor filtering

Debugs

RtrA#debug ip eigrp

IP-EIGRP Route Events debugging is on

RtrA#debug ip eigrp 1 10.1.7.0 255.255.255.0

IP Target enabled on AS 1 for 10.1.7.0/24

IP-EIGRP AS Target Events debugging is on

RtrA#clear ip eigrp neighbor



IP-EIGRP: Int 10.1.7.0/24 metric 20512000 20000000 512000


IP-EIGRP: Processing incoming UPDATE packet


Debug IP EIGRP (Route Events) – prefix filtering

Debugs

• Again, this is the output of debugging EIGRP routing events, this time modified to only display output related to a single prefix in the network; this modifier can be very useful when trying to troubleshoot a problem with a single prefix (or representative route)

Debug IP EIGRP (Route Events) – prefix filtering

Debugs

RtrA#debug eigrp packet ?

ack EIGRP ack packets

hello EIGRP hello packets

ipxsap EIGRP ipxsap packets

probe EIGRP probe packets

query EIGRP query packets

reply EIGRP reply packets

request EIGRP request packets

stub EIGRP stub packets

retry EIGRP retransmissions

terse Display all EIGRP packets except Hellos

update EIGRP update packets

verbose Display all EIGRP packet

Debug EIGRP Packet

Debugs

• This debug is used in a variety of problems and circumstances; debug eigrp packet hello is used to troubleshoot neighbor establishment/maintenance problems

• Debug eigrp packet query, reply, update, etc., are also often used to try to determine the process occurring when a problem occurs—be careful; I’ve crashed/hung more than one router by doing a debug on a router that was too busy

• Probably the most commonly used debug eigrp packet option is terse, which includes all of the above except hellos; an example follows on the next page

Debug EIGRP Packet

Debugs

RtrA#debug eigrp packet terse

EIGRP Packets debugging is on

(UPDATE, REQUEST, QUERY, REPLY, IPXSAP, PROBE, ACK, STUB)

EIGRP: Sending UPDATE on Serial1/0 nbr 10.1.1.2

AS 1, Flags 0x0, Seq 2831/1329 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1 serno 19707-19707





EIGRP: Received ACK on Serial1/0 nbr 10.1.1.2

AS 1, Flags 0x0, Seq 0/2831 idbQ 0/0 iidbQ un/rly 0/0 peerQ un/rely 0/1

EIGRP: Serial1/0 multicast flow blocking cleared

EIGRP: Received ACK on Serial1/1 nbr 10.1.2.2

AS 1, Flags 0x0, Seq 0/2832 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1

EIGRP: Serial1/1 multicast flow blocking cleared

Debug EIGRP Packet Terse

DebugsDebug IP EIGRP Notifications

RtrA#debug ip eigrp notifications

IP-EIGRP Event notification debugging is on

RtrA#clear ip route *

RtrA#

IP-EIGRP: Callback: reload_iptable

IP-EIGRP: iptable_redistribute into eigrp AS 1

IP-EIGRP: Callback: redist frm static AS 0 100.100.100.0/24

into: eigrp AS 1 event: 1

IP-EIGRP: Callback: redist frm static AS 0 200.200.200.0/24

into: eigrp AS 1 event: 1

Debugs

• This debug is used to debug problems between EIGRP and the routing/interface infrastructure; for example, if you’re having problems with redistribution, the actual place where EIGRP gets the redistributed routes is the RIB/routing table—these callbacks are the mechanism to signal changes between the routing infrastructure and EIGRP when routes come and go

• Callbacks are also used from the interface infrastructure as interfaces are shutdown/no shut, deleted, addresses added, etc.

• Any problem with EIGRP’s interaction with other parts of the system will probably come through this mechanism, thus this debug is your best bet

Debug IP EIGRP Notifications

Debugs

RtrA#debug eigrp fsm

EIGRP FSM Events/Actions debugging is on

RtrA#clear ip route *

RtrA#

DUAL: Find FS for dest 10.1.8.0/24. FD is 28160, RD is 28160

DUAL: 0.0.0.0 metric 28160/0 found Dmin is 28160


DUAL: 10.1.6.2 metric 21024000/2169856 found Dmin is 21024000

DUAL: RT installed 10.1.3.0/24 via 10.1.6.2


Debug EIGRP FSM (Finite State Machine)

Debugs

• Debug eigrp fsm is very, very similar to dual event log; since the dual event log is non-disruptive and this debug could certainly cause problems, I rarely use this debug in real life—sometimes it’s useful in conjunction with another debug to see how the different parts of EIGRP interact (debug ip packet, debug eigrp transport, and debug eigrp fsm to see how all the pieces fit together)

• FSM stands for Finite State Machine, which describes the behavior of DUAL, the path selection part of EIGRP

Debug EIGRP FSM

• Neighbors

• Route Computation and Propagation

• Understanding the Topology Table

• Stuck In Active Routes

• Summarization

• Redistribution

• Additional Propagation Issues

Troubleshooting the Essentials

• The most useful command for checking neighbor status is show ipeigrp neighbors

• Some of the important information provided by this command are

• Hold time—time left that you’ll wait for an EIGRP packet from this peer before declaring him down

• Uptime—how long it’s been since the last time this peer was initialized

• SRTT (Smooth Round Trip Time)—average amount of time it takes to get an Ack for a reliable packet from this peer

• RTO (Retransmit Time Out)—how long to wait between retransmissions if Acks are not received from this peer

Neighbors




(sec) (ms) Cnt Num

2 10.1.1.1 Et0 12 6d16h 20 200 0 233

1 10.1.4.3 Et1 13 2w2d 87 522 0 452

0 10.1.4.2 Et1 10 2w2d 85 510 0 3

Seconds Remaining Before Declaring Neighbor Down

How Long Since the Last Time Neighbor Was Discovered

How Long It Takes for This Neighbor to Respond to Reliable Packets

How Long We’ll Wait Before Retransmitting if No Acknowledgement

NeighborsShow IP EIGRP Neighbors

Outstanding Packets

Last Reliable Packet Sent

rtr302-ce1#show ip eigrp neighbor detail


H Address Interface Hold Uptime SRTT RTO Q Seq Type

(sec) (ms) Cnt Num

1 17.17.17.2 Et1/0 14 00:00:03 394 2364 0 124


Stub Peer Advertising ( CONNECTED SUMMARY ) Routes

0 50.10.10.1 Et0/0 13 04:04:39 55 330 0 13


Neighbors

• The big brother of the show ip eigrp neighbor command; some of the additional information available via the detailed version of this command include

• Number of retransmissions and retries for each neighbor

• Version of Cisco IOS and EIGRP

• Stub information (if configured)

Show IP EIGRP Neighbor Detail

RtrA# config terminal

Enter configuration commands, one per line. End with CNTL/Z.

RtrA(config) # router eigrp 1

RtrA(config-router) # eigrp log-neighbor-changes

RtrA(config-router) # logging buffered 10000

RtrA(config) # service timestamps log datetime msec

Neighbors

• EIGRP Log-Neighbor-Changes is on by default since 12.2(12)

• Turn it on and leave it on

• Best to send to buffer log

• EIGRP log-neighbor-changes is the best tool you have to understand why neighbor relationships are not stable. It should be enabled on every router in your network—CSCdx67706 (12.2(12)) made it the default behavior; as explained on the previous slide, the uptime value from show ip eigrp neighbors will tell you the last time a neighbor bounced, but not how often or why—with log-neighbor-changes on and logging buffered, you keep not only a history of when neighbors have been reset, but the reason why… absolutely invaluable

• Logging buffered is also recommended, because logging to a syslog server is not bulletproof; for example, if the neighbor bouncing is between the router losing neighbors and the syslog server, the messages could be lost—it’s best to keep these types of messages locally on the router, in addition to the syslog server

• It may also be useful to increase the size of the buffer log in order to capture a greater duration of error messages—you would hate to lose the EIGRP neighbor messages because of flapping links filling the buffer log; if you aren’t starved for memory, change the buffer log size using the command logging buffered 10000 in configuration mode

• The service timestamps command above puts more granular timestamps in the log, so it’s easier to tell when the neighbor stability problems occurred

Neighbors

Neighbor 10.1.1.1 (Ethernet0) is down: peer restarted

Neighbor 10.1.1.1 (Ethernet0) is up: new adjacency

Neighbor 10.1.1.1 (Ethernet0) is down: holding time expired

Neighbor 10.1.1.1 (Ethernet0) is down: retry limit exceeded

Others, but not often

Neighbors

• So this tells us why the neighbor is bouncing—but what do they mean?

• Hint: peer restarted means you have to ask the peer; he’s the one that restarted the session

Log-Neighbor-Changes Messages

• Peer restarted—the other router reset our neighbor relationship; you need to go to him to see why it thought our relationship had to be bounced

• New adjacency—established a new neighbor relationship with this neighbor; happens at initial startup and after recovering from a neighbor going down

• Holding time expired—we didn’t hear any EIGRP packets from this neighbor for the duration of the hold time; this is typically 15 seconds for most media (180 seconds for low-speed NBMA)

• Retry limit exceeded—this neighbor didn’t acknowledge a reliable packet after at least 16 retransmissions (actual duration of retransmissions is also based on the hold time, but there were at least 16 attempts)

Log-Neighbor-Changes Messages

Holding Time Expired

• The holding time expires when an EIGRP packet is not received during hold time

• Typically caused by congestion or physical errors

• Ping the multicast address (224.0.0.10) from the other router

• If there are a lot of interfaces or neighbors, you should use extended ping and specify the source address or interface

Neighbor 10.1.1.1 (Ethernet0) is down:

holding time expired

A

B

Hello

Ping 224.0.0.10

Hello


peer restarted

• When an EIGRP packet is received from a neighbor, the hold timer for that neighbor resets to the hold time supplied in that neighbor’s hello packet, then the value begins decrementing

• The hold timer for each neighbor is reset back to the hold time when each EIGRP packet is received from that neighbor (long ago and far way, it needed to be a hello received, but now any EIGRP packet will reset the timer)

• Since hellos are sent every five seconds on most networks, the hold time value in a show ip eigrp neighbors is normally between 10 and 15 (resetting to hold time (15), decrementing to hold time minus hello interval or less, then going back to hold time)

• Why would a router not see EIGRP packets from a neighbor?

• He may be gone (crashed, powered off, disconnected, etc.)

• He (or we) may be overly congested (input/output queue drops, etc.)

• Network between us may be dropping packets (CRC errors, frame errors, excessive collisions)


RtrA# debug eigrp packet hello

EIGRP Packets debugging is on (HELLO)

19:08:38.521: EIGRP: Sending HELLO on Serial1/1

19:08:38.521: AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0

19:08:38.869: EIGRP: Received HELLO on Serial1/1 nbr 10.1.6.2

19:08:38.869: AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0

19:08:39.081: EIGRP: Sending HELLO on FastEthernet0/0

19:08:39.081: AS 1, Fags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0


Remember—Any Debug Can Be Hazardous

• Another troubleshooting tool available is to do the command “debug eigrppacket hello”; this will produce debug output to the console or buffer log (depending on how you have it configured) that will show the frequency of hellos sent and received

• You should make sure you have the timestamps for the debugs set to a value that you can actually see the frequency; something like:

• service timestamps debug datetime msec

• Remember that any time you enable a debug on a production router, you are taking a calculated risk; it’s always better to use all of the safer troubleshooting techniques before resorting to debugs—sometimes they’re necessary, however


• EIGRP sends both unreliable and reliable packets

• Hellos and Acks are unreliable

• Updates, Queries, Replies, SIA-queries and SIA-replies are reliable

• Reliable packets are sequenced and require an acknowledgement

• Reliable packets are retransmitted up to 16 times if not acknowledged

Retry Limit Exceeded

• Exceeding the retry limit means that we’re sending reliable packets which are not getting acknowledged by a neighbor—when a reliable packet is sent to a neighbor, it must respond with a unicast acknowledgement; if a router is sending reliable packets and not getting acknowledgements, one of two things are probably happening

• The reliable packet is not being delivered to the neighbor

• The acknowledgement from the neighbor is not being delivered to the sender of the reliable packet

• These errors are normally due to problems with delivery of packets, either on the link between the routers or in the routers themselves—congestion, errors, and other problems can all keep unicast packets from being delivered properly; look for queue drops, errors, etc., when the problem occurs, and try to ping the unicast address of the neighbor to see if unicasts in general are broken or whether the problem is specific to EIGRP



• Reliable packets are re-sent after Retransmit Time Out (RTO) • Typically 6 x Smooth Round Trip Time

(SRTT)

• Minimum 100 ms

• Maximum 5000 ms (five seconds)

• 16 retransmits takes between roughly 40 and 80 seconds

• If a reliable packet is not acknowledged before 16 retransmissions and the hold timer duration has passed, re-initialize the neighbor Neighbor 10.1.1.1 (Ethernet0) is down: retry limit exceeded

A

B

PacketAck


peer restarted

• The Retransmit Timeout (RTO) is used to determine when to retry sending a packet when an Ack has not been received, and is (generally) based on 6 X Smooth Round Trip Time (SRTT); the SRTT is derived from previous measurements of how long it took to get an Ack from this neighbor—the minimum RTO is 100 Msec and the maximum is 5000 Msec; each retry backs off 1.5 times the last interval

• The minimum time required for 16 retransmits is approximately 40 seconds (minimum interval of 100 ms with a max interval of 5000 ms); for example, If there isn’t an acknowledgement after 100 ms, the packet is retransmitted and we set a timer for 150 ms—if it expires, we send it again and set the timer for 225 ms, then 337 ms, etc., until 5000 ms is reached; 5000 ms is then repeated until a total of 16 retransmissions have been sent

• The maximum time for 16 retransmits is approximately 80 seconds, if the initial retry is 5000 ms and all subsequent retries are also 5000 ms


• If a reliable packet is retransmitted 16 times without an acknowledgement, EIGRP checks to see if the duration of the retries has reached the hold time, as well

• Since the hold time is typically 15 sec on anything but low-speed NBMA, it normally isn’t a factor in the retry limit; NBMA links that are T1 or less, however, wait an additional period of time after re-trying 16 times, until the hold-time period (180 seconds) has been reached before declaring a neighbor down due to retry limit exceeded

• This was done to give the low-speed NBMA networks every possible chance to get the Acks across before downing the neighbor

• Remember this if you modify the hold times!


Unidirectional Links

A

B

HelloUpdate



RtrA#

%DUAL-5-NBRCHANGE: IP-EIGRP 1: Neighbor 10.1.5.4 (Serial1) is down:

retry limit exceeded

RtrB#show ip eigrp neighbors


H Address Interface Hold Q Seq

Cnt Num

1 10.1.102.2 Et0 14 4 0

• In this example, we see what happens when a link is only working in one direction; unidirectional links can occur because of a duplicate IP address, a wedged input queue, link errors, or any other reason you can think of that would allow packets to be delivered only in one direction on a link

• RtrB doesn’t even realize that RtrA exists—RtrA is sending out his hellos, waiting for a neighbor to show up on the network; what it doesn’t realize is that the RtrB is already out there and trying to bring up the neighbor relationship

• RtrB, on the other hand, sees the hellos from RtrA, sends his own hellos and then sends an update to RtrA to try to get their topology tables/routing tables populated—unfortunately, since the updates are also not being received by RtrA, it of course isn’t sending acknowledgements; RtrB tries it 16 times and then resets his relationship with RtrA and starts over

• You’ll spot this symptom by the retry limit exceeded messages on RtrB, RtrB having RtrA in his neighbor table with a continual Q count, and RtrA not seeing RtrB, at all

• CSCdy45118 has been implemented to create a reliable neighbor establishment process (three-way handshake) and reliable neighbor maintenance (neighbor taken down more quickly when unidirectional link encountered). 12.2T, 12.3 and up

Unidirectional Links


• Ping the neighbor’s unicast address

• Vary the packet size

• Try large numbers of packets

• This ping can be issued from either neighbor; the results should be the same

• Common causes

• Mismatched MTU

• Unidirectional link

• Dirty link

RtrB# ping

Protocol[ip]:

Target IP address: 10.1.1.1

Repeat count [5]: 100

Datagram Size: 1500

Timeout in seconds[2]:

Extended commands[n]: y

....

A

B

• Some manual configuration changes can also reset EIGRP neighbors, depending on the Cisco IOS version

• Summary changes (manual and auto)

• Route filter changes

• Stub setting changes

• This is normal behavior for much older code

• CSCdy20284 removed many of these neighbor resets• Implemented in 12.2S, 12.3T, and 12.4 (approximately 2005)

• Mismatch of K-values (metric weights) will prevent peers from forming also (best just not to change them at all).

Manual Changes

• Summary changes

• When a summary changes on an interface, components of the summary may need to be removed from any neighbors reached through that interface; neighbors through that interface are reset to synch up topology entries

• Route filter changes

• Similar to summary explanation above; neighbors are bounced if a distribute-list is added/removed/changed on an interface in order to synch up topology entries

• In the past, we also bounced neighbors when interface metric info changed (delay, bandwidth), but we no longer do that (CSCdp08764)

• CSCdy20284 was implemented to stop bouncing neighbors when many manual changes occur; in late 12.2S, 12.3T, and 12.4, summary and filter changes no longer bounce neighbors

• Changing Stub setting or router-ids still resets peers! Remember to make these changes during maintenance windows

Manual Changes

Show Commands

• There is also a show ip eigrp interface which contains a subset of this info; You may want to just use that if you don’t need all the detail

• This command supplies a lot of information about how the interfaces are being used and how well they are obeying; some of the interesting information available via this command is:

• Retransmissions sent—this shows how many times EIGRP has had to retransmit packets on this interface, indicating that it didn’t get an ack for a reliable packet; having retransmits is not terrible, but if this number is a large percentage of packets sent on this interface, something is keeping neighbors from receiving (and acking) reliable packets

• Out-of-sequence rcvd—this shows how often packets are received out of order, which should be a relatively unusual occurrence; again, it’s nothing to worry about if you get occasional out-of-order packets since the underlying delivery mechanism is best-effort—if the number is a large percentage of packets sent on the interface, however, then you may want to look into what’s happening on the interface—are there errors?

• You can also use this command to see if an interface only contains stub neighbors and if authentication is enabled

Show IP EIGRP Interface Detail

Neighbors

RtrB#show ip eigrp interface detail

IP-EIGRP interfaces for process 1

Xmit Queue Mean Pacing Time Multicast Pending

Interface Peers Un/Reliable SRTT Un/Reliable Flow Timer Routes

Et0/0 1 0/0 737 0/10 5376 0

Hello interval is 5 sec

Next xmit serial <none>

Un/reliable mcasts: 0/3 Un/reliable ucasts: 6/3

Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 0

Retransmissions sent: 0 Out-of-sequence rcvd: 0

Authentication mode is not set

Et1/0 1 0/0 885 0/10 6480 0







Show IP EIGRP Interface Detail

• On a busy, unstable network debugs can be hazardous to your network’s health

• Event log is non-disruptive—it’s already running

• (unless you turned it off)

• Defaults to 500 lines (configurable) – per AS

• EIGRP event-log-size <number of lines>

• Maximum event-log-size is half of available memory

• Most recent events at top of log by default

• Read from the bottom to top

• Later version support displaying the event log in reverse

• Clear ip eigrp event resets the log buffer

Event Log

• A separate event log is kept for each AS

• 500 lines are not very much; on a network where there is significant instability or activity, 500 lines may only be a second or two (or less) — you can change the size of the event log (if needed) by the command

• eigrp event-log-size <number of lines>• Recent IOS limits to half of available memory

• If number of lines set to 0, it disables the log

• You can clear the event log by typing

• clear ip eigrp event

• Most recent events are at the top of the log by default, so time flows from bottom to top

Event Log

• Three different event types can be logged

• EIGRP log-event-type [dual][xmit][transport]

• Default is dual—normally most useful

• Dual is FSM (decisions in finite state machine)

• xmit and transport are different aspects of actually sending packets to peers

• Any combination of the three can be on at the same time

• Work is in progress to add additional debug information to event log

Event Log

The EIGRP Topology Table

• The topology table is probably the most critical structure in EIGRP

• Contains building blocks used by DUAL

• Used to create updates for neighbors

• Used to populate the routing table

• Understanding the topology table contents is extremely important in troubleshooting EIGRP problems

Topology Table

• One of the reasons that EIGRP is called an advanced distance vector protocol is that it retains more information than just the best path for each route it receives—this means that it can potentially make decisions more quickly when changes occur, because it has a more complete view of the network than RIP, for example; the place this additional information is stored is in the topology table

• The topology table contains an entry for every route EIGRP is aware of, and includes information about the paths through all neighbors that have reported this route to him—when a route is withdrawn by a neighbor, EIGRP will look in the topology table to see if there is a feasible successor, which is another downstream neighbor that is guaranteed to be loop-free; if so, EIGRP will use that neighbor and never have to go looking farther

• Contrary to popular belief, the topology table also contains routes which are not feasible; these are called possible successors and may be promoted to feasible successors, or even successors if the topology of the network were to change

• The following slides show a few different ways to look at the topology table and give hints on how to evaluate it

Topology Table

RtrA#sh ip eigrp topology sum

IP-EIGRP Topology Table for AS(200)/ID(40.80.0.17)

Head serial 1, next serial 1526

589 routes, 0 pending replies, 0 dummies

IP-EIGRP(0) enabled on 12 interfaces, neighbors present on 4 interfaces

Quiescent interfaces: Po3 Po6 Po2 Gi8/5

Internal data structures used to manage the topology table

Number of Replies to send from this router

Total number of routes in the local topology table

Topology TableShow IP EIGRP Topology Summary

Interfaces with No Outstanding Packets to Be Sent or Acknowledged

Topology Table

• Displays a list of successors and feasible successors for all destinations known by EIGRP

Show IP EIGRP Topology

Computed

distance

RtrA#show ip eigrp topology


..snip…..

P 10.200.1.0/24, 1 successors, FD is 21026560

via 10.1.1.2 (21026560/20514560), Serial1/0

via 10.1.2.2 (46740736/20514560), Serial1/1

Reported

distance

Successor

Feasible successor

Feasible distance

B

A

D

C E10.1.2.0/24 10.1.5.0/24

10.2

00.1

.0/2

4

.1

.1

.1

.1

.1

.1

.2

.2

.2

.2

.2

.2128K56K

Topology Table

• The most common way to look at the topology table is with the generic show ip eigrp topology command; this command displays all of the routes in the EIGRP topology table, along with their successors and feasible successors

• In the above example, the P on the left side of the topology entry displayed means the route is Passive—if it has an A, it means the route is Active; the destination being described by this topology entry is for 10.200.1.0 255.255.255.0—this route has one successor, and the feasible distance is 21026560; the feasible distance is normally the metric that would appear in the routing table if you did the command show ip route 10.200.1.0 255.255.255.0 (but not always)

• Following the information on the destination network, the successors and feasible successors are listed—the successors (one or more) are listed first, then the feasible successors are listed; the entry for each next-hop includes the IP address, the computed distance through this neighbor, the reported distance this neighbor told us, and which interface is used to reach him

• 10.1.2.2 is a feasible successor because his reported distance (21514560) is less than our current feasible distance (21026560) (It’s a smaller distance, and that means it is closer.)

Show IP EIGRP Topology

Topology Table

• Displays a list of all neighbors who are providing EIGRP with an alternative path to each destination

Show IP EIGRP Topology All-links

RtrA#show ip eigrp topology all-links


…..snip…..


via 10.1.1.2 (21026560/20514560), Serial1/0

via 10.1.2.2 (46740736/20514560), Serial1/1

via 10.1.3.2 (46740736/46228736), Serial1/2

Computed

distance

Reported

distance

Successor

Feasible successor

Feasible distance

Possible successor

B

A

D

C E10.1.2.0/24 10.1.5.0/24

10.2

00.1

.0/2

4

.1

.1

.1

.1

.1

.1

.2

.2

.2

.2

.2

.2128K56K

Topology Table

• If you want to display all of the paths which EIGRP contains in its topology table, use the show ip eigrp topology all-links command

• You’ll notice in the above output that not only are the successor (10.1.1.2) and feasible successor (10.1.2.2) shown, but another router that doesn’t qualify as either is also displayed; the reported distance from 10.1.3.2 (46228736) is far worse than the current feasible distance (21026560), so it isn’t feasible

• This command is often useful to understand the true complexity of network convergence—I’ve been on networks with pages of non-feasible alternative paths in the topology table because of a lack of summarization/distribution lists; these large numbers of alternative paths can cause EIGRP to work extremely hard when transitions occur and can actually keep EIGRP from successfully converging

Show IP EIGRP Topology All-links

Topology Table

• Displays detailed information for all paths received for a particular destination

Show IP EIGRP Topology <net><mask>

RtrA#show ip eigrp topology 10.200.1.0/24

IP-EIGRP topology entry for 10.200.1.0/24

State is Passive, Query origin flag is 1, 1 Successor(s), FD is 21026560

Routing Descriptor Blocks:

10.1.1.2 (Serial1/0), from 10.1.1.2, Send flag is 0x0

Composite metric is (21026560/20514560), Route is Internal

Vector metric:

....



Vector metric:

....



Vector metric:

....

B

A

D

C E10.1.2.0/24 10.1.5.0/24

10.2

00.1

.0/2

4

.1

.1

.1

.1

.1

.1

.2

.2

.2

.2

.2

.2128K56K

Topology Table

• If you really want to know all of the information EIGRP stores about a particular route, use the command show ip eigrp topology <network><mask>

• Note that the mask can be supplied in dotted decimal or /xx form

• In the above display, you’ll see that EIGRP not only stores which next-hops have reported a path to the target network, it stores the metric components used to reach the total (composite) metric

• You also may notice that EIGRP contains a hop count in the vector metrics—the hop count isn’t actually used in calculating the metric, but instead was included to limit the apparent maximum diameter of the network; in EIGRP’s early days, developers wanted to ensure that routes wouldn’t loop forever and put this safety net in place—in today’s EIGRP, it actually isn’t necessary any longer, but is retained for compatibility

Show IP EIGRP Topology <network><mask>

RtrA#show ip eigrp topology 30.1.1.0/24

IP-EIGRP topology entry for 30.1.1.0/24




....

External data:

Originating router is 64.1.4.14

AS number of route is 0

External protocol is Static, external metric is 0

Administrator tag is 0 (0x00000000)Static Route to 30.1.1.0/24 is

redistributed into EIGRP

Topology Table

• Showing the topology table entry for anexternal route shows additionalinformation about the route

External Topology Table Entry

B

A

D

C E10.1.2.0/24 10.1.5.0/24

10.2

00.1

.0/2

4

.1

.1

.1

.1

.1

.1

.2

.2

.2

.2

.2

.2128K56K

Topology Table

• If you perform the command show ip eigrp topology <network><mask> for an external route (one redistributed into EIGRP from another protocol), even more information is displayed

• The initial part of the display is identical to the command output for an internal (native) route—the one exception is the identifier of the route as being external; another section is appended to the first part, however, containing external information—the most interesting parts of the external data are the originating router and the source of the route

• The originating router is the router who initially redistributed the route into EIGRP—note that the value for the originating router is router-id of the source router, which doesn’t necessarily need to belong to an EIGRP-enabled interface; the router-id is selected in the same way OSPF selects router-ids, starting with highest IP address on a loopback interface, if any are defined, or using the highest IP address on the router if there aren’t loopback interfaces—note that if a router receives an external route and the originating router field is the same as the receiver’s router-id, it rejects the route—this is noted in the event log as ignored, dup router

• The originating routing protocol (where it was redistributed from) is also identified in the external data section; this is often useful when unexpected routes are received and you are hunting the source

External Topology Table Entry

Topology Table

• Zero successor routes are those that fail to get installed in the routing table by EIGRP because there is a route with a better admin distance already installed

Show IP EIGRP Topology Zero

RtrA#show ip eigrp topology zero


....

P 10.200.1.0/24, 0 successors, FD is Inaccessible

via 10.1.1.2 (21026560/20514560), Serial1/0

via 10.1.2.2 (46740736/20514560), Serial1/1

via 10.1.3.2 (46740736/46228736), Serial1/2

RtrA#show ip route 10.200.1.0 255.255.255.0

Routing entry for 10.200.1.0/24

Known via "static", distance 1, metric 0


* 10.1.1.2

Route metric is 0, traffic share count is 1

B

A

D

C E10.1.2.0/24 10.1.5.0/24

10.2

00.1

.0/2

4

.1

.1

.1

.1

.1

.1

.2

.2

.2

.2

.2

.2128K56K

Topology Table

• And last, the show ip eigrp topology zero command is available to display the topology table entries that are not actually being used by the routing table

• Typically, zero successor entries are ones that EIGRP attempted to install into the routing table, but found a better alternative there already; in our example above, when EIGRP tried to install its route (with an administrative distance of 90), it found a static route already there (with an administrative distance of one) and thus couldn’t install it—in case the better route goes away, EIGRP retains the information in the topology table, and will try to install the route again if it is notified that the static (or whatever) route is removed

• Routes that are active sometimes also show up as zero successor routes, but they are transient and don’t remain in that state

• This command isn’t often used or useful

Show IP EIGRP Topology Zero

Stuck in Active

• Indicates at least two problems

• A route went active

• It got stuck

• Both the ‘stuck’ and the ‘active’ have already occurred prior to the message being logged!

• Event-logs are your friend

Stuck-in-Active Routes (SIA)%DUAL-3-SIA: Route 10.1.1.0 255.255.255.0 stuck-in-active state in IP-EIGRP 100. Cleaning up

The Active Process

• RtrA loses its route to 10.10.10.0/24

• RtrA has no other path to this destination, so it marks the route as Active and sends a Query to RtrB

• RtrB receives this Query from its successor and has no other paths to reach the destination

• RtrB marks 10.10.10.0/24 as Active, and sends a Query to RtrC

10.10.10.0/24

A

B

C

Query

No other path

Query

The Active Process

• RtrC receives the Query and has no more neighbors to Query and no alternate paths to 10.10.10.0/24

• RtrC marks the route as unreachable, and sends a Reply to RtrB

• RtrB receives the Reply, marks 10.10.10.0/24 as unreachable, and sends a Reply to RtrA

• RtrA receives the Reply and since it didn’t learn any viable paths to reach 10.10.10.0/24, it deletes the route from the topology and routing tables

10.10.10.0/24

Query

No other path

Query

Reply

Reply

A

B

C

The Active Process

• What happens if RtrC ’s Reply isn’t sent, or doesn’t make it toRtrB?

• While RtrC is trying to send the Reply, RtrA ’s Active timer is running

• After 90 seconds, RtrA sends an SIA query to RtrB

• If RtrB is still waiting on RtrC , it sends an SIA reply to RtrA

10.10.10.0/24

No other path

Query

Reply

Query

Active Timer

SIA Query

SIA Reply

A

B

C

• SIA-queries are sent to a neighbor up to three times

• May attempt to get a reply from a neighbor for a total of six minutes

• If a Reply is not received by the end of this process, the route is considered stuck through this neighbor

• On the router that doesn’t get a reply after three SIA-queries

• Reinitializes neighbor(s) who didn’t answer

• Goes active on all routes known through bounced neighbor(s)

• Re-advertises to bounced neighbor all routes that were previously advertised

The SIA Query Process

• Sometimes the active process doesn’t complete normally; this can be due to a number of different problems which are covered later in this presentation… what happens when things go wrong?

• If RtrB doesn’t respond to RtrA within 1.5 minutes because it’s still waiting for a Reply from RtrC, RtrA will send an SIA-query to RtrB checking the status—if RtrB is still waiting for a Reply itself, it will respond to RtrA with an SIA-reply; this resets the SIA timer on RtrA so it will wait another 1.5 minutes

• Eventually, the problem keeping RtrC from responding to RtrB will take the neighbor relationship down between RtrB and RtrC, which will cause RtrB to reply to A, ending the Query process

The SIA Query Process

• Two (probably) unrelated causes of the problem — stuck and active

• Need to troubleshoot both parts

• Cause of active often easier to find

• Cause of stuck more important to find

Troubleshooting SIAs

• If routes never went active in the network, we would never have to worry about any getting stuck; unfortunately, in a real network there are often link failures and other situations that will cause routes to go active—one of our jobs is to minimize them, however

• If there are routes that regularly go active in the network, you should absolutely try to understand why they are not stable; while you cannot ensure that routes will never go active on the network, a network manager should work to minimize the number of routes going active by finding and resolving the causes

• Even if you reduce the number of routes going active to the minimum possible, if you don’t eliminate the reasons that they get stuck you haven’t fixed the most important part of the problem; the next time you get an active route, you could again get stuck

• The direct impact of an active route is small; the possible impact of a stuck-in-active route can be far greater

Troubleshooting SIAs

• Determine what is common to routes going active

• Known network problems?

• Flapping link(s)?

• From the same region of the network?

• Resolve whatever is causing them to go active (if possible)

Troubleshooting the Active Part of SIAs

• The syslog may tell you which routes are going active, causing you to get stuck. Since the SIA message reports the route that was stuck, it seems rather straight forward to determine which routes are going active. This is only partially true—once SIAs are occurring in the network, many routes will go active due to the reaction to the SIA; you need to determine which routes went active early in the process in order to determine the trigger

• Additionally, you can do show ip eigrp topology active on the network when SIAs are not occurring and see if you regularly catch the same set of routes going active

• If you are able to determine which routes are regularly going active, determine what is common to those routes—are links flapping (bouncing up and down) causing the routes (and everything behind it) to regularly go active?

• Are most or all of the routes coming from the same area of the network? If so, you need to determine what is common in the topology to them so that you can determine why they are not stable

Troubleshooting the Active Part of SIAs

• Show ip eigrp topology active

• Useful only while the problem is occurring

• If the problem isn’t occurring at the time, it is very difficult to find the reason the routes are getting stuck

Troubleshooting the Stuck Part of SIAs

• Our best weapon to use to find the cause of routes getting stuck-in-active is the command show ip eigrp topology active; it provides invaluable information about routes that are in transition—examples of the output of this command and how to evaluate it will be in the next several slides

• Unfortunately, this command only shows routes that are currently in transition; it isn’t useful after the fact when you are trying to determine what happened earlier—if you aren’t chasing it while the problem is occurring, there aren’t really any tools that will help you find the cause


Chasing Active Routes

Why Is RtrA Reporting SIA Routes?

Let’s Look at a Problem in Progress

RtrA Is Waiting on RtrB

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

RtrA#show ip eigrp topology active


A 20.1.1.0/24, 1 successors, FD is Inaccessible

1 replies, active 00:01:17, query-origin: Local origin

via Connected (Infinity/Infinity), Ethernet1/0

Remaining replies:

via 10.1.1.2, r, Ethernet0/0

• In our example network, we’ve noticed dual-3-sia messages in the log of RtrA and we know the trigger is an unstable network off of this router; instead of just shutting down the unstable link, we decide to try to determine the cause of the stuck part of stuck-in-active

• In the above output, we see that RtrA is active on the route 20.1.1.0/24 (note the “A” in the left column) and has been waiting for an answer from 10.1.1.2 (RtrB) for one minute and 17 seconds—we know that we are waiting on RtrB because of the lower case r after the IP address; sometimes, the lower case r comes after the metric in the upper part of the output (not under remaining replies)—don’t be fooled—the lower case r is the key, not whether it’s under the remaining replies are or not

• Since we know why we are staying active on the route because RtrB hasn’t answered us, we need to go to him (RtrB) to see why he’s taking so long to answer



So Why Hasn’t RtrB Replied?

RtrB#show ip eigrp topology active


A 20.1.1.0/24, 1 successors, FD is Inaccessible

1 replies, active 00:01:26, query-origin: Successor Origin

via 10.1.1.1 (Infinity/Infinity), Ethernet0/0

Remaining replies:

via 10.1.2.2, r, Ethernet1/0

RtrB is Waiting on RtrC

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

• We repeat the show ip eigrp topology active command on RtrB and we get the results seen above

• We see that RtrB probably isn’t the cause of our stuck-in-active routes, since it is also waiting on another router downstream to answer his query before it can reply; again, the lower case r beside the IP address of 10.1.2.2 tells us it is the neighbor slow to reply

• We now need to go to 10.1.2.2 (RtrC) and see why it isn’t answering RtrB

Chasing Active Routes (Step 1)


What’s RtrC’s Problem?

RtrC#show ip eigrp topology active


A 20.1.1.0/24, 1 successors, FD is Inaccessible, Qqr

1 replies, active 00:01:33, query-origin: Successor Origin, retries(1)

via 10.1.2.1 (Infinity/Infinity), Ethernet0/0, serno 20

via 10.1.3.2 (Infinity/Infinity), rs, q, Ethernet1/0, serno 19, anchored

RtrC Is Waiting on RtrD

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

• On RtrC we repeat the show ip eigrp topology active command and see what it thinks of the route

• Again, he’s waiting on another neighbor downstream to answer him before it can answer RtrB… you are probably getting the idea of how exciting this process can be; of course, in a real network you probably have users/managers breathing down your neck making it a bit more interesting

• As I’m sure you suspect our next step should be to see why 10.1.3.2 (RtrD) isn’t answering RtrC’s query



Why Isn’t RtrD Answering?

RtrD#show ip eigrp topology active


RtrD#

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

No active routes… back to RtrC!

• And again, we look at the active topology table entries, this time on RtrD

• Wait… RtrD isn’t waiting on anyone for any routes; did the replies finally get returned and the route is no longer active? We need to go back to RtrC and see if it is still active on the route



No; RtrC Is Still Waiting on RtrD; What’s the Deal?

RtrC#show ip eigrp topology active


A 20.1.1.0/24, 1 successors, FD is Inaccessible, Qqr

1 replies, active 00:01:52, query-origin: Successor Origin, retries(1)

via 10.1.2.1 (Infinity/Infinity), Ethernet0/0, serno 20

via 10.1.3.2 (Infinity/Infinity), rs, q, Ethernet1/0, serno 19, anchored

RtrC is waiting on RtrD

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

• Hmmm… RtrC still thinks the route is active and it’s gotten even older

• There appears to be a problem, Houston. RtrC thinks it needs a reply from RtrD, yet RtrDisn’t active on the route; we need to take a look at the neighbor relationship between these two routers to try to identify what is going wrong



We need to investigate and see why they don’t seem to agree about the active route

RtrC#show ip eigrp neighbors



(sec) (ms) Cnt Num

0 10.1.3.2 Et1/0 13 00:00:14 0 5000 1 0

1 10.1.2.1 Et0/0 13 01:22:54 227 1362 0 385

Looks like something’s broken between RtrC and RtrD

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

• It appears that RtrC is having a bit of a problem communicating with RtrD—the neighbor relationship isn’t even making it completely up based on the Q count on RtrC; we also notice in the log that the neighbor keeps bouncing due to retry limit exceeded

• Now we need to use our normal troubleshooting methodology to determine why these two routers can’t talk to each other properly



RtrC#ping 10.1.3.2

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 10.1.3.2, timeout is 2 seconds:

.....

Success rate is 0 percent (0/5)

Okay—we can’t ping; we need to fix this before EIGRP stands a chance of

working

20.1

.1.0

/24

10.1.1.0/24 10.1.2.0/24 10.1.3.0/24

.1 .2 .1 .2 .1 .2

A B C D

• How does basic connectivity look? A ping between RtrC and RtrD isn’t succeeding either; we’ll need to find out why they can’t talk to each other

• Whatever is causing them to not talk to each other is undoubtedly a contributing factor to the SIAs we’re seeing in the network; we need to find and fix the problem with this link and remove the cause of the SIA routes


• It’s not always this easy to find the cause of an SIA

• Sometimes you chase the waiting neighbors in a circle

• If so, summarize and simplify

• Easier after CSCdp33034 (circa 2000)

• SIA should happen closer to the location of the cause of the problem


• Our example of chasing SIA routes was intentionally made very easy in order to demonstrate the tools and techniques—in a real event on a network, there would probably be many more routes active, and many more neighbors replying; this can make chasing the waiting neighbors significantly more challenging

• Usually, you will be able to succeed at tracking the waiting neighbors back to the source of the problem—occasionally, you can’t—on highly redundant networks, in particular, you can find yourself chasing neighbors in circles without reaching an endpoint cause of the waiting; if you run into this case, you may need to temporarily reduce the redundancy in order to simplify the network for troubleshooting and convergence


• Bad or congested links

• Query range is “too long”

• Excessive redundancy

• Overloaded router (high CPU)

• Router memory shortage

• Software defects (seldom)

Likely Causes for Stuck-in-Active

• Remember that the cause of the SIA route could be a different location than where the SIA message and bounced neighbors happened; this is particularly true with code older than CSCdp33034

• Some of the possible causes of SIAs are:

• Links that are either experiencing high CRC or other physical errors or are congested to the point of dropping a significant number of frames—queries, replies, or acknowledgements could be lost

• The time it takes for a query to go from one end of the network to the other is too long and the active timer expires before the query process completes; I don’t think I’ve ever seen a network where this is true, by the way

• The complexity in the network is so great due to excessive redundancy that EIGRP is required to work so hard at sending and replying to queries that it cannot complete them in time

• A router is low on memory so that it is able to send hellos, which are very small, but be unable to send queries or replies

• There have occasionally been software defects that caused SIAs (CSCdi83660, CSCdv85419, CSCtc31545)

Likely Causes for SIAs

• What is excessive redundancy? Isn’t redundancy a good thing, not something to avoid? I categorize excessive redundancy as alternative paths that exist in the network that provide little if any real benefit of improved reliability, and are often unplanned and unexpected

• In the above example, the four subnets on the left (which could be VLANs through a switch or any other media, for that matter) are there to provide users with access to the network; there are two routers connected to each VLAN in order to provide redundancy (probably via HSRP) so that the users will have failover capability if there is a problem

• Unfortunately, the designer may have created a network topology a little different than what it intended

Excessive Redundancy

Excessive Redundancy1.1.1.0/24

A

B

RtrA#show ip eigrp topo | begin 1.1.1.0


via Connected, Loopback1


...snip...

RtrA#show ip route | begin 1.1.1.0

C 1.1.1.0 is directly connected, Loopback1

...snip...

RtrA#show ip eigrp topo all | begin 1.1.1.0

P 1.1.1.0/24, 1 successors, FD is 128256, serno 2673915


via 10.0.19.2 (9690112/9173248), FastEthernet6/0.19








...snip...

Wow, Where did all of these alternative paths

come from!

RtrA

RtrB

• If you just define network statements under EIGRP covering all of your interfaces, each of the user subnets will be treated by EIGRP as possible alternative paths to reach every destination in the network! It is rarely the network designers goal to have these user subnets used as transit paths to reach other parts of the network for anyone other than the users that reside on that segment, but EIGRP doesn’t know what the designer intended, only what he/she configured

• As the output above shows, when something changes in the network EIGRP has to converge over each of the user subnets as part of the query path


Excessive Redundancy1.1.1.0/24

A

Brouter eigrp 1

passive-interface fastethernet6/0.1

passive-interface fastethernet6/0.2

Or

router eigrp 1

passive-interface default

no passive-interface fastethernet0/0

RtrA

• A simple solution to this particular problem is to use the passive-interface command. In EIGRP, defining an interface as passive means that the subnet on that interface is included in the EIGRP topology table and propagated to the rest of the network, but no peers will be formed across these interfaces; this means that they will not be in the transit path and will greatly simplify EIGRP’s apparent topology and the associated complexity of convergence

• If you don’t plan to have a link as a transit path, make it passive!

• Note that if you didn’t want those interfaces to show up in EIGRP at all, you could define more specific network statements to only cover the interfaces you’re interested in. Our assumption above is that those interfaces are needed in EIGRP, just not for peer formation.


• Decrease query scope (involve fewer routers in the query process)

• Summarization (filters out knowledge of the route to downstream peer)

• Route filters (filters out knowledge of the route to downstream peer)

• Define spoke/edge routers as stubs (explicitly defined as non-transit device)

• Run a Cisco IOS which includes CSCdp33034

Minimizing SIA Routes

• We’ve now talked about the impact that SIA routes can have on your network and how to track down the root cause of SIA events; while you may not be able to completely rid your network of SIA routes, there are techniques you can use to minimize your exposure

• Decrease query scope—in our example network, you saw the queries sent to each router in a chain; if a router receives a query on a route that it doesn’t have in its topology table, it immediately answers and doesn’t send the query onward—this is a very good thing; you do this through:

• Summarization—auto-summary (seldom used) or manual summary to summarize within a major network or to summarize external routes

• Route filtering—used to limit knowledge of routes; particularly on dual-homed remotes, which tend to reflect all routes back to the other leg of the dual home connection

• Use hierarchy—if the network doesn’t have hierarchy, the two techniques above cannot adequately be used

• Define spoke/edge routers as stubs so they aren’t queried at all

• Run a Cisco IOS with CSCdp33034 included

Minimizing SIA Routes

• Summary Metrics

• Summary Admin Distance

• Summary Black Holes

Summary Problems

Summary Basics

• Summarization is an information hiding technique to send less-specific routes to represent block of prefixes

• 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24 can be aggregated to 192.168.0.0/22

• Rather than advertising three networks with each representing 255 addresses (253 hosts), RtrA advertises a single network, representing 1024 addresses

192.168.3.0/24

192.168.2.0/24

192.168.1.0/24

253 Hosts

192.168.0.0/22

1 Network

1024 Addresses

3 Networks

255 Addresses Each

A

• Summarization is an information-hiding technique used to minimize the number of prefixes advertised while still maintaining full reachability—summarization will be most effective if the network is designed in a hierarchical way so that multiple prefixes can be represented at some point in the network by a single, less specific prefix; one typical place of summarization is from distribution routers toward spokes that only need to know a default route (or at least some subset of total routes) in order to reach the remainder of the network

• When summarization is used in EIGRP networks, scalability is greatly enhanced both because of the fewer number of prefixes known throughout the network as well as the decreased query scope that summarization brings; the query scope aspect will be explained in more detail later in this presentation

Summary Basics

Summary Metrics

• In EIGRP, the metric of a summary is based on the metrics of its components

• EIGRP chooses the metric of the lowest cost component route as the metric of the summary

A

B

10.1

.0.0

/24

Co

st 1

0

10.1

.1.0

/24

Co

st 2

0

10.2

.0.0

/24

Cost 10

10.2

.1.0

/24

Co

st 2

0

10.1.0.0/23

Cost 1010.2.0.0/23

Cost 10

C

• When EIGRP creates a summary route, it has to determine the metric to include with the route advertisement—EIGRP examines every entry in the database (topology table) looking for components of the summary that will be suppressed (thus represented by) the summary; EIGRP finds the component with the best composite metric and then copies the metric details from it (bandwidth, delay, etc.) into the summary topology table entry

• Note that it does not take the best delay, best bandwidth, etc., but takes the best composite metric and grabs the attributes from it.

• This works fine except for the fact that components of the summary may come and go, which means EIGRP has to continually make sure the summary is still using the lowest metric contained in a summary component

Summary Metrics

Summary Metrics• If the component from which the metric

was derived flaps, then summary updates are required as well!

• The summary is used to hide reachability information, yet changes to the metric information causes the routers beyond the summary to perform work to keep up with the metric changes

• There is also processing overhead for EIGRP to recalculate the summary metric each time a component changes

10.1

.0.0

/24

Co

st 1

0

10.1

.1.0

/24

Co

st 2

0

10.2

.0.0

/24

Cost 10

10.2

.1.0

/24

Co

st 2

0

10.1.0.0/23

Cost 10

10.2.0.0/23

Cost 10Cost 20

A

B C

• This recalculation of the summary metric when components change causes two significant things to happen:

• Every time the component with the best metric changes, the summary needs to be re-advertised to all of it’s peers—thus the desire to hide topology changes behind the summary is only partially functional; while it hides the changes for each component prefix, it still causes updates and processing if the best component is the one that changed

• Even if the best component isn’t the one that changed, EIGRP internally has to look at every topology table entry to make sure the summary metric wasn’t affected; with large numbers of components or large numbers of summaries, this can be significant processing

Summary Metrics

Summary Metrics—Solutions

• Use a loopback interface to force the metric to remain constant

• Create a loopback interface within the summary range with a lower metric than any other component

• Generally best to use a /32 for the prefix and use delay to force the metric value

• The summary will use the metric of the loopback, which will never go down

A

B

10.1

.0.0

/24

Co

st 2

0

10.1

.1.0

/24

Co

st 2

0

10.1.0.0/23

Cost 10

loopback 0

ip address 10.1.1.1 255.255.255.255

delay 1

• One way to minimize/remove the first problem (metric changing downstream due to component changes) is to create a loopback on the router doing the summarization and ensure that it has the best metric of any component of the summary; since it will remain up unless administratively shut down, the metric of the summary will not change in its updates to upstream peers

• Note that this approach does nothing to change the second summary metric issue; i.e., router cpu processing required to recalculate on the router doing the summarization—that’s next

Summary Metrics

Summary Metrics—Solutions

• In recent EIGRP code, you can define the “summary-metric” command in router mode inorder to specify the metric to be used on the summary, regardless of the metrics of the component routes

• This is similar to defining the metric on redistribution statements in router mode

• This eliminates metric churn downstream as well as local processing

router eigrp ROCKS

address-family ipv4 autonomous-system 1

network 10.0.0.0

topology base

summary-metric 10.1.0.0 255.255.254.0 10000 100 255 1 1500

A

B

10.1

.0.0

/24

Co

st 2

0

10.1

.1.0

/24

Co

st 2

0

10.1.0.0/23

Cost 10

• The recent implementations of EIGRP (release five and newer) contain the new “summary-metric” command under the router prompt which allows you to specify the metric to use on the summary so that learning the metric from summary components is unnecessary; since the metric is fixed, both the route churn problem for downstream peers and local database searching processing are removed

• This new command will greatly improve scalability in networks using summarization with large topology tables, which is where summarization is most useful!

Summary Metrics

Admin Distance Basics

• The administrative distance has nothing to do with distance but instead should be considered preference or believability

• If a route is being installed in the RIB from multiple sources, the admin distance defines which one wins

• The chart supplied here shows the default distances; important to this discussion is the fact that EIGRP summary routes are preferred (AD 5) over routes learned from EIGRP peers (AD 90 for internals, 170 for externals)

• Note: Lower = better

Route Source Default Distance

Connected Interface 0

Static Route 1

EIGRP Summary Route 5

eBGP 20

Internal EIGRP 90

IGRP 100

OSPF 110

IS-IS 115

RIP 120

On Demand Routing (ODR) 160

External EIGRP 170

iBGP 200

Unknown 255

Summary Admin Distance

• Original summary Admin Distance problem• Default summary is defined on

distribution routers to spokes

• Internet access point provides 0.0.0.0 external for default route to the Internet

• AD of five for the summary is better than the AD of 170 from the external EIGRP route, so the Internet default route is rejected!

• So let’s just add the AD to the summary!

A 200

A B

C

0.0.0.0

RtrA#sh ip route 0.0.0.0 0.0.0.0

Routing entry for 0.0.0.0/0, supernet

Known via "eigrp 1", distance 5, metric 25600, candidate default path, type internal

ip summary-address eigrp 1 0.0.0.0 0.0.0.0

• A common implementation of summaries in an EIGRP network is the generation of a default route (0.0.0.0/0) from the distribution layer to the access routers; this installs a 0.0.0.0/0 Null0 summary (discard) route on the distribution layer router

• The problem occurs when there is an actual 0.0.0.0/0 default route generated elsewhere in the network that the distribution layer needs to receive in order to do proper routing; since the 0.0.0.0/0 summary route has an AD of 5 and the 0.0.0.0/0 learned across the EIGRP network has an AD of 90 for internals or 170 for externals, the local summary wins and the received route is discarded

• In order to solve this problem, we added the ability to define a manual admin distance to the summary several years ago—this turned out to be a good idea only in limited deployment scenarios



• Since the summaries created on RtrA and RtrB now have a worse AD than the external route received, the external route wins and is propagated to RtrC

• Components of the summary are still suppressed

• RtrC has equal cost paths to 0.0.0.0/0

• But what happens if RtrA loses the path to the Internet?

D*EX 0.0.0.0/0 [170/409600] via 10.1.2.2, 00:00:10, Serial1/0

[170/409600] via 10.1.1.1, 00:00:10, Serial0/0

A

0.0.0.0

ip summary-address eigrp 1 0.0.0.0 0.0.0.0 200

A B

C

• When the summary is defined with an admin distance of 200 to make it worse than the external route learned across the EIGRP network, it works just like we intended; the summary is created internally to EIGRP so that the components are still suppressed to the access routers, but the external route received across the EIGRP network wins the installation into the RIB and is advertised to the access layer routers

• The problem occurs if the distribution layer router loses the 0.0.0.0/0 from the EIGRP network for some reason—since it’s the receipt of this route that keeps the local summary from being installed in the RIB and being advertised to the access layer peers, it’s disappearance changes things



• Since RtrA no longer has the external route, it creates the local summary route and advertises it to RtrC

• Now RtrC receives an external route from RtrB and an internal route from A

• The route from RtrA wins! Now RtrC’sdefault route points to A, who doesn’t have access to the Internet and maybe not even the company’s intranet! D* 0.0.0.0/0 [90/409600] via 10.1.1.1, 00:00:10, Serial0/0

0.0.0.0


A B

C

• When RtrA loses the 0.0.0.0/0 it received from the EIGRP network, it installs it’s local summary route into the RIB with an admin distance of 200 and happily advertises it to the access layer peers; this is the interesting part—summary routes only appear as a special route type on the router that generates the summary—on peers that receive the summary, it appears like any other internal route!

• That means that RtrC will receive an external route from RtrB with an AD of 170 and an internal route from RtrA with and AD of 90—the route from RtrA wins! The problem is that now RtrC points at RtrA for all of it’s traffic, yet RtrA no longer has access to the Internet and maybe not even the internal company network… drats!

• This problem doesn’t occur on single-homed access layer routers; if there’s only one path out from the access router, it doesn’t really matter if it’s internal or external


C

B


• How do you resolve this problem?

• Define the Admin Distance on the summary as 255 instead of something lower

• The route will only be advertised if the external exists!

• Note: Don’t use 255 for single-homed remotes!

B

D*EX 0.0.0.0/0 [170/409600] via 10.1.2.2, 00:00:10, Serial1/0

0.0.0.0


A B

C

• For dual-homed sites, using the summary admin distance in a slightly different way can provide a decent solution; if the summary is defined with an AD of 255, it will lose to the external route and allow that route to be propagated through to the access routers as before—if the external route disappears, however, the AD of 255 keeps the local summary from being installed in the RIB and thus it won’t be advertised to the access routers; the only remaining route will be the external through RtrB

• Note:• If you put an AD of 255 on a single-homed remote and the external default route goes away, the

access router will not receive the route at all!

• You may need a floating static on the access router to avoid this side-effect


C

AB


• Another way to resolve the problem

• Don’t use summary admin distance if sending to dual-homed remotes

• Instead, use distribute-list to permit 0.0.0.0 to remotes with floating static on remotes

AB

router eigrp 1

distribute-list 1 out Serial0/0

….

access-list 1 permit 0.0.0.0

ip route 0.0.0.0 0.0.0.0 10.1.1.1 200

ip route 0.0.0.0 0.0.0.0 10.1.2.2 200

0.0.0.0

A B

C

• An alternative solution is to not use summarization at all on the distribution layer and instead use a distribute-list out to permit only the 0.0.0.0/0 route to the access layer; of course, that means that if the 0.0.0.0/0 route disappears, the access layer routers are stranded and can’t reach the internal, company network… not good

• To avoid this problem, the distribute-list out on the distribution layer should be used in tandem with floating static routes on the access-layer routers


C

A B

Summary Admin Distance• Another situation encountered when

adding the admin distance to a summary• Same summary can be defined on multiple

interfaces

• Only one topology table/routing table entry is actually created

• The last defined Admin distance wins!



Router#sh ip route 0.0.0.0

Routing entry for 0.0.0.0/0, supernet

Known via "eigrp 1", distance 240, metric 128256, candidate default path, type internal

Router#show run int e0/0

interface Ethernet0/0

ip address 10.1.1.1 255.255.255.0


Router#show run int e0/1

interface Ethernet0/1

ip address 10.1.2.1 255.255.255.0

ip summary-address eigrp 1 0.0.0.0.0 0.0.0.0 240

• One other “interesting” aspect that may be unexpected when using the admin distance option on the summary commands is that while a summary can be entered on many different interfaces, only one summary topology entry and one routing table entry is actually created

• This means that there is really only one distance associated with the summary, regardless of how many different distances you enter for the same summary on different interfaces

• If you enter the same summary on different interfaces, it’s really only adding interfaces to a single summary queue entry; if each of these summaries has a different admin distance, the last one entered will be the one used


A

B

10.1

.2.0

/24

10.1

.3.0

/24

E

D

X


10.1

.1.0

/24

C

Summary Black Holes

• This network implements manual summarization from the distribution routers toward the core

• These summaries represent all spoke networks and links to the spokes

• It normally doesn’t matter whether RtrA or RtrB is used to reach an address on a spoke from X

RtrX#sh ip route | sec 10.1.0.0

D 10.1.0.0/16 [90/307200] via 10.2.1.52, 00:02:01, Ethernet0/0

[90/307200] via 10.2.1.51, 00:02:01, Ethernet0/0

• In this example network, the designer implemented manual summarization to hide the specific routes located at the remote sites by summarizing from the distribution layer toward the core; on each of the interfaces of RtrA and RtrB toward router X is the summary statement ip summary-address eigrp 1 10.1.0.0 255.255.0.0—this blocks the specific prefixes from being advertised to X, and instead only advertises the 10.1.0.0/16 prefix there

• Normally, this works great; minimal info is known at the core and proper routing takes place just fine—transitions in the remotes are hidden form the core, which makes the core more stable—but what happens if a problem occurs?

Summary Black Holes

Summary Black Holes

• What happens if the RtrA to RtrC Link fails?

• RtrX is still receiving the summary from both RtrA and RtrB and may chose RtrA as its path for packets going to 10.1.1.0/24

• A builds a discard route to null0 with an administrative distance of five when the summary is configured

• The traffic will be dropped at RtrA!

RtrA#sh ip route 10.1.1.0

Routing entry for 10.1.0.0/16

Known via "eigrp 1", distance 5, * directly connected, via Null0

RtrX#ping 10.1.1.53

.....

Success rate is 0 percent (0/5)

RtrX#

10.1

.1.0

/24

A

B

10.1

.2.0

/24

10.1

.3.0

/24

E

D

X

C


• The problem is that a summary will be installed and sent if any component of the summary exists; if a router doing summarization loses access to one or more of the components of the summary, it will still advertise reachability to the entire summary, even though packets destined to the missing component(s) cannot be delivered

• This isn’t a problem if the summarizing router is the only path the lost network, but often it’s not; in the diagram above, both RtrA and RtrB have access to the remotes and are summarizing them toward the core of the network—if RtrA loses access to 10.1.1.0/24, routers downstream (like rtrX) could send packets to RtrA that it cannot deliver; if the packets went to RtrB, however, they would have been delivered successfully

• Note that this problem is common to all routing protocols which can do summarization. Information hiding is a good thing, but there are possible down-sides, as well.

Summary Black Holes

GRE Tunnel, No Summarization

New Link, No Summarization

Summary Black Holes

• Possible Solutions

• Add a new link between RtrA and RtrBwithout summarization configured

• Add a GRE tunnel between RtrA and RtrB without summarization configured

RtrA#sh ip ei to all | sec 10.1.1.0

P 10.1.1.0/24, 1 successors, FD is 307200, serno 19

via 10.1.10.53 (307200/281600), Ethernet0/1

via 10.1.20.52 (1587200/307200), Ethernet1/0

A

B

10.1

.2.0

/24

10.1

.3.0

/24

E

D

X


10.1

.1.0

/24C

• The normal method to avoid/resolve this problem is to have another link between summarizing routers (another fast/gigEthernet, PVC, etc.) and not summarize across this link; in that way, RtrA would be getting component routes from RtrB and would know how to deliver packets to 10.1.1.0 through RtrB

• Another approach used if the cost of another link is too high is to put a GRE tunnel between RtrA and RtrB and allow all component routes to be advertised across this tunnel

• There’s also some discussion inside of EIGRP development of ways to solve this problem dynamically. A sys-wish bug has been filed against it (CSCdw68502) but we haven’t started the work yet; we’re still discussing the best way to solve it

Summary Black Holes

• Redistribution metrics

• Static route to connected interface

• Multiple points of redistribution

Redistribution Problems

• Redistribution is used to advertise routes learned in another routing protocol (or another EIGRP AS) into the EIGRP network; routes that are redistributed into EIGRP are considered less trustworthy then native EIGRP routes because of the loss of specific topology information/metrics—because they’re less trustworthy, they’re given a worse admin distance so that routes that are learned internally within the AS are preferred

• Redistribution is a fact of life in many networks, with the foreign route sources coming from suppliers, other divisions, other companies when there are mergers, etc.; redistribution isn’t evil, but it needs to be controlled so that the EIGRP network remains stable

• Another use of redistribution is for MPLS/VPN over BGP using PE-CE support; this creates its own interesting troubleshooting challenges

Redistribution Basics

Redistribution Metrics

• One of the most common problems with redistribution into EIGRP is when a redistribution metric isn’t defined

• RtrA is redistributing 10.1.1.0/24 from RIP, but RtrB and RtrC do not have the route installed

• The first thing to check is whether RtrA has a redistribution metric configured via either

• Default-metric <metric>

• Redistribute rip metric <metric>

• EIGRP can’t directly turn a hop count or cost into an EIGRP metric, so it won’t redistribute routes unless it knows what metric to assign to them

10.1.1.0/24 via RIP

router eigrp 100redistribute rip

What Metric Should I Use?

RtrC#show ip route 10.1.1.0RtrC#

A

C

BRtrB#show ip route 10.1.1.0B#

• As demonstrated, EIGRP must have a redistribution metric to use for the routes being redistributed; the two forms of supplying this redistribution metric serve similar, but not identical purposes—make sure you use the one that matches your requirements

• If you want to set the metric to be used for routes from a particular redistribution source, use the metric keyword on the redistribution statement

• router eigrp 1• redistribute rip metric 10000 100 255 1 1500

• redistribute ospf 1 metric 1000 200 255 1 1500

• If you want all redistribution sources to have the same metric applied to the redistributed routes, you can use the default-metric command instead

• router eigrp 1• redistribute rip

• redistribute ospf 1

• default-metric 10000 100 255 1 1500


• EIGRP will automatically derive the redistribution metrics from:

• A connected interface for redistribute connected

• The interface through which a static route is reached for redistribute static (note: this isn’t always reliable; you’re better off specifying the redistribution metric!)

• The metric of an EIGRP route from another AS

• If none are those are true, you must supply the metric for redistribution

• On the redistribute statement explicitly

• Or a default metric statement

• Just Configure it explicitly! Configure with Intent.


Static Route to Connected Interface

• Another surprise you could hit isn’t really a “problem” but could be unexpected

• A static route with the next-hop pointing to a local interface may be automatically redistributed

• This happens only if the destination network is covered by a network statement

router eigrp 1network 10.0.0.0

!Ip route 10.2.2.0 255.255.255.0 null0

r31#show ip eigrp topology 10.2.2.0/24IP-EIGRP (AS 1): Topology entry for 10.2.2.0/24State is Passive, Query origin flag is 1, 1 Successor(s), FD is 256Routing Descriptor Blocks:0.0.0.0, from Rstatic, Send flag is 0x0

Composite metric is (256/0), Route is InternalVector metric:Minimum bandwidth is 10000000 KbitTotal delay is 0 microsecondsReliability is 0/255Load is 0/255Minimum MTU is 1500Hop count is 0

r32#show ip route10.0.0.0/24 is subnetted, 3 subnets

C 10.1.2.0 is directly connected, Ethernet0/0D 10.2.2.0 [90/281600] via 10.1.2.31, 00:19:20, Ethernet0/0D 10.1.1.0 [90/307200] via 10.1.2.31, 00:24:19, Ethernet0/0

r31

r32

• One situation that isn’t necessarily a problem but creates calls to the TAC and questions on our internal email aliases is the apparently bizarre behavior that EIGRP will automatically redistribute a static route even if redistribute static isn’t configured in some circumstances—how could this not be a bug?

• When a static route is configured and the next-hop is a local interface (including null0) it sets a bit on the route identifying that it is pseudo-connected which requires things like ARP to reach hosts within the subnet

• Since the connected bit is set on the route, EIGRP picks it up if the destination of the route is also covered by a network statement; this has always been the case and is operating as designed

• One really unusual aspect of the route that redistributed is that it shows up as a redistributed internal so peers will see the route as an internal even though it’s redistributed… confusing, huh?

Static Route to Connected Interface

EIG

RP

OS

PF

Multiple Points of Redistribution

• A route is injected into EIGRP as an external; this route is redistributed into OSPF by RtrB

• The route is transmitted through OSPF to RtrA , who redistributes it back into EIGRP

• Depending on the manually set metrics, RtrB may prefer this redistributed route, building a routing loop

• Depending on the timing, the loop can be persistent or transient. Either way, a bad thing!

A

Metric 10 Metric 2816000

10.1.1.0/24

Metric

2688000


B

• As mentioned before, redistribution isn’t evil in itself, but a network designer needs to be particularly careful if there are multiple points of redistribution between routing protocols; also as mentioned before, a redistribution metric is normally supplied manually at the redistribution point—this artificial setting of the metric hides where the redistributed routes actually exist in the network

• Because of the loss of specific topology information due to resetting the metrics, suboptimal routing is likely if there are multiple points of redistribution—how can you know which redistribution point is closest to the actual destination? You can’t

• Not only that, it’s possible to create routing loops if the redistribution metrics at some places is better than the original metric; in the above example, a route that originates in EIGRP as an external that is then redistributed into OSPF and back into EIGRP could have a better metric at the inbound OSPF redistribution point than at the original redistribution point… broken


• There are three primary methods used to prevent this routing loop:

• Redistributing live routing information in only one direction

• Filtering routes based on the prefixes advertised to prevent feedback loop

• Filtering routes using routing tags to prevent feedback

Redistribution Design

• The possible routing loop previously discussed occurs because of the mutual redistribution between protocols at multiple points; if the redistribution occurs in only one direction, the invalid improvement in metric cannot occur

• One way to change this from a mutual redistribution scenario to one-way redistribution is to provide the routes in one direction either through summarization or through a redistributed static

• One direction uses dynamic redistribution and other direction is static


EIG

RP

OS

PF


• Redistribute a static in one direction, and between protocols in the other direction

• A route is injected into EIGRP as an external; this route is then redistributed into OSPF by RtrB

• The route is transmitted to RtrA through OSPF; the route is not redistributed back into EIGRP, since redistribution between OSPF and EIGRP is not configured

Live Routing Information in Only One Direction

router ospf 100

redistribute eigrp 100 metric 10

router eigrp 100

redistribute static metric 10000 1000 255 1 1500

10

.1.0

.0/1

6

10

.2.0

.0/1

6

A

BMetric 10 Metric 2816000

Metric 25

ip route 10.2.0.0 255.255.0.0 serial 0/0

10.1.1.0/24

Metric 2560256

• Another way to eliminate the routing loop is by filtering routes that originate in EIGRP from being relearned back into EIGRP from the OSPF-EIGRP redistribution point; in other words, if the route originated in EIGRP, we have no need to accept the route back into EIGRP after it’s been redistributed into OSPF

• This filtering can be done with distribute-lists if the prefixes involved are easily identified blocks—if not, we have other techniques



• Configure access/prefix lists which match the address ranges used in each section of the network and filter based on these ACLs

• The route is injected into EIGRP as an external; this route is then redistributed into OSPF by RtrB

• The route is transmitted through OSPF and reaches RtrA

• The route is now blocked by distribute list 20, which breaks the routing loop

Filtering Based on Prefixes

10

.2.0

.0/1

6

router ospf 100

redistribute eigrp 100 metric 10

distribute-list 10 out

router eigrp 100

redistribute ospf 100 metric 1000 1 255 1 1500

distribute-list 20 out

EIG

RP

OS

PF

A

B

access-list 10 permit 10.1.0.0 0.0.255.255

access-list 20 permit 10.2.0.0 0.0.255.255


Metric 25

10.1.1.0/24

Metric 2560256

10

.1.0

.0/1

6

• If the prefixes generated in each routing protocol are not in well-defined blocks that are easily specified in an access-list, filtering can also be done using route-tags

• The tags can be applied as a route is redistributed from EIGRP into OSPF (as well as OSPF into EIGRP)

• Filters can be set to deny the routes from re-entering the domains in which they originated

• This is a far more flexible filtering method since it doesn’t require that the routes being filtered be in well-defined blocks; any route that has the tag set will be matched in the filtering process


10

.2.0

.0/1

6


• Set route tags when redistributing between the protocols; deny tagged routes at the redistribution point

• The route is injected into EIGRP as an external; it is redistributed into OSPF by RtrB and a tag is set

• The route is transmitted to RtrAthrough OSPF

• The route is blocked from being redistributed into EIGRPbecause of the route tag

Filtering Based on Tags

router ospf 100

redistribute eigrp 100 metric 10 route-map usetags

router eigrp 100

redistribute ospf 100 metric 1000 1 255 1 1500 route-map usetags

EIG

RP

OS

PF

A

B

route-map usetags deny 10

match tag 1000

route-map usetags permit 20

set tag 1000


10.1.1.0/24

Metric 2560256

Metric 25

10

.1.0

.0/1

6

• Zero successor routes

• Duplicate Router-ID

Additional Propagation Problems

Zero Successor Routes

• Zero successor routes happen when EIGRP attempts to install a route in the RIB and it is rejected

• This normally occurs when there is a route in the RIB with a better admin distance than EIGRP

• EIGRP cannot propagate zero successor routes to peers

A

C

B

10.10.10.10/32

RtrA#show ip eigrp topologyP 10.10.10.10/32, 1 successors, FD is 128256


RtrB#show ip eigrp topologyP 10.10.10.10/32, 0 successors, FD is Inaccessible

via 10.1.1.30 (409600/128256), Ethernet0/0RtrB#show ip route static10.0.0.0/8 is variably subnettedS 10.10.10.10/32 [1/0] via 10.1.2.2

RtrC#show ip eigrp topology | incl 10.10.10.10RtrC#

• A route that shows up in the topology table with 0 successors and an FD of inaccessible is unusable by EIGRP for some reason; typically that means that we received the prefix from a peer and when we tried to install it into the RIB, the RIB rejected the installation

• Normally, the RIB rejects a route installation when there is already a better route in the table—remember our discussion about admin distance earlier in this presentation? Here’s what happens when we’re the loser with a worse admin distance

• One of the side-effects of a route being flagged as 0 successor/inaccessible is that we aren’t permitted to advertise a route to peers that we didn’t succeed at installing in the RIB; if we couldn’t put the route in the RIB, we can’t verify that the destination will be reachable, thus we can’t tell our peers about it



• Another case of zero successor routes happens with overlapping EIGRP Autonomous Systems

• If a prefix is known in both AS with the same AD, only one AS can install it in the RIB

• The EIGRP AS that failed to install it will not propagate the route to its peers

A

C

B

10.10.10.10/32

RtrA#show ip eigrp 1 topologyP 10.10.10.10/32, 1 successors, FD is 128256

via Connected, Loopback0RtrA#show ip eigrp 2 topologyP 10.10.10.10/32, 1 successors, FD is 128256


RtrB#show ip eigrp 1 topologyP 10.10.10.10/32, 1 successors, FD is 409600

via 10.1.1.30 (409600/128256), Ethernet0/0RtrB#show ip eigrp 2 topologyP 10.10.10.10/32, 0 successors, FD is Inaccessible

via 10.1.1.30 (409600/128256), Ethernet0/0

D

RtrD#show ip eigrp 2 topology | incl 10.10.10.10RtrD#

RtrC#sh ip eigrp 1 topologyP 10.10.10.10/32, 1 successors, FD is 435200

via 10.1.2.2 (435200/409600), Ethernet0/0

• Another case of the zero successor route problem is when there are overlapping EIGRP Autonomous Systems. Sometimes a network designer will define two EIGRP Autonomous Systems with the same network statement, covering the same interfaces and expect both of them to propagate the associated routes; this is often done during AS transition time when they’re trying to combine Autonomous Systems

• The problem is that a prefix cannot be installed in the RIB by both Autonomous Systems at the same time, so one will be accepted and one will be rejected; the one that is rejected will not be sent to peers in the losing Autonomous System


Duplicate Router-ID

• A problem previously limited to redistributed routes happens when there are duplicate router-ids

• Please note that this problem is no longer limited to external routes!

• In this example, RtrB sees the route redistributed from RIP on RtrA just fine, but RtrC does not see it

10.1.1.0/24 via RIP

router eigrp 100redistribute ripdefault-metric ....

RtrC#show ip route 10.1.1.0RtrC#

RtrB#show ip route 10.1.1.0....10.1.1.0/24 via [RtrA]

A

C

B

192.168.1.1

Duplicate Router-ID

• Looking at RtrB’s topology table, we can see the originating router ID field in the external route is set to 192.168.1.1

• But, that’s RtrC’s loopback address!

A

C

B

RtrB#show ip eigrp topology 10.1.1.0IP-EIGRP (AS 1) topology entry for 10.10.1.0/24....External data:Originating router is 192.168.1.1

• In the example above, we have a problem where routes are being redistributed into the network, but one router elsewhere in the network is refusing to install the routes into its topology table or routing table

• As the slide shows, the problem is that the router-id of the redistributing router matches the router-id of the router refusing to install the route

• To block routing loops, a router doing redistribution will not accept a route from a neighbor if it is the one that originated it via redistribution; this is known by the originating router field in the external data section of the topology table entry

• Since the router-ids are the same on RtrA and RtrC, RtrC thinks RtrA’s external routes originated on RtrC and it rejects them

Duplicate Router-ID

• Impact on route installation

• EIGRP includes the router ID of the originating router in external routing information in older code, and both internal and external routes in newer code

• If a router receives an route with a router ID matching its own local router ID, it discards the route

• This prevents routing loops/SIA for routes originated locally but also learned form others

• You need to make sure your router-ids are unique by either not duplicating addresses on loopback interfaces or explicitly defining the router-id!

Duplicate Router-ID

• The EIGRP router ID is derived from

• The router-id command

• Highest IP address on a loopback interface

• Highest non-loopback interface IP address if no loopbacks

• NOTE: Interface used for router-id must reside in the same routing table as the EIGRP process creating the router-id

• Once the router ID is set, it won’t be changed without manual intervention, even if the interface from which it’s taken is removed from the router

Duplicate Router-ID

Environments containing mixed versions of IOS can create interesting problems if duplicate router-ids exist in your network

If a prefix is learned through a path that runs code that doesn’t support the internal router-id, the information gets stripped out

This could cause inconsistent results!

Duplicate Router-ID

• The EIGRP router ID was added to internal routes as part of release 5 EIGRP

• Code before release 5 exchanges older TLV types which do not contain the router-id for Internal routes

• If a router running release 5 or later sends updates to an older peer (pre-release 5), it sends the older TLV type without the router-id in the packet

• This can cause inconsistencies in your network

Duplicate Router-ID

Duplicate Router-ID

In the network shown here, RtrAreceives updates for 10.1.1.0/24 through two paths

Because the internal router-id is not supported by RtrC , RtrA would install the prefix learned through him

The prefix learned through RtrBwould be rejected due to the duplicate router-id

This could also mean that the prefix works sometimes and not others!

A

B C

D

Pre-Rel 5

Rel 5

Rel 5

Rel 5 192.168.1.1

192.168.1.1

10.1.1.0/24

• In the preceding slide, a prefix could be learned via two different paths. If the prefix is learned from one peer, it doesn’t contain the router-id and is accepted and installed. If the prefix is learned from the other peer, the router-id is included in the update and the prefix is rejected due to the duplicate router-id.

• This example shows a straightforward result, with one path not allowed stopping the equal cost load-balancing that should be going on based on the topology.

• In some cases, it may happen that at times a prefix could be installed if certain links were up (or down) but not if other links are up (or down.) This inconsistency could be extremely difficult to troubleshoot due to the apparent random nature of the symptoms.

• Just remember, router-ids need to be unique!

Duplicate Router-ID

Duplicate Router-ID

• In older versions of Cisco IOS® software, the only way to find out a router’s router ID was to go to an adjacent router and look for some redistributed route from that router

router# show ip eigrp topology 10.1.1.0 255.255.255.0IP-EIGRP (AS 7): topology entry for 10.1.1.0/24State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2560000256Routing Descriptor Blocks:10.1.2.1 (Ethernet0/0), via 10.1.2.1, Send flag is 0x0

Composite metric is (2560000256/0), Route is External....External data:

Originating router is 192.168.1.1AS number of route is 0External protocol is RIP, external metric is 1Administrator tag is 0 (0x00000000)

• In current versions of Cisco IOS software, the router ID is listed in the output of show ip eigrp topology

• router-1# show ip eigrp topology

• IP-EIGRP Topology Table for AS(7)/ID(192.168.1.1)

• ....

• If your event log is large enough, or things are happening slowly enough, you might also see the problem indicated in your event log

• 1 02:30:18.591 Ignored route, metric: 192.168.1.0 2297856

• 2 02:30:18.591 Ignored route, neighbor info: 10.1.1.0/24 Serial0/3

• 3 02:30:18.591 Ignored route, dup router: 192.168.1.1

Duplicate Router-ID

Duplicate Router-ID

• To determine the EIGRP release to see if it supports the internal router-id, use the command “show eigrp plugin”

• If the command is rejected, you’re definitely running older code

• If the command is accepted, look at the version of EIGRP in the output to see if it’s before or after release 5

RtrA#show eigrp plugin



: 2.02.34 : Source Component Release(Portable EIGRP Release(rel5_1))



…Snip …

• To determine the release of EIGRP you’re running on a router, use the command “show eigrp plugin” or “show eigrp plugin detail”. These commands are explained a little later in this presentation but I thought you could use the info here, as well.

• Older code (Release3 and earlier, IIRC) did not support the “show eigrp plugin” command and will reject it as invalid.

• Newer code will show you the version of EIGRP you’re running in addition to which subsystems/features are loaded into EIGRP.

Duplicate Router-ID


• Problems with Links > 1G

• IPv6 Unique Issues

Advanced Issues

• Shared Media

• Scalability

Resource Depletion

• EIGRP by default will use up to 50% of the link bandwidth for EIGRP packets

• This parameter is manually configurable by using the command:

• ip bandwidth-percent eigrp <AS-number> <nnn>

EIGRP Resource Depletion

• Prior to CSCdi36031 (roughly 10.3), EIGRP had huge problems with bandwidth depletion—EIGRP would use all of the link at the expense of layer 2 keep alives!

• With CSCdi36031, there was a significant change in the way we build and transmit packets at the time that still effects low-speed links; occasionally the TAC still gets cases or questions about the behavior so it’s worth explaining here so you can design accordingly

• The biggest part of the change was to implement packet pacing based on the defined bandwidth of the interface. This packet pacing puts enough gaps between the packets to ensure that we don’t overwhelm the interface with EIGRP packets causing the layer two keepalives to be missed and the interface to drop

• There are some circumstances where the bandwidth on the interface isn’t a good measure of what pacing should be, however


Bandwidth over Multipoint Interfaces

• EIGRP over multipoint interfaces such as DMVPN and mGRE has to share the available bandwidth among peers

• EIGRP uses the bandwidth on the main interface divided by the number of neighbors on that interface to get the bandwidth available per neighbor

• Hint: IWAN Topology

• Some interface types appear to EIGRP to be a shared interface but in reality they’re provided by a point-to-point mechanism (like DMVPN—Dynamic Multipoint Virtual Private Networks—which uses MGRE for transport) and the ability of the underlying network may not match up with the bandwidth defined on the interface; for example, if an mGRE outbound interface is Gigabit Ethernet but the tunnels traverse an ISPs network, we can’t actually send at Gigabit rates and expect all of the packets to be delivered at that rate

• EIGRP divides the defined bandwidth by the number of peers seen, giving each approximately equal shares of the available bandwidth; this may not be exactly right, but it’s the best we can guess


• Set the bandwidth on the multipoint interface to a value that most closely defines the actual circuit speed and ability to deliver traffic to the peers.

• Don’t artificially scale the bandwidth configuration down or up for traffic control!

• Use DMVPN Per-Tunnel QoS feature to control bandwidth along with the eigrp “ip bandwidth-percent” command.


• For DMVPN, other resources can also be depleted (and often are)

• Several processes are involved, each of which has overhead and can run into resource depletion• nhrp, ipsec, etc.

• Make sure interface queues and buffers are tuned to minimize/eliminate drops

• Monitor EIGRP process queues for drops in large scale scenarios with instability • No way to adjust these queues, amount of information or pacing needs to be

adjusted via good summarization (with summary metrics!) or filtering


Show Commands

RtrB#show ip eigrp traffic

EIGRP-IPv4 VR(IWAN) Address-Family Traffic Statistics for AS(1)









PDM Process ID: 627

Socket Queue: 0/10000/2/0 (current/max/highest/drops)



Show Commands

• Show ip eigrp traffic can be very useful to see what kind of activity has been occurring on your network; Some of the most interesting information includes:

• Input queue high water mark—this shows how many packets have been queued inside of the router to be processed—when packets are received from the IP layer, EIGRP accepts the packets and queues them up for processing; if the router is so busy that the queue isn’t getting serviced, the queue could build up—unless there are drops, there is nothing to worry about, but it can give you an indication of how hard EIGRP is working

• SIA-queries sent/received—this is useful to determine how often the router has stayed active for at least one and one-half minutes (as mentioned in the earlier section on stuck-in-active routes; this number should be relatively low—if it’s not, it’s taking a bit of time for replies to be received for queries, and it might be worth exploring why


• DMVPN/mGRE is a very popular technology which presents some interesting challenges in troubleshooting and avoiding problems; from an EIGRP perspective, the mGRE Tunnel interface is multicast and our sending process (and pacing) is based on the assumption that we send a multicast packet out to all of the peers on the Tunnel interface—in reality, the multicast packet is replicated by the NHRP code (Next Hop Resolution Protocol), which then delivers the replicated packets to (normally) ipsec for encryption before delivering on the Tunnel

• Each step along the way has queues and buffers and other resources required to do the job of packet delivery—each of these resources are potential places of resource depletion; the following slides give you a few commands we’ve found useful when troubleshooting large scale DMVPN environments


• Many commands are useful to see how DMVPN/mGRE are behaving resource-wiseShow ip nhrp summary

Show ip nhrp multicast

Show ip nhrp traffic

Show buffers | include failures

Show interface tunnel 1 | include nput

Show interface Gig 0/1 | include nput (for outbound interface used by tunnel)

Show buffer input-interface Gig 0/1 header

Show ip eigrp topo summary

Show ip eigrp traffic

Show ip eigrp interface detail tunnel 1


hub2#show ip nhrp summary

IP NHRP cache 800 entries, 288000 bytes

0 static 800 dynamic 0 incomplete

hub2#show ip nhrp

106.1.0.2/32 via 106.1.0.2

Tunnel2 created 1w6d, expire 00:14:37

Type: dynamic, Flags: unique registered

NBMA address: 4.1.0.2

106.1.0.6/32 via 106.1.0.6

Tunnel2 created 1w6d, expire 00:14:37

Type: dynamic, Flags: unique registered

NBMA address: 4.1.0.6


hub2#show ip nhrp multicast

I/F NBMA address

Tunnel2 2.2.2.2 Flags: static

Tunnel2 4.9.0.142 Flags: dynamic





hub2#show ip nhrp traffic

Tunnel2: Max-send limit:65535Pkts/10Sec, Usage:0%

Sent: Total 3014400

0 Resolution Request 0 Resolution Reply 0 Registration Request

3014400 Registration Reply 0 Purge Request 0 Purge Reply

0 Error Indication 0 Traffic Indication

Rcvd: Total 3014400

0 Resolution Request 0 Resolution Reply 3014400 Registration Request

0 Registration Reply 0 Purge Request 0 Purge Reply

0 Error Indication 0 Traffic Indication


hub2#show buffers | include failures

0 failures (0 no memory)





hub2#show interface tunnel 2 | include nput

Last input 00:00:00, output 00:00:01, output hang never

Input queue: 0/4096/0/0 (size/max/drops/flushes); Total output drops: 0

30 second input rate 146000 bits/sec, 189 packets/sec

198829081 packets input, 1620463316 bytes, 0 no buffer

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort


hub2#show interface gig 0/3 | include nput

output flow-control is XON, input flow-control is unsupported

Last input 00:00:00, output 00:00:01, output hang never

Input queue: 1/4096/0/0 (size/max/drops/flushes); Total output drops: 0

5 minute input rate 150000 bits/sec, 173 packets/sec

199154019 packets input, 128670948 bytes, 0 no buffer

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

0 watchdog, 664336 multicast, 0 pause input


hub2#show buffer input-interface gig 0/3 header

Buffer information for Middle buffer at 0x7A59A7C

data_area 0x7890CFE4, refcount 1, next 0x0, flags 0x280

linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1

if_input 0x5A9DA04 (GigabitEthernet0/3), if_output 0x0 (None)

inputtime 1w6d (elapsed 00:00:00.004)

outputtime 00:00:00.000 (elapsed never), oqnumber 65535

datagramstart 0x7890D02A, datagramsize 108, maximum size 756

mac_start 0x7890D02A, addr_start 0x7890D02A, info_start 0x0

network_start 0x7890D038, transport_start 0x7890D04C, caller_pc 0x22DCC58

source: 4.19.0.82, destination: 2.2.2.2, id: 0xCA33, ttl: 252, prot: 47


EIGRP Resource Depletionhub2#show ip eigrp topology summary

EIGRP-IPv4 Topology Table Summary for

AS(1)/ID(3.21.66.1)

Head serial 1, next serial 8011

1679 routes, 0 pending replies, 0 dummies

Enabled on 877 interfaces, 800 neighbors present

on 1 interfaces

Quiescent interfaces:

Tu2

hub2#show ip eigrp traffic

EIGRP-IPv4 Traffic Statistics for AS(1)









PDM Process ID: 259

Socket Queue: 0/2000/864/0

(current/max/highest/drops)

Input Queue: 0/2000/864/0

(current/max/highest/drops)

DMVPN with Multiple Next Hops

• In DMVPN phase 2 setup, a hub router can learn two or more equal-cost paths to a site.

• However, the hub router will only advertise one of the paths to other spokes in the DMVPN network.

Implication:

• Spoke to spoke tunnels will only be established to a single router and cannot leverage this multi-routers setup

• Phase 3 of DMVPN mitigates issue 10.1.5.0/24

.5 .6

10.1.5.0 [90/18600] via 172.16.1.5, Tunnel1

via 172.16.1.6, Tunnel1

10.1.5.0 [90/32600] via 172.16.1.5, Tunnel1

• While this isn't a route propagation problem, per se, it's still a situation that may take you by surprise and therefore may be useful to understand

• One of the designs being implemented with DMVPN uses multiple paths from the hub to reach spoke subnets. This could be two paths to the same spoke or through two spokes (as shown on the previous slide)

• The problem is that EIGRP still uses normal distance vector rules and sends updates based on the top topology table entry. Even if there are two equal cost paths, EIGRP sends updates based on the top entry. Since historically distance vector protocols only report how far they are metric-wise from a destination, this has been enough. Now it’s not quite enough information

DMVPN with Multiple Next Hops (Multiple IWAN Hubs)


• One way to avoid this situation is to use a different hub for each preferred spoke path

• Each hub could use either a metric type command (offset-list) or distance (if all internals) to prefer one path or the other in order to propagate the desired next-hop information to the other spokes

10.1.5.0 [90/18600] via 172.16.1.5

10.1.5.0 [90/32600] via 172.16.1.5

via 172.16.1.6

10.1.5.0 [90/18600] via 172.16.1.6

10.1.5.0/24

.5 .6

• There is also new functionality (add-path) allowing a route to be advertised with multiple next hops, addressing this very situation: Multiple DMVPN hubs advertising the same prefix. If your code version supports this functionality, it is the ideal solution and the most elegant.

• Without the add-path functionality though it's not that hard to design your network so that the two paths to the prefix at the spoke is learned through two different hubs rather than through one

• You could then use metric (offset-list possibly) or the distance command to have each hub prefer a different path to the spoke prefix. It could then advertise the next-hop value associated with it's choice, avoiding the problem.


• This might also have some relevance to EIGRP OTP, information later in additional information slides.

• IWAN Architecture addresses this in several different ways:

• DMVPN Phase 3 – no longer requiring the routes to be advertised for spoke-to-spoke tunnel establishment

• PFRv3’s ability to look at the additional routing information for path selection and load sharing

• Ability to carry additional next hops and advertise shared prefixes from the hub site/dual pop’s.


• Scaled Bandwidth Problem

• Delay Granularity Problem

• Path Selection Problem

• The Solution – Wide Metrics

• Remaining Issues

Problems with Links > 1G


• EIGRP was created when FastEthernet was considered fast

• In order to simplify metric calculation and comparison, bandwidth was scaled

• Scaled Bandwidth is 10^7/BW (with BW in kbps)

• This worked fine until the BW was 10^7 or greater

• 10G is 10^7 in kbps, so the scaled value becomes 10^7/10^7 or a value of 1

• If the interface bandwidth is > 10G, the value becomes 0!

• 10^7/11^7 is < 1 therefore value becomes 0 (Integer math)

• While the code is smart enough to discount a BW value of 0 for the calculation, it can still cause invalid routing decisions

Scaled Bandwidth Problem


• EIGRP was created in the early 1990s and was an enhancement to IGRP, which dated back to the 1980s

• In order to scale the metrics larger for EIGRP, the IGRP was increased from 24 to 32 bits in size. This wasn’t enough.

• Similar to IGRP, we scaled the Bandwidth value in order to ease transport and comparisons. By making the stored bandwidth value 10^7/BW, larger bandwidth values creating a smaller number, making comparisons easier. (Bigger bandwidth = smaller value/better metric)

• Of course, when you go beyond 10^7 as a bandwidth value, the formula breaks down. 10G is 10^7 (since it’s measured in kbps). Links above 10G weren’t envisioned in the early 1990s and that lack of foresight cause us to have to deal with it now.

Scaled Bandwidth Problem

Problems with Links > 1GScaled Bandwidth Problem

C6K-MCORE-1#show interface port-channel 12

Port-channel12 is up, line protocol is up (connected)

…

MTU 9216 bytes, BW 20000000 Kbit, DLY 10 usec,

…

C6K-MCORE-1#show ip eigrp topology 10.1.1.0/30

EIGRP-IPv4 Topology Entry for AS(1)/ID(1.0.0.23) for 10.1.1.0/30


Descriptor Blocks:

0.0.0.0 (Port-channel12), from Connected, Send flag is 0x0

Composite metric is (256/0), route is Internal

Vector metric:

Minimum bandwidth is 0 Kbit

Total delay is 10 microseconds

…


• Interface delay values are defined by the core code

• EIGRP just pulls in the value provided by the infrastructure

• Just like EIGRP with scaled BW, the core code didn’t foresee such high BW links

• 1 Gigabit – Delay is 10 microseconds



• Etc.

• Since EIGRP uses aggregate delay to make path selection decision, this is trouble

• A 1G, 10G, and 20G link all look the same value and the 20G isn’t preferred

Delay Granularity Problem


• While this problem wasn’t really EIGRP’s fault, the cause is similar to the Bandwidth problem described above.

• The Interface Delay values were determined back in the 1980s and the range of Delay values in microseconds became too small too quickly.

• Since a 1G link had a delay value of 10 microseconds and the value is specified in 10s of microseconds, you can’t get any smaller delay values on the interface.

• Since EIGRP uses this Delay value in the metric calculations, we were unable to tell a 1G link from a 2G link from a 10G link, etc.

• Not our fault, but our problem to fix!

Delay Granularity Problem


• Due to the problem with Scaled BW and Delay granularity, path selection is negatively impacted

• Links could be seen as equal cost when they’re not

• Higher BW links are not preferred over lower BW links

• While this will not create routing loops, sub-optimal routing could easily occur

Path Selection


• The lack of Delay granularity and the inability of EIGRP to store and compare BW values > 10G combined to cause EIGRP to make invalid or suboptimal routing decisions when links above 1G are used.

• Routing loops will not occur since all routers make the same (wrong) decision, but you can’t use the higher speed links in your network in the way you want. There’s a reason you put in a 10G link (or higher) and we should pay attention to that!

• Unfortunately, the standard formula, interface delay values, and transport wouldn’t allow us to fix the problem using the old metric components.

Path Selection

Problems with Links > 1GPath Selection

A

B C

D

10.1.1.0/24

10G

10G 1G

1G

RtrD sees two equal cost paths to reach 10.1.1.0/24

One path definitely better, but not preferred

– Underutilize 10G path?

– Overrun 1G path?


• In the preceding diagram, EIGRP on RtrD will be unable to distinguish the difference in metric for the path to 10.1.1.0/24 through RtrB and RtrD.

• Since the 10G link will have the same scaled bandwidth value as the 1G link, and both will have a delay value of 10 msecs, both paths are seen as being the same.

• This is either inefficient or terrible. EIGRP on RtrD will attempt to install equal cost paths through RtrB and RtrC, using both equally in the forwarding plane.

• Since the path through RtrC cannot handle nearly as much traffic as the path through RtrD, it’s likely either the path through RtrC will be overwhelmed or the path through RtrB will be underutilized.

Path Selection

Problems with Links > 1GPath Selection

A

B

C

D

10.1.1.0/24

10G

10G

1G

1G

An even worse example:

RtrD sees one best path to reach 10.1.1.0/24 through RtrC

The path taken is definitely worse than the other, but EIGRP is not aware and thus makes the wrong routing decision

10G

E


• In the past, the only solution to this problem was to manually set the delay values so that you could impact the path selection to make sense in your network

• This wasn’t a good answer, mainly because it was administratively burdensome and hard to design/implement

• We also saw cases where customers built in routing loops by changing the delay values differently on different ends of the links!

• OSPF solved this a few years ago by defining a reference bandwidth that must be set the same throughout an area

• EIGRP doesn’t have areas and we couldn’t make customers upgrade all routers in the network in order to support a different metric method

• How can we solve this elegantly?

The Old Solution


• EIGRP has implemented a feature called “Wide Metric” support, which no longer scales the BW and dynamically creates the delay in picoseconds

• This feature is backward compatible (though mixed environments can have suboptimal routing)

• Implemented in EIGRP Release 8.0 and later (Release 18.0 is current!)

• NOTE: only available when configured in Named Mode!

• On by default when configured in Named Mode

http://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/enhanced-interior-gateway-routing-protocol-eigrp/whitepaper_C11-720525.html

The Solution


Router# show eigrp address-family ipv4 topology

EIGRP-IPv4 VR(WideMetric) Topology Entry for AS(4453)/ID(3.3.3.3) for 100.1.0.0/16

State is Passive, Query origin flag is 1, 1 Successor(s), FD is 262144, RIB is 2048

Descriptor Blocks:

2.0.0.2 (Ethernet0/2), from 2.0.0.2, Send flag is 0x0


Vector metric:

Minimum bandwidth is 20000000 Kbit

Total delay is 3000000 picoseconds

Reliability is 255/255

Load is 1/255

Minimum MTU is 1500

Hop count is 2

Originating router is 100.1.1.1

The Solution


• Possible Routing Loop with IOS and IOS/XR PEs

• Granularity Problem for Offset-lists between Classic and Wide Metrics

Remaining Issues

Problems with Links > 1GPossible Routing Loop with IOS and IOS/XR PEs

• PE - Provider Edge router

• CE - Customer Edge router

• VPN - Virtual Private Network

• VRF - Virtual Routing and Forwarding instance (routing table)

• Backdoor link - link between sites which doesn’t use the VPN

Service Provider

EIGRP Site 1

EIGRP Site 2

PE

VPN

PE

CE

CEBackdoor link


• In a MPLS/VPN PE/CE environment with both IOS and IOS/XR routers running wide metrics, IOS/XR evaluates the cost community attribute of the prefixes differently than IOS

• This can cause the IOS and IOS/XR PE to make different decisions about the exit point for a prefix, potentially leading to a routing loop

• The workaround is to use 32 bit metrics on the IOS/XR router

• Test metric-type 32-bit before CSCul96943

• Metric-type 32-bit after CSCul96943

Possible Routing Loop with IOS and IOS/XR PEs


• One PE is running an IOS/XR version with Wide Metric support and another is running IOS with Wide Metric support

• There is also a backdoor link between the two EIGRP sites

• PE1 could prefer PE2 and PE2 could prefer PE1

• This can cause a routing loop!

• Define metric-type 32-bit on IOS/XR!

Possible Routing Loop with IOS and IOS/XR PEs

Service Provider

EIGRP Site 1

EIGRP Site 2

PE1 PE2

CE

CE

IOSIOS/XR

Backdoor link


• Wide metrics and Classic metrics use different scales for delay value

• Classic – 10s of microseconds

• Wide – picoseconds

• Offset-lists use changes in delay to “offset” the metric

• If the change is made on a wide-metric router and then sent to a classic router, granularity is lost and a different value than expected may be received

• Workaround – verify changes to metric after implementing offset-list and adjust as needed!

• Note that this could impact match statements if you’re looking for a specific value!

Granularity Problem for Offset-lists between Classic and Wide Metrics

Problems with Links > 1GGranularity Problem for Offset-lists between Classic and Wide Metrics

R1#show run | sec router eigrp

router eigrp 100

network 10.0.0.0

offset-list 10 out 1000

R1#show run | sec access-list

access-list 10 permit any

R1

R2

Release 12

Release 6

10.1.23.1/24

R2#show ip eigrp topology 10.1.23.0/24

EIGRP-IPv4 Topology Entry for AS(100)/ID(10.1.21.1) for

10.1.23.0/24

State is Passive, Query origin flag is 1, 1 Successor(s),

FD is 307200

Descriptor Blocks:



…

Before:

R2#show ip eigrp topology 10.1.23.0/24

EIGRP-IPv4 Topology Entry for AS(100)/ID(10.1.21.1) for

10.1.23.0/24

State is Passive, Query origin flag is 1, 1 Successor(s),

FD is 307200

Descriptor Blocks:



…

After:

• IPv6 Router-ID

• IPv6 Interfaces

• IPv6 Peer Addresses

• IPv6 Shutdown

IPv6 Unique Issues

• EIGRP IPv6 topology table entries include the router-id of the originating router, just like IPv4

• In previous versions, external routes only

• Now true for Internal routes, as well

• The router-id used by IPv6 is a four-byte number

• Actually uses IPv4 address, just like IPv4!

• Why use an IPv4 address for EIGRP IPv6 router-id?

• Seemed overkill to use 128-bit number for router-id

• Routers may not have a global IPv6 address defined

IPv6 Router-id

• EIGRP IPv6 topology table entries have router-id fields, just like their IPv4 equivalents; this router-id is used to identify the place in the network that the prefix was originated to help eliminate routing/redistribution loops

• When designing the EIGRP IPv6 implementation, we discussed whether to use one of the IPv6 addresses for the router-id or whether to use an IPv4 address as we did for IPv4 EIGRP

• Why use an IPv4 address for the router-id?

• First, the router-id is only used as a label to identify where a prefix originated—an IPv4 address is 32 bits and an IPv6 address is 128 bits; to sacrifice 128 bits for the router-id for every prefix would significantly decrease the number of prefixes we could fit into an Update packet, all for a field that is effectively a label

• Additionally, IPv6 has the interesting characteristic that a router isn’t required to have any globally reachable addresses; since a router could contain only link-local addresses, the usefulness of an IPv6 address as a router-id could be minimal—it may or may not give you any indication of where the originating router exists in the network

IPv6 Router-id

• EIGRP IPv6 will not work without a router-id

• EIGRP uses the same router-id selection process used by IPV4• Highest IPv4 address on a Loopback interface

• If no Loopbacks, highest IPv4 address on non-loopback interface

• If no IPv4 address is available to use, manually set the router-id under the “ipv6 router eigrp x” configuration• “eigrp router-id 1.1.1.1”

• Note that in some older versions, the leading “eigrp” in the command above wasn’t required.

• Be intentional with your configuration and define your router-id’s!

IPv6 Router-id

• IPv6 uses exactly the same router-id selection criteria as IPv4

• If there are one or more loopback interfaces, use the highest IP address on a loopback interface

• If no loopback interfaces, use the highest IP address on whatever interfaces you have

• Interesting limitations exist, however

• The interfaces must exist in the same table (IPv4 version) as the IPv6 instance; in other words, if the interfaces belong to a VRF (Virtual Routing and Forwarding table), then they aren’t eligible to be used as router-ids for IPv6

• Note that once a router-id is selected, it isn’t changed to reflect changes in addresses; in other words, if a “no ip address” is issued for the address used as a router-id, it won’t change the router-id used until the next reload

• If there aren’t any interfaces available with IPv4 addresses, manually specify the address using the “eigrp router-id x.x.x.x” command• Some versions don’t use the leading “eigrp”—we haven’t been terribly consistent with this, but we will be from now

on… really

IPv6 Router-id

• EIGRP does not use a network statement to specify its IPv6 interfaces

• Interfaces may not have a routable address (may have only link-local)

• Need some other way to identify which should run EIGRP

• Two different methods exist, depending on the configuration method

• Classic mode uses the “ipv6 eigrp <AS>” command on each interface

• Named mode defaults to having all interfaces enabled• Will need to “shutdown” under af-interface in order to not include some interfaces

IPv6 Interfaces

IPv6 Interfaces

Classic ModeR2#conf t

Enter configuration commands, one per line. End

with CNTL/Z.

R2(config)#ipv6 router eigrp 1

R2(config-rtr)#interface s4/0

R2(config-if)#ipv6 eigrp 1

R2#sh run interface s4/0

Interface Serial4/0

ipv6 address 1:2::2/64

ipv6 eigrp 1

end

R2#sh run | section ipv6 router

ipv6 router eigrp 1

R2#sh ipv6 eigrp topology

P 1:2::/64, 1 successors, FD is 2169856

via Connected, Serial4/0

Named modeR1# conf t

Enter configuration commands, one per line. End

with CNTL/Z.

R1(config)#router eigrp foo

R1(config-router)#address-family ipv6 unicast auto

1

R1#sh run | section router eigrp

router eigrp foo

!

address-family ipv6 unicast autonomous-system 1

!

topology base

exit-af-topology

exit-address-family

R1#sh eigrp address-family ipv6 topology

P 1:1::/64, 1 successors, FD is 1735175958

via Connected, Serial4/0

• EIGRP sends hellos sourced from the link-local interface address

• Note that IPv6 interfaces are not required to have globally routable addresses

• Many routers will NOT have a global address on an interface that is facing other routers!

• A couple of interesting differences due to this use of link-local addresses

• Address in neighbor tables are only useful if you know which routers are reachable on each interface already

• Since a router can assign the same link-local address to multiple interfaces, you may see multiple peers with the same address if you peer across multiple links!

IPv6 Peer Addresses

• Output of “show ipv6 eigrp neighbors” with neighbor seen through multiple interfaces:

IPv6 Peer Addresses

EIGRP-IPv6 Neighbors for AS(1)


(sec) (ms) Cnt Num

3 Link-local address: Gi1/12 14 00:00:05 8 200 0 20

FE80::216:9CFF:FE6E:2C40


FE80::216:9CFF:FE6E:2C40


FE80::216:9CFF:FE6E:2C40


FE80::216:9CFF:FE6E:2C40

• EIGRP IPv6 sends hellos (and actually all EIGRP packets) with the source address of the link-local address on the interface

• Note for those not familiar: IPv6 has two primary address classes• Link-local - not routable and significant only on the link (broadcast domain) on which they exist

• Global - advertised and used for remote reachability

• Since a peer relationship is limited to the link the peers see each other on, routers use the link-local address for communication

• An interesting aspect of this is that some platforms will assign the same link-local address to multiple interfaces

• This works just great since the address is only meaningful on the local link

• This does cause the above behavior, however—note that all four peers seen on this router have exactly the same address! Not a problem, but probably unexpected

IPv6 Peer Addresses

• When EIGRP for IPV6 was originally coded, the router process was defined to be Shutdown by default

• This was done because of the lack of network statement and the fact that interface commands could start EIGRP before filtering was defined

• This default behavior has confused users and testers so we’ve changed it in the latest code

• Just be warned that the default behavior for shutdown in EIGRP IPv6 is different in different versions!

IPv6 Shutdown

• Many, many years ago (in an IOS far, far away) we made the decision to have EIGRP IPv6 start off shutdown by default; since there are no network statements in EIGRP IPv6 , as soon as an interface was defined to use EIGRP IPv6 , processes would start, peers would form, and routes would be exchanged

• Since all of this could happen before filtering was put into place, we decided it was safer to require the user to explicitly do “no shut” under the ipv6 router statement to kick off the processing

• We’ve since come to regret this decision since it’s so drastically different than IPv4 EIGRP behavior—in recent code, we’ve changed the default to not require an explicit shutdown; be warned of our inconsistency

IPv6 Shutdown

• EIGRP IPv6 does not support VRFs in “classic” configuration mode

• There is a new mode of defining EIGRP you’ll be hearing a lot more about in the future that is being use for all advanced features

• Named mode doesn’t include the AS number on the router line and uses address-family commands to create EIGRP processes

• Also with Named mode configuration, all EIGRP interface commands are performed under the router command rather than on the interfaces themselves

• This change was made because of the way processes started in the old configuration method and we required more flexibility in when information was provided

IPv6 and VRF Support


Router#sh run | sec router eigrp

router eigrp foo

!


!

topology base

exit-af-topology

exit-address-family


!

topology base

exit-af-topology

network 10.0.0.0

exit-address-family

• This slide is a preview of coming attractions in addition to an explanation of how to define EIGRP IPv6 on a VRF. For the last several years, advanced features have been coded to use “named mode” configuration rather than classic

• We ran into limitations trying to put many features in classic mode because of the assumptions made when the router command was issued. We decided then to start a new mode for advanced feature but not take away “classic” EIGRP configuration for the more “normal” or simple implementations

• Making things more complicated when you wanted to do the same things you’ve been doing for years made no sense to us. Using a new mode for more complicated features seemed more reasonable

• We hope you agree!


• EIGRP Over-the-Top

• NX-OS

EIGRP ETC

EIGRP OTP

EIGRP Over-the-Top Peering

• Control Plane peering is accomplished with EIGRP “neighbor” statement

• Full Mesh, Partial Mesh, EIGRP Route-Reflectors

• Data Plane packet delivery is accomplished with LISP encapsulation

• No longer dependent upon the Service Provider – Natively maintain the networks EIGRP internal routing

Extend the EIGRP Network over MPLS or Internet Natively

DATALISP

Service Provider

MPLS VPNEIGRP

AS 4453

EIGRPAS 4453

Hello Hello

router eigrp ROCKS

address-family ipv4 unicast auto 4453

neighbor 192.168.2.2 Serial1/0 remote 100 lisp-encap

...

router eigrp ROCKS

address-family ipv4 unicast auto 4453

neighbor 192.168.1.1 Serial1/0 remote 100 lisp-encap

...

DATADATA CE-1 CE-2

WAN Virtualization using OTP

• EIGRP “end-to-end” solution with:

NO special requirement on Service Provider

NO special requirement on Enterprise

NO routing protocol on CE/PE link

NO need for route redistribution

NO no need for default or static routes

EIGRP Support for WAN Transparency

253

PE / CE

BGPComplexity

Carrier Involvement

Multiple Redistribution

Public & Unsecure

EIGRP OTP

EIGRP Simplicity

Carrier Independence

Zero Redistribution

Private & Secure

Availability- ASR 1000 Series– XE 3.10/IOS 15.3(3)S

- ISR, ISR G2, 7200 Series – IOS 15.4(3)T


• No additional routing protocol to administer

• No routing protocol is needed on CE to PE link

• All user traffic appears and unicast IP data packets

• Limit impact on Service Providers Network

• Customer routes are NOT carried in MPLS VPN backbone

• Customer route flaps do not generate BGP convergence events

• Smaller BGP routing tables, smaller memory foot print, lower CPU usage

• Works with existing PE equipment

• Multivendor PE support

• No upgrade requirements for PE or any MPLS VPN backbone router

Service Provider Benefits


• Single routing protocol solution

• Simple configuration and deployment for both IPv4 and IPv6

• Convergence is not depending on Service Provider

• Only the CE needs to be upgraded

• Routes are carried over the Service Provider’s network, not though it

• No artificial limitation on number of routes being exchanged between sites

• Convergence speed not impacted by BGP timers

• Works with both traditional managed and non-managed internet connections

• Compliments an L3 Any-to-Any architecture (optional hair pinning of traffic)

• Support for multiple MPLS VPN connections

• Support for connections not part of the MPLS VPN (“backdoor” links)

Enterprise Benefits

OTP – How it Works

• CE Routers have ‘private’ and ‘public’ interfaces

• Private interfaces use addresses that are part of the Enterprise network

• Public interfaces use addresses that are part of the Service Providers network

• For OTP neighbors to form, the Public interface must also be included in the EIGRP topology database (covered by the “network” command in IPv4)

• Packets are sourced from/to the public interface address eliminates the need for static routes

• EIGRP packets which are normally sent via multicast (Hello, Update, etc..) are sent unicast via the public interface

• Site-to-site traffic is encapsulated using LISP and sent unicast from/to the public interface address

Properties of CE Routers


• EIGRP creates the LISP0 interface, and starts sending Hello packets to remote site via the Public interface

• Once neighborship is formed, EIGRP sends and receives routes from the peer, installing the routes into the RIB with the nexthop interface LISP0

• Traffic that arrive on the router destined for the remote side, is first sent to LISP0

• LISP encaps the traffic and then sends it to the Public interface

EIGRP, LISP, and RIB – Oh My!

EIGRP

PublicInterface

InsideInterface

DefaultTraffic

Site to Site

Traffic

RIB

RouteUpdates

LISP0

OTP – Data Plane

• Why use LISP to encapsulate the data as it traverses the WAN?

• Its “stateless” tunneling, so it;

• Requires NO tunnels to configure or manage

• Is transparent to the endpoints and to the IP core

• Supports both hair-pin and site-to-site traffic

• Supports both IPv4 and IPv6 traffic

• Provides an overlay solution that enables transparent extension of network across WAN

• IP-based for excellent transport independence

• Service provider picks optimal traffic path for site to site data

• Supports multicast and VLANs to allow for future enhancements

LISP Data Encapsulation

OTP – Data Plane

• Path MTU needs to be considered when deploying OTP

• LISP encapsulation adds 36 bytes (20 IP + 8 UDP + 8 LISP) for IPv4(56 bytes for IPv6)

• This could be significant for small packets (e.g., a VoIP packet)

• LISP handles packet fragmentation

• If the DF bit is set, it will generate an ICMP Destination Unreachable message

• LISP does not handle packet reassembly

• As a consequence, it is required to adjust the MTU to ensure the control plan does not fragment

• Best practice - set the MTU is set to to 1444 (or lower) bytes.

LISP Data Encapsulation Properties


• EIGRP OTP is deployed in one of two ways

• Remote Routers

• Used for configuring a router to peer with one specific neighbor

• Forms a full mesh topology

• Configured with the command

• Route Reflectors

• Used to configure a router as a ‘hub’

• Forms a Hub and Spoke topology

• Configured with the command

Modes of Deployment

remote-neighbors source [interface] unicast-listen lisp-encap

neighbor [ipv4/v6 address] [interface] remote [max-hops] lisp-encap [lisp-id]

• For More on how to Deploy and Operate EIGRP OTP, please see the session focused on this solution – BRKRST-3336.

• Can be deployed as a series of full and partial mesh static remote peers – or – utilizing one or more devices as an EIGRP Route-Reflector.

EIGRP OTP

EIGRP OTP

• MTU internal to the provider/LISP Interface (normal tunneling MTU issues)

• LISP Interface State

• Be sure the provider is properly delivering packets across their network

• Ping from public (WAN) interface to remote public interface

• Ping with full mtu and df: ping 192.168.10.1 size 1500 df-bit

• If utilizing route-reflectors, be especially diligent about the Split-horizon and next-hop-self

• For multiple next hops in dual route-reflector scenario – add path option is needed.

• Network statement has to cover public (WAN) interface – else this can result in recursive route lookup failures for dual home sites

Common ‘Gotchas’

• Show commands

• VPC L3 Peering

EIGRP and NX-OS

EIGRP and NX-OS

• NX-OS has a slightly different CLI.

• Configuration Guides provide the latest CLI for enabling features in EIGRP, but some of the show and troubleshooting commands are different and samples have been included.

Show commands

EIGRP and NX-OS

lassie# sh ip eigrp

IP-EIGRP AS 1 ID 1.1.48.10 VRF default

Process-tag: 1

Instance Number: 1

Status: running

Authentication mode: none

Authentication key-chain: none

Metric weights: K1=1 K2=0 K3=1 K4=0 K5=0

IP proto: 88 Multicast group: 224.0.0.10

Int distance: 90 Ext distance: 170

Max paths: 8

Number of EIGRP interfaces: 1 (0 loopbacks)

Number of EIGRP passive interfaces: 0

Number of EIGRP peers: 3

Graceful-Restart: Enabled

Stub-Routing: Disabled

NSF converge time limit/expiries: 120/0

NSF route-hold time limit/expiries: 240/0

NSF signal time limit/expiries: 20/0

Redistributed max-prefix: Disabled

Show commands

EIGRP and NX-OS

lassie# sh ip ei int

IP-EIGRP interfaces for process 1 VRF default

Xmit Queue Mean Pacing Time Multicast Pending

Interface Peers Un/Reliable SRTT Un/Reliable Flow Timer Routes

Vlan48 3 0/0 483 0/0 3624 0


Holdtime interval is 15 sec






Use multicast

Classic/wide metric peers: 3/0

Show commands

EIGRP and NX-OS

lassie# sh ip ei event ?

cli EIGRP CLI related events

errors Error log of EIGRP

fsm FSM log of EIGRP

msgs Message log of EIGRP

packet Packet log of EIGRP

rib RIB log of EIGRP

statistics State and size of the buffers

Show commands

lassie# sh ip ei traff

IP-EIGRP Traffic Statistics for AS 1 VRF

default






Input queue high water mark 3, 0 drops



Hello Process ID: (no process)

PDM Process ID: (no process)

EIGRP and NX-OS

lassie# deb ip eigrp ?

<CR>

1 EIGRP process tag

A.B.C.D Network to display information about

A.B.C.D/LEN IP prefix <network>/<length>, e.g., 192.168.0.0/16

all Enable all EIGRP debugs

fsm EIGRP Dual Finite State Machine events/actions

graceful-restart EIGRP Graceful-Restart

ha EIGRP HA

mpls IP-EIGRP MPLS debugging

neighbor IP-EIGRP neighbor debugging

notifications IP-EIGRP event notifications

packets EIGRP packets

route-map EIGRP route-map

summary IP-EIGRP summary route processing

transmit EIGRP transmission events

urib IP-EIGRP URIB interaction event debugging

vrf-events IP-EIGRP VRF event debugging

Show commands

EIGRP and NX-OS

lassie# sh ip eigrp internal ?

<CR>

> Redirect it to a file

>> Redirect it to a file in append mode

event-history Event History of EIGRP

library-info Show various event logs of library

mem-stats Show memory allocation statistics

| Pipe command output to filter

lassie# sh ip eigrp internal

IP-EIGRP AS 1 ID 1.1.48.10 VRF default

Process-tag: 1

Instance Number: 1

UUID: 1090519344

Process Linux Pid: 7084

Show commands

EIGRP and NX-OS

1/4

0/0/0

1/1

1/3

Lassie

N5K-5548P

B09

Harmon

N5K-5548P

A05

Pointer

N5K-5548UP

B04

Charlie2

ASR1001

C08

Charlie3

ASR1001

C08

1/11/1

1/21/2

1/1

1/3

1/4

0/0/0

10

mgmt0

Jules

N5K-5548P

B09

VPC L3 Peering

• Be careful! TTL issues can occur and some scenarios are unsupported.

• Refer to: http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

• Specifically: Layer 3 and vPC: Guidelines and Restrictions

http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

• Troubleshooting Methodology

• Troubleshooting the Essentials

• Neighbor Formation

• Route Computation & Propagation

• Advanced Issues


• High Speed Links

• IPv6 Specifics

Summary: Troubleshooting EIGRP Networks

• EIGRP contains many tools and techniques that can be used to keep the EIGRP network running smoothly and efficiently

• What now?

• Armed with Information

• Preventative Steps

• Hopefully this session taught you how to make the best use of these tools and techniques and translate them in a manner which will be relevant to your network design

Summary: Troubleshooting EIGRP Networks

Participate in the “My Favorite Speaker” Contest

• Promote your favorite speaker through Twitter and you could win $200 of Cisco Press products (@CiscoPress)

• Send a tweet and include

• Your favorite speaker’s Twitter handle ( @smoore_bits )

• Two hashtags: #CLUS #MyFavoriteSpeaker

• You can submit an entry for more than one of your “favorite” speakers

• Don’t forget to follow @CiscoLive and @CiscoPress

• View the official rules at http://bit.ly/CLUSwin

Promote Your Favorite Speaker and You Could Be a Winner

http://bit.ly/CLUSwin

Complete Your Online Session Evaluation

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.

• Complete your session surveys though the Cisco Live mobile app or your computer on Cisco Live Connect.

Continue Your Education

• Demos in the Cisco Campus

• Walk-in Self-Paced Labs

• Table Topics

• Meet the Engineer 1:1 meetings

• And don’t forget CiscoLive.com!

Recommended Reading for BRKRST-2331

ASIN: 1578701651

ISBN: 0201657732

ISBN 1587051877

Open-EIGRP:draft-savage-eigrp-02

ISBN-13: 978-1587144233

Thank you

troubleshooting eigrp networksd2zmdbbm9feqrf.cloudfront.net/2015/usa/pdf/brkrst-2331.pdf · •the...

Documents