Troubleshooting Cisco Catalyst 3650 and 3850 Series Switches
Naoshad Mehta
Principal Engineer, Enterprise Campus Switching Group
Twitter: @naoshad, #CLUS, #convergedaccess
BRKCRS-3146
Session Overview and Objectives
Cisco is bringing together the best of wired and wireless networking into “One Network” with Converged Access on the Catalyst 3850 and 3650 switches.
In this session, learn about the capabilities of the 3850 and 3650 switches and how to troubleshoot common issues seen on them when running the IOS-XE operating system. Topics include the switch architecture and the troubleshooting of hardware, RTU licensing, the boot-up sequence, memory and CPU utilization, stacking, high availability, forwarding features on the UADP ASIC, and QoS.
Your Instructor Today: Naoshad Mehta
Principal Engineer, Enterprise Campus Switching Group
I’m a Principal Engineer with the Enterprise Campus Switching Software team at Cisco. My current focus is the adoption of the Catalyst 3850/3650 and the Converged Access architecture in the marketplace. I’ve been with Cisco for 13+ years. Since 2010, my primary responsibility has been the delivery of the Catalyst 3850/3650 and the CT5760 Wireless Controller. I have been intimately involved with the design and implementation of almost every software aspect of the 3850/3650, and I’m here to help you learn more about the architecture and how to troubleshoot the 3850/3650.
Prior to working on the 3850/3650, I worked on a wide spectrum of technologies (MPLS, Traffic Engineering, L2VPN, EVCs, etc.), products (Nexus 7K, 7600, 7500, 7200) and operating systems (Classic IOS, NX-OS and IOS-XE).
Agenda
• Product Review
• Hardware Troubleshooting
• Image Management and Licensing
• Memory and CPU Resources
• Stacking & High Availability
• Hardware Forwarding
• QoS
• Misc Tools & Tricks
• Summary
Glossary
A – Active Switch
S – Standby Switch
FED – Forwarding Engine Driver
WCM – Wireless Controller Module
PDS – Packet Delivery Service
UADP – Unified Access Data Plane ASIC
3x50 – 3650 or 3850 Switch
Reference – Reference slide that may not be presented in the session
Suggested Sessions and Reference Material
• BRKCRS-2889 – Converged Access System Architecture – Diving into the “One Network”
• BRKCRS-2888 – Advanced Enterprise Campus Design: Converged Access
• BRKARC-3438 – Cisco Catalyst 3850 and 3650 Series Switching Architecture
• Cisco Unified Access Technology Overview: Converged Access, http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps12686/white_paper_c11-726107.html
• Cisco Enterprise Campus Infrastructure Best Practices Guide, http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6800-series-switches/guide-c07-733457.html
Agenda: Product Review
Catalyst 3K Switching Portfolio – Before NGWC
• Platforms: C3560G, C3750G, C3750E, C3750X
• ASICs: Sasquatch ASIC, Strider ASIC
• Separate software packages: IOS package A, IOS package B, IOS package C
• Limited modularity and flexibility, no aggregation SKU
NGWC Switching Portfolio
• Catalyst 3650 and Catalyst 3850
• Same hardware architecture and UADP ASIC
• Same software bundle for all switches
• Modular uplinks, 10G aggregation SKU
• 1G copper, 1G fiber, mGig and 10G fiber SKUs
• Wireless, SDN, VXLAN, IoT protocols
Catalyst 3850 Switch
Built on Cisco’s Innovative “UADP” ASIC
Wireless CAPWAP Termination
Integrated Controller: Up to 50 APs and 40G per switch
Up to 2000 Clients per Stack
40 Gbps Uplink Bandwidth
Stackpower
Line Rate on All Ports
SGT/SGACL
480 Gbps Stacking Bandwidth
Full POE+
FRU Fans, Power Supplies
Granular QoS/Flexible NetFlow
Catalyst 3650 Switch
Wireless CAPWAP Termination
40 Gbps Uplink Bandwidth
Line Rate on All Ports
FRU Fans
Granular QoS/Flexible NetFlow
Modular 160 Gbps 9 member Stack
SGT/SGACL
Full POE+
Fixed 1G/10G Uplinks
Up to 25 APs/1000 clients per stack, and 40G per switch
New Front-End Power Supplies
The foundation for full wired and wireless convergence on a single platform.
IOS XE Evolution
(Figure: IOS, e.g. IOS 12.2(52)SE: Features, Components, Management Interface, Module Drivers, Common Infrastructure / HA, Kernel. IOS-XE, e.g. IOS XE 3.3.5(SE): IOSd (Features, Components), WCM and Hosted Apps on top of the Management Interface, Module Drivers, Common Infrastructure / HA and the Linux Kernel.)
IOS-XE
• Modern IOS that enables use of a multi-core CPU
• Easy customer migration, while maintaining IOS functionality and look and feel
• Allows hosted applications such as Wireshark
3.3.x Features
• 9 member stack
• QoS Revamp
• Wireshark
• HSRP
• UPOE
IOS XE Software Internals Overview
(Figure: IOS XE software internals. Components shown: Kernel; IOSd (RP/LC) with Features PD; Internal IPC; Availability Framework; Packet Delivery Service; Service Location; Forwarding & Feature Mgr (FFM); System Manager; Platform Manager; Consolidated Logging; Comet Services; Licensing Services; Interface Manager; Libraries/Utilities; Services; External Transports (TCP/SCTP/UDP); Wireless Controller HA; Stack Manager (3K); Platform Drivers; Low Level APIs; UADP ASIC Drivers; Forwarding Engine Driver.)
Recommended Release IOS-XE 3.3.5
• First Release IOS-XE 3.2.0(SE) (Jan 2013)
• No further rebuilds after 3.2.3(SE)
• IOS-XE 3.3.0 supports 3650
• Many critical fixes in recommended release 3.3.5(SE) (Sep 2014)
Agenda: Hardware Troubleshooting
System LEDs Overview
Front Panel LEDs
System LEDs – Definitions
• SYST LED
Off = System off
Green = System operating normally
Blinking green = Running POST
Amber = System is malfunctioning
Blinking amber = Network module, power supply, or fan module is malfunctioning
• XPS LED
Off = No XPS cable installed or switch is in StackPower mode
Green = XPS connected and ready to provide backup power
Blinking green = XPS is connected but cannot provide backup power
Amber = XPS is in standby or a fault condition
Blinking amber = Power supply in the switch has failed and is being backed up by XPS
• ACTV LED
Off = Switch is not the active switch
Green = Switch is the active switch or is in standalone mode
Blinking green = Switch is in standby mode
Amber = An error has occurred in the data stack, possibly related to active member selection
• S-PWR LED
Off = StackPower cable not connected or switch is in standalone mode
Green = Switch is connected to an XPS or to 2 StackPower neighbors in a ring configuration
Blinking green = Switch is connected to only 1 StackPower neighbor in a ring configuration
Amber = Fault detected
Blinking amber = StackPower configuration is overbudget
Front Panel LED Description
System LEDs – Definitions (cont.)
• STAT LED
Off = Rather than indicating link status, the port LEDs are indicating duplex, speed, stack, or PoE status
Green = Port LEDs are indicating link status
• DUPLX LED
Off = Rather than indicating duplex status, the port LEDs are indicating link, speed, stack, or PoE status
Green = Port LEDs are indicating duplex status
• SPEED LED
Off = Rather than indicating speed status, the port LEDs are indicating link, duplex, stack, or PoE status
Green = Port LEDs are indicating speed status
• STACK LED
Off = Rather than indicating stack status, the port LEDs are indicating link, duplex, speed, or PoE status
Green = Port LEDs are indicating stack status
• PoE LED
Off = Rather than indicating PoE status, the port LEDs are indicating link, duplex, speed, or stack status; none of the downlink ports have been denied power or are in a fault condition
Green = Port LEDs are indicating PoE status and none of the downlink ports have been denied power or are in a fault condition
Blinking amber = Port LEDs are indicating PoE status and at least one of the downlink ports has been denied power or is in a fault condition
• CONSOLE LED
Off = USB console is inactive
Green = USB console is active (RJ45 console is inactive)
• CONSOLE SERIAL LED
Off = RJ45 console is inactive (USB console is active)
Green = RJ45 console is active (USB console is inactive)
• MGMT LED
Off = Link down
Green = Link is up with no activity
Blinking green = Link is up with activity
Back Panel LED Description
Agenda: Image Management and Licensing
Image Naming Convention
cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
• cat3k = Platform family
• caa: C = Converged, A = Access, A = Access switch
• universalk9 = Feature set; enabling/disabling of features is controlled by the installed license
• SPA: S = Digitally signed image, P = Production image, A = Key version
• 03.03.05.SE = IOS XE version
• 150-1.EZ5 = IOSd version
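The naming convention above can be checked mechanically. The following is a minimal Python sketch, not a Cisco tool: `ImageName` and `parse_image_name` are hypothetical names, and the regular expression assumes bundle names shaped exactly like the example (a `.bin` suffix and a three-letter signing field such as `SPA`).

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    platform: str       # "cat3k" = platform family
    caa: str            # C = Converged, A = Access, A = Access switch
    feature_set: str    # e.g. "universalk9"; features are gated by the installed license
    signing: str        # S = digitally signed, P = production, A = key version
    iosxe_version: str  # e.g. "03.03.05.SE"
    iosd_version: str   # e.g. "150-1.EZ5"


# Assumed layout: <platform>_<caa>-<featureset>.<signing>.<iosxe>.<iosd>.bin
_IMAGE_RE = re.compile(
    r"^(?P<platform>[a-z0-9]+)_(?P<caa>[a-z]+)-(?P<feature_set>[a-z0-9]+)"
    r"\.(?P<signing>[A-Z]{3})"
    r"\.(?P<iosxe>\d+\.\d+\.\d+\.[A-Z]+)"
    r"\.(?P<iosd>[0-9-]+\.[A-Z]+\d+)\.bin$"
)


def parse_image_name(name: str) -> ImageName:
    """Split a 3x50 bundle file name into its naming-convention fields."""
    match = _IMAGE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized image name: {name!r}")
    return ImageName(
        platform=match["platform"],
        caa=match["caa"],
        feature_set=match["feature_set"],
        signing=match["signing"],
        iosxe_version=match["iosxe"],
        iosd_version=match["iosd"],
    )


if __name__ == "__main__":
    info = parse_image_name("cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin")
    print(info.iosxe_version, info.iosd_version)
```

Individual package files (`.pkg`) follow a different pattern and are deliberately rejected by this sketch.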
Booting IOS-XE Software
Install Boot (default mode)
• Packages are installed on flash
• Supports AP image pre-download
• No additional memory requirement
• Image must be installed in flash: using “software expand” or “software install”
• boot flash:packages.conf
Bundle Boot
• Packages are expanded in RAM
• No support for AP image pre-download
• Additional memory equal to the size of the image bundle is required
• Image can be booted from flash:, usbflash: or tftp:
• boot flash:cat3k_caa-universalk9.SPA.03.03.03.SE.150-1.EZ3.bin
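The two boot modes can be told apart from the boot target alone: `packages.conf` means install boot, a `.bin` bundle means bundle boot. This is a small illustrative Python sketch; `boot_mode` and `extra_ram_needed` are hypothetical helpers, and the input is assumed to be just the path after `boot`, e.g. `flash:packages.conf`.

```python
def boot_mode(boot_target: str) -> str:
    """Classify a boot target as 'install' (packages.conf) or 'bundle' (.bin)."""
    # Drop the device prefix ("flash:", "usbflash0:", "tftp:" ...) if present.
    path = boot_target.split(":", 1)[-1]
    if path.endswith("packages.conf"):
        return "install"
    if path.endswith(".bin"):
        return "bundle"
    raise ValueError(f"unrecognized boot target: {boot_target!r}")


def extra_ram_needed(mode: str, bundle_size: int) -> int:
    """Bundle boot expands packages in RAM; install boot needs no extra memory."""
    return bundle_size if mode == "bundle" else 0
```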
3850/3650 Password recovery
• Password recovery on the 3850/3650 does NOT follow the 3750 family procedure
• 3850 password recovery is as follows:
1. Power-cycle the switch and hold the Mode button (on the front, top left) for a few seconds (officially 12) until the status LED turns amber. This puts you at the Boot Loader prompt (Switch:)
2. Initialize flash
3. Set the following variables
4. Boot the 3850
Switch: flash_init
Switch: SWITCH_IGNORE_STARTUP_CFG=1
Switch: SWITCH_DISABLE_PASSWORD_RECOVERY=0
Switch: boot
Warning! Console settings: 9600 baud, 8 data bits, no parity, 1 stop bit, no flow control
3850/3650 Password recovery (cont’d)
5. Skip the initial configuration dialog and go to enable mode (no password required):
--- System Configuration Dialog ---
Would you like to enter the initial configuration dialog? [yes/no]: no
Press RETURN to get started!
Switch> enable
Switch#
6. Copy startup-config back to running-config:
Switch# copy startup-config running-config
3850/3650 Password recovery (end)
7. Go to global configuration mode, and remove or change the password:
Switch# configure terminal
Switch(config)# no enable password
Switch(config)# no enable secret
Switch(config)# enable secret cisco
8. Re-enable reading of startup-config:
Switch(config)# no system ignore startupconfig switch all
9. Disable password recovery, if required:
Switch(config)# system disable password recovery switch all
10. End the configuration and save the change:
Switch(config)# end
Switch# write (or copy running-config startup-config)
Software Upgrade on 3x50
Software upgrade in Installed Mode is done via the “software install …” command
Prerequisites for software installation:
• The switch’s free memory must be greater than the size of the bundle being installed
• The free space in flash: must be greater than the size of the bundle being installed
• All switches must be running in installed mode
• When installing a bundle from a local storage device, the device must exist on all switches performing the installation operation
• The packages in the bundle to be installed must have valid digital signatures
A failed installation might require a rollback using the “software rollback” command or a manual clean using the “software clean” command.
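The size and mode prerequisites lend themselves to a pre-flight check before running “software install”. This is a hedged Python sketch, not a Cisco utility: `MemberState` and `install_preflight` are hypothetical, the per-member values are assumed to have been collected beforehand (for example from show commands), and digital-signature validation is left to the install operation itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemberState:
    """Hypothetical snapshot of one stack member, gathered by hand or by a script."""
    switch: int
    free_memory: int         # bytes
    free_flash: int          # bytes free in flash:
    install_mode: bool       # True if booted from flash:packages.conf
    has_source_device: bool  # the local device holding the bundle exists on this member


def install_preflight(members: List[MemberState], bundle_size: int) -> List[str]:
    """Return one message per violated prerequisite; an empty list means go ahead."""
    problems = []
    for m in members:
        if m.free_memory <= bundle_size:
            problems.append(f"switch {m.switch}: free memory not greater than bundle size")
        if m.free_flash <= bundle_size:
            problems.append(f"switch {m.switch}: free flash not greater than bundle size")
        if not m.install_mode:
            problems.append(f"switch {m.switch}: not running in installed mode")
        if not m.has_source_device:
            problems.append(f"switch {m.switch}: bundle source device missing")
    return problems
```

The comparisons use `<=` because the slide requires free memory and flash to be strictly greater than the bundle size.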
Upgrade/Install a Bundle on flash
Switch# software install file flash:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
Preparing install operation ...
[2]: Copying software from active switch 2 to switch 1
[2]: Finished copying software to switch 1
[1 2]: Starting install operation
[1 2]: Expanding bundle flash:cat3k_caa-…
[1 2]: Copying package files
[1 2]: Package files copied
[1 2]: Finished expanding bundle flash:cat3k_caa-…
[1 2]: Verifying and copying expanded package files to flash:
[1 2]: Verified and copied expanded package files to flash:
[1 2]: Starting compatibility checks
[1 2]: Finished compatibility checks
[1 2]: Starting application pre-installation processing
[1 2]: Finished application pre-installation processing
[1]: Old files list:
Removed cat3k_caa-base.SPA.03.03.03.SE.pkg
Removed cat3k_caa-drivers.SPA.03.03.03.SE.pkg
Removed cat3k_caa-infra.SPA.03.03.03.SE.pkg
Removed cat3k_caa-iosd-universalk9.SPA.150-1.EZ3.pkg
Removed cat3k_caa-platform.SPA.03.03.03.SE.pkg
Removed cat3k_caa-wcm.SPA.03.03.03.SE.pkg
The install proceeds in four stages: preparation, installing to flash, post-install checks, and removing old files.
Software Rollback
Use the ‘software rollback’ command to revert to the previously installed package set (packages.conf.00-).
Switch# software rollback
Preparing rollback operation ...
[2]: Starting rollback operation
[2]: Starting compatibility checks
[2]: Finished compatibility checks
[2]: Starting application pre-installation processing
[2]: Finished application pre-installation processing
[2]: Old files list:
Removed cat3k_caa-base.SPA.03.03.05.SE.pkg
Removed cat3k_caa-drivers.SPA.03.03.05.SE.pkg
Removed cat3k_caa-infra.SPA.03.03.05.SE.pkg
Removed cat3k_caa-iosd-universalk9.SPA.150-1.EZ5.pkg
Removed cat3k_caa-platform.SPA.03.03.05.SE.pkg
Removed cat3k_caa-wcm.SPA.03.03.05.SE.pkg
[2]: New files list:
Added cat3k_caa-base.SPA.03.03.03.SE.pkg
Added cat3k_caa-drivers.SPA.03.03.03.SE.pkg
Added cat3k_caa-infra.SPA.03.03.03.SE.pkg
Added cat3k_caa-iosd-universalk9.SSA.150-1.EZ3.pkg
Added cat3k_caa-platform.SPA.03.03.03.SE.pkg
Added cat3k_caa-wcm.SPA.03.03.03.SE.pkg
[2]: Creating pending provisioning file
[2]: Finished rolling back software changes. New software will load on reboot.
[2]: Do you want to proceed with reload? [yes/no]: n
Switch#
The rollback removes the newly installed image and reverts to the older image.
Recover a Corrupted Install
Copy the image bundle to USB flash and boot up using the following command from the Bootloader prompt:
boot usbflash0:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
Alternatively, copy the image bundle to USB flash and recover the switch using the recovery mechanism built into the switch, from the Bootloader prompt:
emergency-install usbflash0:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
Another option: bundle boot the image from USB, run “software clean file flash”, copy the USB image .bin to flash, then run “software expand file flash:<image.bin>”.
Right To Use (RTU) / Honor Based Licensing
Trust Based Licensing Model
Built in licenses, not tied to Unique Device Identifier
Three license levels – lanbase, ipbase and ipservices
Activated using CLI by accepting the End User License Agreement
Portable across devices
No Need to access cisco.com License Portal
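License Mismatch
(Figure: stack members running IP Base and IP Services license levels, with the Active (A) and Standby (S) marked.)
To bring a switch running ipservices in line with an ipbase stack:
license right-to-use deactivate ipservices
license right-to-use activate ipbase acceptEULA
Reload switch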
Licensing Show commands
Switch# show license right-to-use slot 1
Slot# License name Type Count Period left
----------------------------------------------------------
1 ipbase permanent N/A Lifetime
1 lanbase permanent N/A Lifetime
1 apcount adder 4 Lifetime
License Level on Reboot: ipservices
Switch# show license right-to-use mismatch
Slot# License Name Adder AP Count Base AP Count
---------------------------------------------------------------
3 ipservices 0 0
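A mismatch like the one above (slot 3 at ipservices) can be spotted by comparing each slot's level with the Active's. This Python sketch is illustrative only: `license_mismatches` is a hypothetical helper, the slot-to-level mapping is assumed to have been read from “show license right-to-use”, and it assumes the Active's level is the one the whole stack should run.

```python
from typing import Dict


def license_mismatches(levels: Dict[int, str], active_slot: int) -> Dict[int, str]:
    """Return the slots whose RTU license level differs from the Active's."""
    expected = levels[active_slot]
    return {slot: level for slot, level in levels.items() if level != expected}
```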
Agenda: Memory and CPU Resources
CPU Complex
(Figure: CPU complex block diagram.)
• CPU: Cavium 6230, 800 MHz, 4-core, 2MB L2 cache
• Memory: 4GB DDR3-1333 with ECC
• Storage: 2GB flash; 64MB bootloader (boot bus)
• PCIe links to UADP 1 and UADP 2
• USB/RJ-45 console and 10/100/1000 RJ-45 Ethernet management port (SGMII, UART)
• FPGA for StackPower and FPGA for PHY, LED, etc. (I2C)
• RTC and ACT II chip
Frequently Asked Questions
Why should I be concerned about high memory utilization?
It is very important to have enough free memory to support features and network convergence events that require transient memory.
What are the usual symptoms of high memory usage?
Memory utilization of process(es) keeps increasing
System runs out of buffers and software packet forwarding stops
Memory allocation failures are reported
System crashes after reporting out of memory
At what percentage level should I start troubleshooting?
It depends on the nature and level of feature configuration on the switch. It is essential to establish a baseline memory usage during normal working conditions, and to start troubleshooting when usage goes above a specific threshold.
E.g., baseline memory usage is 40%. Start troubleshooting when memory usage goes above 70% and keeps increasing without any new configuration being added.
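The baseline rule of thumb above can be written as a small predicate over periodic samples. This is a hypothetical Python sketch: `should_troubleshoot` is not a Cisco tool, the 70% default comes from the example, and the sampling method and interval are assumptions left to the reader.

```python
from typing import Sequence


def should_troubleshoot(samples: Sequence[float], threshold: float = 70.0,
                        config_changed: bool = False) -> bool:
    """True when usage is above `threshold` and has kept climbing, with no new config."""
    if config_changed or len(samples) < 2:
        return False
    # "Keeps increasing": no sample is lower than the one before it.
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return samples[-1] > threshold and never_drops and samples[-1] > samples[0]
```

The same predicate works for other resources by changing `threshold`.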
Switch1# show processes memory sorted
System memory : 3930916K total, 1118032K used, 2812884K free, 221968K kernel reserved
Lowest(b) : 2252987972
PID Text Data Stack Heap RSS Total Process
10623 56892 36452 92 5400 196116 336728 iosd
5534 8716 311168 92 4620 136908 562460 fed
10619 21976 555372 88 13980 102320 723240 wcm
6032 4 97708 116 91996 99044 116676 idope.py
12259 4 193244 236 38244 73672 299464 wnweb_paster.py
5536 660 163524 88 4332 55968 336496 stack-mgr
6057 3532 137308 88 2200 54200 311676 ffm
6076 112 160908 88 6764 44728 233548 cli_agent
6058 1232 287972 88 8112 38352 438040 eicored
In the output above, the “System memory” line reports total, used and free memory, and the table breaks usage down per IOS-XE process.
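To track the “System memory” line against a baseline, it helps to turn it into numbers. A minimal Python sketch, assuming the line format shown above (values in KB); `parse_system_memory` and `used_percent` are hypothetical helper names.

```python
import re
from typing import Dict

_MEM_FIELD = re.compile(r"(\d+)K (total|used|free)\b")


def parse_system_memory(line: str) -> Dict[str, int]:
    """Extract total/used/free (in KB) from the 'System memory' line."""
    return {kind: int(value) for value, kind in _MEM_FIELD.findall(line)}


def used_percent(fields: Dict[str, int]) -> float:
    return 100.0 * fields["used"] / fields["total"]
```

For the sample output, used plus free equals total, and usage works out to roughly 28.4%.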
Memory show commands
Switch1# show processes memory detailed process iosd sorted
Processor Pool Total: 536870912 Used: 135242980 Free: 401627932
IOS Proce Pool Total: 16777216 Used: 9483360 Free: 7293856
PID TTY Allocated Freed Holding Getbufs Retbufs Process
0 0 168268072 31876024 126376204 0 0 *Init*
164 0 1534944 0 1558112 907264 0 NGWC DOT1X Proce
0 0 0 0 984492 0 0 *MallocLite*
1 0 657344 1544 678968 0 0 Chunk Manager
276 0 925564 297800 563696 0 0 os_info_p provid
39 0 415892 1856 376480 0 0 IPC Seat RX Cont
250 0 298204 464 320908 0 0 IPC LC Message H
The table lists IOS tasks; check whether a task’s Holding value keeps increasing.
Common Causes for high memory utilization
Common cause: Recommended solution
• Extensive config: reduce configuration to supported scale
• Excessive memory allocated to trace buffers (1): reset trace buffers to default sizes
• DoS attack/punted traffic causing buffer depletion: identify packets and block them using an ACL
• Protocol flaps/re-convergence causing high transient memory utilization: identify the reason for network instability
• Memory leak caused by software bug: open a Service Request
1. set trace control <> buffer default
Command Summary - Memory
Troubleshooting step: Command
• Check memory usage on system: show processes memory sorted
• Check memory usage of a particular process: show processes memory detailed process fed
• Check memory usage of IOSd: show processes memory detailed process iosd
• Check allocators of memory within IOSd: show memory detailed process iosd allocating-process totals
Frequently Asked Questions
Why should I be concerned about high CPU utilization?
It is very important to protect the control plane for network stability, because resources (CPU, memory and buffers) are shared by control-plane traffic and by data-plane traffic sent to the CPU for further processing.
What are the usual symptoms of high CPU usage?
Control plane instability, e.g. OSPF flaps
Reduced switching/forwarding performance
Slow response to Telnet/SSH
Missed SNMP polls
At what percentage level should I start troubleshooting?
It depends on the nature and level of the traffic. It is essential to establish a baseline CPU usage during normal working conditions, and to start troubleshooting when usage goes above a specific threshold.
E.g., baseline CPU usage is 25%. Start troubleshooting when CPU usage is consistently at 50% or above.
Switch# show proc cpu sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5533 120300 1608989 74 0.29 0.40 0.42 1088 fed
5535 44890 1401868 32 0.24 0.11 0.10 0 stack-mgr
10582 416280 5787047 71 34.25 0.57 0.62 34816 iosd
6201 111520 119850 930 0.15 0.15 0.15 0 cpumemd
5534 38430 3608873 10 0.10 0.10 0.10 0 platform_mgr
10578 115030 4737397 24 0.10 0.12 0.11 0 wcm
5455 1500 40856 36 0.05 0.05 0.05 1088 slproc
6183 5270 211347 24 0.05 0.02 0.04 0 obfld
6185 4320 110250 39 0.05 0.01 0.03 0 console_relay
6198 20900 186795 111 0.05 0.02 0.00 0 ffm
1 1700 1112 1528 0.00 0.09 1.43 0 init
2 0 138 0 0.00 0.00 0.00 0 kthreadd
3 10 1634 6 0.00 0.00 0.00 0 migration/0
4 0 3 0 0.00 0.00 0.00 0 sirq-high/0
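Troubleshooting High CPU: the output above shows utilization for each core of the 4-core CPU (137% across 4 cores in this example), followed by platform processes and IOS-XE processes. The first step is to identify the culprit.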
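Identifying the culprit amounts to ranking the process rows by their 5-second column. This Python sketch assumes the summary “show proc cpu sorted” layout shown above (nine columns, process name last); `top_cpu_processes` is a hypothetical helper, and it does not handle the per-thread layout of “show processes cpu detailed”.

```python
from typing import List, Tuple


def top_cpu_processes(output: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return (process, 5-second %) for the n busiest rows of 'show proc cpu sorted'."""
    rows = []
    for line in output.splitlines():
        # Data rows: PID Runtime Invoked uSecs 5Sec 1Min 5Min TTY Process
        fields = line.split(None, 8)
        if len(fields) != 9 or not fields[0].isdigit():
            continue  # skip the per-core summary and header lines
        rows.append((fields[8].strip(), float(fields[4])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```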
Troubleshooting High CPU
Switch# show processes cpu detailed process iosd sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
10582 L 451160 6379641 70 34.25 0.71 0.60 34816 iosd
10582 L 0 10582 414060 6194757 0 24.00 0.60 0.50 34816 iosd
10582 L 3 11543 36980 180107 0 10.25 0.11 0.10 0 iosd.fastpath
10582 L 2 11544 120 4777 0 0.00 0.00 0.00 34816 iosd.aux
6 I 57680 5216 0 3.00 0.33 0.22 0 Check heaps
304 I 2200 1790 0 12.17 0.00 0.00 0 HTTP CORE
218 I 2370 14495 0 8.33 0.00 0.00 0 IP Input
211 I 190 214 0 0.33 0.00 0.00 0 RSMP Server
306 I 10 23 0 0.11 0.00 0.00 0 SEP NODE PROC
5 I 0 2 0 0.00 0.00 0.00 0 IPC ISSU Dispatch P
7 I 220 336 0 0.00 0.00 0.00 0 Pool Manager
3 I 0 1 0 0.00 0.00 0.00 0 HA-IDB-SYNC
Drill down deeper: in this example, the high CPU is caused by HTTP traffic, along with interrupt-switched traffic (e.g. wireless control).
Command Summary - High CPU
Troubleshooting step: Command
• Check CPU usage on IOS threads: show process cpu detailed process iosd [sorted]
• Check CPU usage on platform-dependent and Nova threads: show process cpu detailed process {fed | platform_mgr | stack-mgr | ha_mgr | eicored…}
• Check traffic on the RX and TX CPU queues: show platform punt client, show platform punt tx
• Check details of CPU queues: show platform punt statistics port-asic 0 cpuq 0 direction {rx | tx}
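CPU Punt Path Architecture
(Figure: punt path from the UADP ASIC up to the software processes.)
• IOSd and WCM, reached through the Punt Shim: IOSd processes control packets, and WCM processes wireless control packets
• Forwarding Engine Driver (FED) Packet Handler: interfaces with the UADP ASIC and the Packet Delivery Service (PDS); 32 RX PDS queues and 8 TX PDS queues
• UADP ASIC: 32 RX queues and 8 TX queues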
Common Cause for Punting Traffic to CPU
Common cause: Recommended solution
• Same-interface forwarding: change the design; use “no ip redirects”
• ACL logging: disable ACL logging
• ACL deny causing the switch to send ICMP unreachables: “no ip unreachables” (1)
• Forwarding/feature exception (out of TCAM/adjacency space): reduce TCAM usage
• SW-supported feature: disable the feature or reduce the amount of traffic
• IP packets with TTL<2 or IP options: disable the offending traffic
• Broadcast storm: fix the STP loop, disable the traffic
• Unexpected control/data traffic: Control Plane Policing (CoPP), deny ACL
• Software bug: open a Service Request
1. Should be configured on all the L3 interfaces of the switch.
Decoding CPU Queues
Switch# show platform punt client
tag buffer jumbo fallback packets received failures
alloc free bytes conv buf
65536 0/1024/1600 0/0 0/512 64845 64845 3371071 0 0
65544 0/ 96/1600 0/4 0/0 0 0 0 0 0
65545 0/ 96/1600 0/8 0/32 1947 1947 612588 0 0
65546 0/ 512/1600 0/32 0/512 13563 137795 24587306 0 0
65548 0/ 512/1600 0/32 0/256 10903 10903 650232 0 0
65551 0/ 512/1600 0/0 0/256 56 56 12088 0 0
65561 411/ 512/1600 0/0 0/128 557245 556834 39010862 0 0
65562 0/ 512/1600 0/16 0/256 0 0 0 0 0
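Decoding the output: the CPU queue number is the tag minus 65536, e.g. 65561 - 65536 = queue 25. In the buffer column (e.g. 411/ 512/1600), the first value is the number of packets in the queue awaiting processing, the second is the size of the queue, and the third is the size of each buffer.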
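The decoding rule above can be applied row by row. A minimal Python sketch, assuming the row layout shown in the sample output; `decode_punt_row` and `PUNT_TAG_BASE` are hypothetical names, and only the tag and buffer columns are interpreted.

```python
import re
from typing import Dict

PUNT_TAG_BASE = 65536  # from the slide: queue 25 = 65561 - 65536


def decode_punt_row(line: str) -> Dict[str, int]:
    """Decode the tag and buffer columns of one 'show platform punt client' row."""
    # The output pads numbers after '/', e.g. "411/ 512/1600"; remove that padding.
    fields = re.sub(r"/\s+", "/", line.strip()).split()
    queued, queue_size, buffer_size = (int(x) for x in fields[1].split("/"))
    return {
        "cpu_queue": int(fields[0]) - PUNT_TAG_BASE,
        "queued": queued,          # packets awaiting processing
        "queue_size": queue_size,
        "buffer_size": buffer_size,
    }
```

A queue whose `queued` value stays close to `queue_size` is a natural candidate for the packet inspection that follows.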
Displaying packets in the queue
show buffers detailed process iosd assigned packet | beg ng3k_rx25
Buffer information for ng3k_rx25 buffer at 0x35E98E8C
data_area 0x35E9932C, refcount 1, next 0x0, flags 0x80
linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1
if_input Vlan10, if_output 0x0 (None)
source: 10.32.111.83, destination: 10.33.21.219, id: 0x4BE0, ttl: 63,
TOS: 0 prot: 6, source port 51378, destination port 22
35E99382: 6400F124 F1C11410 9FE43A49 08004500 d.q$qA...d:I..E.
35E99392: 00984BE0 40003F06 56110A20 6F530A21 ..K`@.?.V.. oS.!
35E993A2: 15DBC8B2 0016588A DB9F6C34 421A5018 .[H2..X.[.l4B.P.
35E993B2: FFFF8666 000072A2 E1AB5431 78970F84 ...f..r"a+T1x...
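The hex dump can be decoded by hand to confirm the header fields that “show buffers” prints. This Python sketch is an illustration, not a Cisco tool: `hexdump_to_bytes` and `decode_ipv4_frame` are hypothetical helpers, and they assume four 32-bit hex words per line followed by an ASCII column, and an untagged Ethernet II (ARPA, 14-byte) header, as reported by “encsize 14” above.

```python
import ipaddress
import re
from typing import Dict, Union

_HEX_WORD = re.compile(r"[0-9A-Fa-f]{8}")


def hexdump_to_bytes(dump: str) -> bytes:
    """Collect the four hex words per line, ignoring the address and ASCII columns."""
    data = bytearray()
    for line in dump.strip().splitlines():
        _, _, rest = line.partition(":")
        for word in rest.split()[:4]:
            if _HEX_WORD.fullmatch(word):
                data += bytes.fromhex(word)
    return bytes(data)


def decode_ipv4_frame(frame: bytes) -> Dict[str, Union[int, str]]:
    """Decode an untagged Ethernet II (14-byte header) IPv4 frame."""
    if int.from_bytes(frame[12:14], "big") != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4
    info: Dict[str, Union[int, str]] = {
        "id": int.from_bytes(ip[4:6], "big"),
        "ttl": ip[8],
        "proto": ip[9],
        "src": str(ipaddress.IPv4Address(ip[12:16])),
        "dst": str(ipaddress.IPv4Address(ip[16:20])),
    }
    if info["proto"] in (6, 17):  # TCP or UDP: ports are the first 4 L4 bytes
        l4 = ip[header_len:]
        info["sport"] = int.from_bytes(l4[0:2], "big")
        info["dport"] = int.from_bytes(l4[2:4], "big")
    return info
```

Applied to the four dump lines above, this yields source 10.32.111.83, destination 10.33.21.219, id 0x4BE0, TTL 63, protocol 6 and ports 51378 to 22, which matches the decoded header that the switch prints.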
Switch# show proc cpu sorted
Core 0: CPU utilization for five seconds: 99%; one minute: 64%; five minutes: 69%
Core 1: CPU utilization for five seconds: 99%; one minute: 89%; five minutes: 80%
Core 2: CPU utilization for five seconds: 12%; one minute: 57%; five minutes: 69%
Core 3: CPU utilization for five seconds: 98%; one minute: 99%; five minutes: 91%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5700 2311985 24103536 2114 49.58 49.70 49.72 0 stack-mgr
5698 1475012 42309915 522 25.80 25.74 25.76 1088 fed
12472 1779005 16386647 90 1.49 1.58 1.65 0 iosd
6239 3163525 50452155 150 0.30 0.31 0.31 0 ffm
43 3496392 43374714 17 0.10 0.10 0.10 0 sirq-net-rx/3
29 70700 12468288 0 0.05 0.01 0.03 0 sirq-timer/2
5699 1747090 31690173 20 0.05 0.10 0.11 0 platform_mgr
Troubleshooting High CPU in stack-mgr – Known issue
In the output above, several cores are experiencing high CPU, and fed and stack-mgr are the top consumers.
Troubleshooting High CPU in stack-mgr – Known issue (continued)
• High CPU in the stack-mgr process is observed for a prolonged time with no functional impact
• Stack Mgr - RCA
• The top/htop output in the Linux kernel and show process cpu report different values. Kernel counters roll over, and once they roll over their values no longer change. This is a cosmetic display issue
• FED process - RCA
• Frequent MAC flaps, MAC learning events
• Frequent STP Topology Change Notifications
Related Defects
DDTS: Description (Fixed Release)
• CSCuo98789: ARP broadcast for VLAN which is not SVI punted to CPU in case of Layer 2 (03.3(04)SE)
• CSCuh47950: Routing protocol packets cause unknown protocol drops in L2-only VLAN (03.6(00)E, 03.3(04)SE)
• CSCup05630: Changing aging timer does not change timer on Active/Local switch (03.6(01)E, 03.3(04)SE)
• CSCup24497: Serviceability and enhancement for OOB (03.6(01)E, 03.3(04)SE)
• CSCup15995: SifExceptionInterruptA8 need to handle all conditions besides balloting (03.3(04)SE)
• CSCup39058: show process cpu different from top/htop in Linux kernel (03.3(04)SE)
Agenda: Stacking & High Availability
3850 StackWise-480 Overview
• 3850 StackWise-480 is a new generation of Catalyst 3850 stacking
240Gbps of bandwidth (120Gbps TX & 120Gbps RX per connector)
Similar to previous stacking implementations, ring redundancy is achieved via ring-wrap capabilities provided in hardware
NOT backward compatible with currently fielded stacking technologies, most notably StackWise Plus
StackWise-480
3850 StackWise-480 Cables
• StackWise-480 currently supports 3 cables
STACK-T1-50CM = 0.5m cable
STACK-T1-1M = 1m cable
STACK-T1-3M = 3m cable
• All StackWise-480 cables include ACT II chips for counterfeit protection
Stack cables
3650 StackWise-160 Overview
• 3650 StackWise-160 is a new generation of Catalyst 3650 stacking
160Gbps stacking bandwidth
NOT backward compatible with currently fielded stacking technologies, most notably StackWise Plus
Stack cable can NOT be used on 3850
Stack cables are 50cm, 1m, and 3m in length
StackWise-160 & cables
• 6 rings in total
• 3 rings go East
• 3 rings go West
• Each ring is 40G
• Total Stack BW = 240G
• With Spatial Reuse = 480G
Stack Interface of UADP ASIC
Assuming 4 x 24-port 3850 switches
Packets are segmented/reassembled in HW (256-byte segments)
Understanding the Stack Ring
Is math really an opinion?
Destination stripping: a packet travels ½ of the ring and is taken out of the stack by the destination
Spatial Reuse
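The ring arithmetic from these slides is compact enough to write down. This Python sketch only restates the slide figures: 6 rings at 40G give 240G, and the slide credits spatial reuse with doubling that to 480G. `segments_for` is a hypothetical helper that assumes a partial last 256-byte segment counts as a whole segment.

```python
import math

RINGS = 6            # 3 rings East + 3 rings West
RING_GBPS = 40       # each ring is 40G
SEGMENT_BYTES = 256  # packets are segmented/reassembled in 256-byte units


def stack_bandwidth_gbps(spatial_reuse: bool = True) -> int:
    raw = RINGS * RING_GBPS  # 6 x 40G = 240G
    # With destination stripping a packet only occupies part of the ring, so the
    # slide credits spatial reuse with doubling the usable bandwidth (240G -> 480G).
    return raw * 2 if spatial_reuse else raw


def segments_for(packet_bytes: int) -> int:
    """Number of 256-byte stack segments, assuming a partial last segment counts whole."""
    return math.ceil(packet_bytes / SEGMENT_BYTES)
```

Under that assumption, a 1500-byte packet crosses the stack as 6 segments.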
Show commands
Switch# show switch detail
Switch/Stack Mac Address : 6400.f124.df80 - Local Mac Address
Mac persistency wait time: Indefinite
H/W Current
Switch# Role Mac Address Priority Version State
------------------------------------------------------------
*1 Active 6400.f124.df80 10 0 Ready
2 Standby 6400.f124.de80 1 0 Ready
Stack Port Status Neighbors
Switch# Port 1 Port 2 Port 1 Port 2
--------------------------------------------------------
1 OK OK 2 2
2 OK OK 1 1
Priority, followed by MAC address, determines which switch gets elected as Active.
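The election rule can be sketched as a sort key: highest priority first, then MAC address. In this Python sketch the tie-breaker is assumed to be the lowest MAC address, as described in Cisco's stacking documentation, and other factors (for example a switch that is already Active keeping its role) are deliberately ignored; `StackMember` and `elect_active` are hypothetical names.

```python
from typing import List, NamedTuple


class StackMember(NamedTuple):
    number: int
    priority: int
    mac: str  # dotted form, e.g. "6400.f124.df80"


def elect_active(members: List[StackMember]) -> StackMember:
    """Highest priority wins; on a tie, the lowest MAC address (assumed tie-breaker)."""
    return min(members, key=lambda m: (-m.priority, int(m.mac.replace(".", ""), 16)))
```

With the sample stack (switch 1 at priority 10, switch 2 at priority 1), switch 1 wins on priority alone.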
Show commands
Switch# show switch stack-ports summary
Sw#/Port# Port Status Neighbor Cable Length Link OK Link Active Sync OK #Changes to LinkOK In Loopback
---------------------------------------------------------------------------------------------------------------
1/1 OK 2 50cm Yes Yes Yes 0 No
1/2 OK 2 Unknown Yes Yes Yes 0 No
2/1 OK 1 100cm Yes Yes Yes 1 No
2/2 OK 1 50cm Yes Yes Yes 1 No
A cable length of “Unknown” (port 1/2 above) indicates a cable with a corrupted EEPROM.
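On larger stacks it is convenient to flag suspect stack ports automatically. A hedged Python sketch, assuming the column order of the “show switch stack-ports summary” output above; `suspect_stack_ports` is a hypothetical helper that flags ports whose status is not OK or whose cable length reads “Unknown”.

```python
from typing import List


def suspect_stack_ports(output: str) -> List[str]:
    """Return 'switch/port' entries whose status is not OK or whose cable length is Unknown."""
    suspects = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with "<switch>/<port>", e.g. "1/2".
        if not fields or "/" not in fields[0] or not fields[0].replace("/", "").isdigit():
            continue
        port, status, _neighbor, cable_length = fields[:4]
        if status != "OK" or cable_length == "Unknown":
            suspects.append(port)
    return suspects
```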
Image Version Mismatch
• If the switches are in version mismatch state, they will not stack
• Debugging:
• Check whether the member’s version matches the Active’s, using the show version command
• If they do not match, upgrade the switch to the Active’s version
Switch# show switch
Switch# Role Mac Address Priority Version State
---------------------------------------------------------------------------
*1 Active 6400.f125.1480 1 V01 Ready
2 Member 6400.f125.2680 1 0 V-Mismatch
3 Member 6400.f125.2500 1 0 V-Mismatch
4 Member 6400.f125.2480 1 0 V-Mismatch
Switch Stuck in Syncing State
• If a switch is stuck in syncing state:
• Get the ng3k-ses-oir “trace buffer” using Switch# show mgmt-infra trace messages
• Run the Diag OnDemand test for the stack cable: diagnostic start switch 1 test 7
• Debugging:
• Run the command “sh switch” to see the states
• Open a Service Request with Cisco TAC
Switch# show switch
Switch# Role Mac Address Priority Version State
---------------------------------------------------------------------------
*1 Active 6400.f125.1480 1 V01 Ready
2 Standby 6400.f125.2680 1 V01 Ready
3 Member 6400.f125.2500 1 V01 Ready
4 Member 6400.f125.2480 1 0 Syncing
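Both failure cases above (V-Mismatch and Syncing) show up in the State column of “show switch”. This Python sketch assumes the six-column row layout shown above; `members_not_ready` is a hypothetical helper that reports every member whose state is not Ready.

```python
from typing import Dict


def members_not_ready(output: str) -> Dict[int, str]:
    """Map switch number -> state for every 'show switch' row not in Ready state."""
    not_ready = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: [*]Switch# Role MacAddress Priority Version State
        if len(fields) != 6 or not fields[0].lstrip("*").isdigit():
            continue
        number, state = int(fields[0].lstrip("*")), fields[5]
        if state != "Ready":
            not_ready[number] = state
    return not_ready
```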
HA Redundancy – Shift from 3750-X
Catalyst 3750-X – StackWise-Plus
- Hybrid control-plane processing
- N:1 stateless control-plane redundancy
- Distributed L2/L3 forwarding redundancy
- Stateless L3 protocol redundancy
Catalyst 3850 – StackWise-480
- Centralized control-plane processing
- 1+1 stateful redundancy (SSO)
- Distributed L2/L3 forwarding redundancy
- IOS HA framework alignment for L3 protocols
HA SSO Architecture
(Figure: Active (A) and Standby (S) switches, each holding Interfaces, L2, L3, QoS and Wireless feature state.)
Feature state is synced between the Active and Standby members in the stack.
Feature states are inactive on the Standby member.
HA – Roles and Definitions
Route Processor Domain – a set of SW processes (e.g. IOSd, WCM) that implement the centralized Active and Standby portions of the stack control plane
Line Card Domain – a set of SW processes (e.g. FED, Platform Manager) that implement the distributed Line Card portions of the stack control plane
Infra Domain – support SW for the RP and LC Domains
Active Switch – supports the Active RP Domain, an LC Domain and an Infra Domain
Standby Switch – supports the Standby RP Domain, an LC Domain and an Infra Domain
Member Switch – supports an LC Domain and an Infra Domain
Election – assigning roles or functions within the stack
Catalyst 3x50 – HA State Machine
• Active starts the RP Domain (IOSd, WCM, etc.) locally
• Programs hardware on all LC Domains
• Traffic resumes once hardware is programmed
• In parallel, starts a 2-minute timer to elect the Standby
• Active elects the Standby
• Standby starts the RP Domain locally
• Starts Bulk Sync with the Active RP
• Standby reaches “Standby Hot”
Show switch with SSO
Switch# show switch
Switch/Stack Mac Address : 2037.06cf.0e80
Switch# Role Mac Address Priority H/W Version Current State
------------------------------------------------------------
*1 Active 2037.06cf.0e80 10 PP Ready
2 Standby 2037.06cf.3380 8 PP Ready
3 Member 2037.06cf.1400 6 PP Ready
4 Member 2037.06cf.3000 4 PP Ready
* Indicates which member is providing the “stack identity” (aka “stack MAC”)
Show redundancy states
Related commands: show redundancy, show redundancy history, show redundancy switchover history
Switch# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 2
Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Redundancy State = SSO
Manual Swact = enabled
Communications = Up
client count = 76
client_notification_TMR = 360000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 0
keep_alive threshold = 9
RF debug mask = 0
Note: STANDBY HOT is the terminal state for SSO. If “peer state” is stuck in any other state for more than 10 minutes, open a service request with TAC.
Note: If Communications is not Up, there might be a problem with stack connectivity. Check the stack cables.
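The two checks in the notes above can be scripted against captured show redundancy states output. This is a minimal sketch: parse_redundancy_states, sso_problems and the minutes_in_current_state argument are hypothetical. How long the peer has been in its current state must come from the caller (for example, by polling the command), because a single capture does not show it.

```python
import re

def parse_redundancy_states(text):
    """Turn the 'key = value' lines of show redundancy states into a dict."""
    fields = {}
    for line in text.splitlines():
        if " = " in line:
            key, value = line.split(" = ", 1)
            fields[key.strip()] = value.strip()
    return fields

def state_name(value):
    """'8 -STANDBY HOT' -> 'STANDBY HOT' (accepts a hyphen or an en dash)."""
    m = re.match(r"\d+\s*[-\u2013]\s*(.+)", value)
    return m.group(1).strip() if m else value

def sso_problems(text, minutes_in_current_state=0.0):
    """Apply the two rules from the slide notes and return any problems found."""
    f = parse_redundancy_states(text)
    problems = []
    peer = state_name(f.get("peer state", ""))
    if peer != "STANDBY HOT" and minutes_in_current_state > 10:
        problems.append(f"peer state stuck in {peer!r} for more than 10 minutes: open a TAC service request")
    if f.get("Communications") != "Up":
        problems.append("Communications not Up: check stack connectivity and stack cables")
    return problems

SAMPLE = """\
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Redundancy Mode (Operational) = SSO
Communications = Up
"""

if __name__ == "__main__":
    print(sso_problems(SAMPLE) or "SSO looks healthy")
```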
Configuration and Show commands
• No new configuration has been added for unicast features on the 3x50
• Configuration and show commands are compatible with the 3750X
• Additional platform CLIs have been added
• Refer to the configuration guide and command reference for full details
TCAM Concept on 3x50
• TCAM is used by several features
• A Hash Table Manager (HTM) provides configurable resources to features so they can select specific banks or hashing mechanisms
• HTM provides the required abstraction layer to its users so that the details of the TCAM hardware are irrelevant to them
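A TCAM entry is a value/mask pair: bits set in the mask must match the lookup key and the remaining bits are “don't care” (the KEY and MASK lines in the HTM dump later in this section have this shape). The sketch below shows ternary matching plus a toy manager that hands callers an opaque handle instead of exposing bank layout, which is the kind of abstraction HTM provides. ToyHashTableManager, its bank size and its first-fit placement are hypothetical; the sketch ignores hashing entirely and says nothing about how HTM actually allocates banks.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class TernaryEntry:
    value: int
    mask: int      # 1 bits must match the key; 0 bits are "don't care"
    result: str

    def matches(self, key: int) -> bool:
        return (key & self.mask) == (self.value & self.mask)

@dataclass
class ToyHashTableManager:
    """Hypothetical stand-in for HTM: callers get an opaque handle, not bank details."""
    bank_size: int
    banks: list = field(default_factory=list)

    def add(self, entry: TernaryEntry) -> tuple:
        # first-fit: use the first bank with room, else open a new bank
        for b, bank in enumerate(self.banks):
            if len(bank) < self.bank_size:
                bank.append(entry)
                return (b, len(bank) - 1)
        self.banks.append([entry])
        return (len(self.banks) - 1, 0)

    def lookup(self, key: int):
        # Hardware compares every entry in parallel and returns the first hit;
        # this sketch scans in insertion order, so add longer prefixes first.
        for bank in self.banks:
            for entry in bank:
                if entry.matches(key):
                    return entry.result
        return None

def prefix_entry(cidr: str, result: str) -> TernaryEntry:
    net = ipaddress.IPv4Network(cidr)
    return TernaryEntry(int(net.network_address), int(net.netmask), result)

if __name__ == "__main__":
    htm = ToyHashTableManager(bank_size=1)
    htm.add(prefix_entry("43.43.43.2/32", "host route"))
    htm.add(prefix_entry("43.0.0.0/8", "/8 route"))
    for ip in ("43.43.43.2", "43.1.2.3", "10.0.0.1"):
        print(ip, htm.lookup(int(ipaddress.IPv4Address(ip))))
```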
TCAM Utilization
Switch1# show platform tcam utilization asic all
CAM Utilization for ASIC# 0
Table Max Values Used Values
--------------------------------------------------------------------------
Unicast MAC addresses 32768/512 82/22
Directly or indirectly connected routes 32768/8192 7/89
IGMP and Multicast groups 8192/512 0/16
Security Access Control Entries 3072 173
QoS Access Control Entries 2816 52
Netflow ACEs 1024 15
Input Microflow policer ACEs 256 7
Output Microflow policer ACEs 256 7
Control Plane Entries 512 187
Policy Based Routing ACEs 1024 9
Tunnels 256 12
Input Security Associations 256 4
Output Security Associations and Policies 256 9
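To spot tables that are close to exhaustion, the rows above can be turned into percentages. This sketch treats an “a/b” value as two separate capacity/usage pairs and computes a percentage for each; the slide does not say which sub-table each number refers to. The 80% threshold and the helper names are hypothetical.

```python
import re

# "<name> <max> <used>", where max/used are either "N" or "N/M"
ROW = re.compile(r"^(.*?)\s+(\d+(?:/\d+)?)\s+(\d+(?:/\d+)?)\s*$")

def parse_row(line):
    """Return (table_name, [percent, ...]) or None for non-data lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    name, max_s, used_s = m.groups()
    maxima = [int(x) for x in max_s.split("/")]
    used = [int(x) for x in used_s.split("/")]
    return name, [100.0 * u / mx if mx else 0.0 for u, mx in zip(used, maxima)]

def tables_above(text, threshold_pct=80.0):
    """Names of tables where any capacity/usage pair is at or above threshold_pct."""
    hot = []
    for line in text.splitlines():
        parsed = parse_row(line)
        if parsed and any(p >= threshold_pct for p in parsed[1]):
            hot.append(parsed[0])
    return hot

SAMPLE = """\
CAM Utilization for ASIC# 0
Table Max Values Used Values
Unicast MAC addresses 32768/512 82/22
Directly or indirectly connected routes 32768/8192 7/89
Security Access Control Entries 3072 173
Control Plane Entries 512 187
Tunnels 256 12
"""

if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        parsed = parse_row(line)
        if parsed:
            print(parsed[0], [round(p, 1) for p in parsed[1]])
    print("above threshold:", tables_above(SAMPLE))
```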
TCAM - ACL client region
A client's ACLs should not exceed a total of 1375 VMR (value-mask-result) entries in the client region. Once the VMR limit is reached, an ACL UNLOAD event is generated and all of that client's packets are dropped.
3750: When the ACL limit is reached, packets are punted to the CPU and software forwarded
3850: When the ACL limit is reached, the ACL is not downloaded and packets are dropped. There is no software forwarding capability
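The platform difference above can be summarized as a small decision function. It is a sketch only: 1375 is the 3850 client-region limit from the slide, the 3750 limit is not given here so the caller must pass one, and acl_outcome is a hypothetical name. It reads “should not exceed” literally, so exactly 1375 entries still fit.

```python
CLIENT_REGION_VMR_LIMIT = 1375  # 3850 client-region limit from the slide

def acl_outcome(platform, vmr_needed, limit):
    """What happens to a client's ACL that needs vmr_needed TCAM entries (sketch)."""
    if vmr_needed <= limit:
        return "programmed in hardware"
    if platform == "3750":
        return "punted to CPU, software forwarded"
    if platform == "3850":
        return "ACL UNLOAD: not downloaded, traffic dropped"
    raise ValueError(f"unknown platform {platform!r}")

if __name__ == "__main__":
    for needed in (1200, 1375, 1400):
        print(needed, acl_outcome("3850", needed, CLIENT_REGION_VMR_LIMIT))
```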
000435: Sep 25 13:00:02: %ACL_ERRMSG-4-UNLOADED: 1 fed: Input IPv4 Group ACL on
interface Client MAC 1822.34be.c1ca for label 10 on asic1 could not be programmed in hardware
and traffic will be dropped.
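When the limit is hit, the %ACL_ERRMSG-4-UNLOADED message identifies the affected target, label and ASIC. A regular expression like the one below can pull those fields out of collected logs. The pattern is written against the single message shown here, so other variants of this message may word the target differently; treating the number before “fed:” as the switch number is an assumption, and parse_unloaded is a hypothetical helper.

```python
import re

UNLOADED = re.compile(
    r"%ACL_ERRMSG-4-UNLOADED:\s+(?P<switch>\d+)\s+fed:\s+"
    r"(?P<direction>Input|Output)\s+(?P<acl_kind>.+?)\s+ACL on interface\s+"
    r"(?P<target>.+?)\s+for label\s+(?P<label>\d+)\s+on asic(?P<asic>\d+)"
)

def parse_unloaded(message):
    """Return the interesting fields of an ACL UNLOADED message, or None."""
    m = UNLOADED.search(message)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("switch", "label", "asic"):
        fields[key] = int(fields[key])
    return fields

MSG = ("000435: Sep 25 13:00:02: %ACL_ERRMSG-4-UNLOADED: 1 fed: Input IPv4 Group ACL on "
       "interface Client MAC 1822.34be.c1ca for label 10 on asic1 could not be programmed "
       "in hardware and traffic will be dropped.")

if __name__ == "__main__":
    print(parse_unloaded(MSG))
```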
TCAM programming example – (interface Gi1/0/1)
We first obtain the iif-id of Gi1/0/1:
3850# sh platform port-asic ifm mappings local-port switch 1
Mappings Table
LPN ASIC Port Interface IIF-ID Active
1 1 21 Gi1/0/1 0x010290000000007f Y
3850# sh platform acl iifid 0x010290000000007f
########################################################
## LE INFO: (LETYPE: Group)
########################################################
LE: 17 (Client MAC 20bb.c021.a540) (ASIC255)
------------
---
LE Type: Group
IIF ID: 0x107a840000003b3
Input IPv4 ACL: label 4 h/w 4 (read from h/w 4)
BO 0x164000000 [CGACL]: xACSACLx-IP-PERMIT_ALL_TRAFFIC-51ef7db1
Output IPv4 ACL: label 0 h/w 0 (Group LE and label are not linked)
Input IPv6 ACL: label 0 h/w 0 (Group LE and label are not linked)
Output IPv6 ACL: label 0 h/w 0 (Group LE and label are not linked)
Input MAC ACL: label 0 h/w 0 (Group LE and label are not linked)
Output MAC ACL: label 0 h/w 0 (Group LE and label are not linked)
---
IPv4 ACL: xACSACLx-IP-PERMIT_ALL_TRAFFIC-51ef7db1
aclinfo: 0x5fc9d0a0
ASIC255 Input Group labels: 4 5
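Callouts:
• IIF-ID: the iif-id of the interface
• LPN: the logical port number
• Input IPv4 ACL: input group label = 4
• LE: the client MAC address
• IPv4 ACL: a sample dynamic ACL that was downloaded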
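The first step of this walkthrough (find the iif-id for an interface, then query ACL state by that iif-id) can be sketched as a small parser. It assumes the mappings table has exactly the single-token column headers shown on this slide; parse_ifm_mappings and acl_command_for are hypothetical helpers, and the generated command reuses the command form from the slide.

```python
def parse_ifm_mappings(text):
    """Map interface name -> row dict, using the 'LPN ...' header to name the columns."""
    rows = {}
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "LPN":
            header = tokens
            continue
        if header and len(tokens) == len(header):
            row = dict(zip(header, tokens))
            rows[row["Interface"]] = row
    return rows

def acl_command_for(text, interface):
    """Build the follow-up ACL command for an interface's iif-id."""
    iif = parse_ifm_mappings(text)[interface]["IIF-ID"]
    return f"show platform acl iifid {iif}"

SAMPLE = """\
Mappings Table
LPN ASIC Port Interface IIF-ID Active
1 1 21 Gi1/0/1 0x010290000000007f Y
"""

if __name__ == "__main__":
    print(acl_command_for(SAMPLE, "Gi1/0/1"))
```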
Commands to check TCAM Utilization
How to check the IPv4 FIB/Route TCAM
3850-1# show platform ip route summary
IP Fib Summary
Total number of v4 fib entries = 36
Total number succeeded in hardware = 36
Mask-Len 0 :- Total-count 1 hw-installed count 1
Mask-Len 4 :- Total-count 1 hw-installed count 1
Mask-Len 8 :- Total-count 7 hw-installed count 7
Mask-Len 24 :- Total-count 3 hw-installed count 3
Mask-Len 32 :- Total-count 24 hw-installed count 24
3850-1# show platform ip route
IP Fib entries
vrf dest htm flags
--- ---- --- -----
0 0.0.0.0/32 0x131ceec0 0x3
0 43.255.255.255/32 0x1311b10c 0x3
0 43.43.43.1/32 0x1311b21c 0x3
0 43.43.43.2/32 0x13124ba4 0x3
0 43.0.0.0/8 0x1311b084 0x3
Callouts:
• Mask-Len 32: the number of routes having mask length 32
• htm: check the HTM handle for the corresponding IPv4 prefix, in this case 43.43.43.2/32
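The route summary output can be checked automatically: each mask length should have an hw-installed count equal to its total count, and the overall totals should match. The sketch assumes the line wording shown in this output; fib_summary and missing_in_hardware are hypothetical helpers.

```python
import re

TOTAL = re.compile(r"Total number of v4 fib entries\s*=\s*(\d+)")
IN_HW = re.compile(r"Total number succeeded in hardware\s*=\s*(\d+)")
PER_LEN = re.compile(r"Mask-Len\s+(\d+)\s*:-\s*Total-count\s+(\d+)\s+hw-installed\s+count\s+(\d+)")

def fib_summary(text):
    """Return (total, succeeded_in_hw, {mask_len: (total_count, hw_installed)})."""
    total = int(TOTAL.search(text).group(1))
    in_hw = int(IN_HW.search(text).group(1))
    per_len = {int(length): (int(t), int(h)) for length, t, h in PER_LEN.findall(text)}
    return total, in_hw, per_len

def missing_in_hardware(text):
    """Mask lengths where fewer routes are installed in hardware than exist."""
    _, _, per_len = fib_summary(text)
    return sorted(length for length, (t, h) in per_len.items() if h < t)

SAMPLE = """\
Total number of v4 fib entries = 36
Total number succeeded in hardware = 36
Mask-Len 0 :- Total-count 1 hw-installed count 1
Mask-Len 4 :- Total-count 1 hw-installed count 1
Mask-Len 8 :- Total-count 7 hw-installed count 7
Mask-Len 24 :- Total-count 3 hw-installed count 3
Mask-Len 32 :- Total-count 24 hw-installed count 24
"""

if __name__ == "__main__":
    total, in_hw, _ = fib_summary(SAMPLE)
    print(total, in_hw, missing_in_hardware(SAMPLE))
```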
3850-1# show platform abstraction print-resource-handle 0x13124ba4 1
Handle:0x13124ba4 Res-Type:ASIC_RSC_HASH_TCAM Res-Switch-Num:0 Asic-Num:255 Feature-ID:AL_FID_L3_UNICAST_IPV4 Lkp-ftr-
id:LKP_FEAT_IPV4_L3_UNICAST ref_count:1
Hardware Indices/Handles:priv_ri/priv_si Handle:(nil) handle0:0x5d46e77c handle1:0x5d46e3fc
Detailed Resource Information (ASIC# 0)
----------------------------------------
Number of HTM Entries: 1
Entry #0: (handle 0x5d46e77c)
KEY - vrf:0 mtr:0 prefix:43.43.43.2 rcp_redirect_index:0x0
MASK - vrf:4095 mtr:0 prefix:255.0.0.0 rcp_redirect_index:0x0
FWD-AD = afd_label_flag:0 icmp_redir_enable:1 priority:3 afdLabelOrDestClientId:0 SI:89 destined_to_us:0 hw_stats_idx:2 stats_id:0
redirectSetRouterMac:0
3850# sh platform port-asic rm 0 stationindex 89 switch 1
al_rsc_si
station_index = 0x75
rewriteIndex = 0x1
destIndex = 0x513c
stationTableGeneric Label = 0x0
Callouts:
• KEY prefix: check that it is the correct prefix (43.43.43.2)
• SI:89: take note of the station index, which holds the information used for packet forwarding
• destined_to_us:0 means the packet is not destined to the switch itself
• rewriteIndex: tells you how the MAC will be rewritten
• destIndex: tells you where the packet will be forwarded
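The fields used in this walkthrough (SI and destined_to_us from the FWD-AD line, then rewriteIndex and destIndex from the station-index dump) can be extracted with two small parsers. This is a sketch against the output printed on this slide; the helper names are hypothetical, and values are read as hex when prefixed with 0x and as decimal otherwise, matching how they appear here.

```python
def to_int(value):
    """Values on this slide are hex when prefixed with 0x, decimal otherwise."""
    value = value.strip()
    return int(value, 16) if value.lower().startswith("0x") else int(value)

def parse_fwd_ad(line):
    """'FWD-AD = key:val key:val ...' -> {key: int}"""
    _, _, rest = line.partition("=")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = to_int(value)
    return fields

def parse_station_index(text):
    """'name = value' lines of the station-index dump -> {name: int}"""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = to_int(value)
    return fields

FWD_AD = ("FWD-AD = afd_label_flag:0 icmp_redir_enable:1 priority:3 "
          "afdLabelOrDestClientId:0 SI:89 destined_to_us:0 hw_stats_idx:2 stats_id:0")

STATION = """\
al_rsc_si
station_index = 0x75
rewriteIndex = 0x1
destIndex = 0x513c
stationTableGeneric Label = 0x0
"""

if __name__ == "__main__":
    fwd = parse_fwd_ad(FWD_AD)
    # command form copied from the slide ("rm 0" and "switch 1" as shown there)
    print(f"sh platform port-asic rm 0 stationindex {fwd['SI']} switch 1")
    print("destined to switch:", bool(fwd["destined_to_us"]))
    si = parse_station_index(STATION)
    print("rewriteIndex:", hex(si["rewriteIndex"]), "destIndex:", hex(si["destIndex"]))
```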
QoS – What’s New with Converged Access
• Modular QoS CLI (MQC)
• Alignment with 4500E series (Sup6, Sup7)
• Class-based Queuing, Policing, Shaping, Marking
• More Queues
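• Up to 2P6Q3T queuing capabilities (2 priority queues, 6 standard queues, 3 thresholds)
• The standard 3750X provides 1P3Q3T (1 priority queue, 3 standard queues, 3 thresholds)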
• Not limited to 2 queue-sets
• Flexible MQC Provisioning abstracts queuing hardware
Granular QoS control at the wireless edge
Tunnel termination allows customers to provide QoS treatment per SSID and per client, and common treatment of wired and wireless traffic throughout the network
Enhanced Bandwidth Management
Approximate Fair Drop (AFD) bandwidth management ensures fairness at the client, SSID and radio levels for NRT (non-real-time) traffic
Wireless Specific Interface Control
Policing capabilities per SSID and per client, upstream and downstream
AAA support for dynamic client-based QoS and security policies
Per SSID Bandwidth Management
policy-map PER-PORT-POLICING
 class VOIP
  set dscp ef
  police 128000 conform-action transmit exceed-action drop
 class VIDEO
  set dscp cs4
  police 384000 conform-action transmit exceed-action drop
 class SIGNALING
  set dscp cs3
  police 32000 conform-action transmit exceed-action drop
 class TRANSACTIONAL-DATA
  set dscp af21
 class class-default
  set dscp default
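Each police statement above is a single-rate policer: traffic within the configured rate (plus a burst allowance) conforms and is transmitted, and the rest exceeds and is dropped. The token-bucket sketch below illustrates that conform/exceed decision. It is not the UADP implementation, and the burst size is an assumption because the policy does not configure one (IOS derives a default burst when none is given).

```python
class SingleRatePolicer:
    """Single-rate, two-color token bucket (illustrative sketch, not the ASIC logic)."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8.0   # police <rate> is in bits per second
        self.burst_bytes = float(burst_bytes)    # assumed burst; not set in the policy
        self.tokens = self.burst_bytes           # bucket starts full
        self.last_t = 0.0

    def offer(self, t, size_bytes):
        """Classify a packet arriving at time t (seconds, non-decreasing)."""
        elapsed = t - self.last_t
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_t = t
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return "conform"   # conform-action transmit
        return "exceed"        # exceed-action drop (no tokens consumed)

if __name__ == "__main__":
    voip = SingleRatePolicer(rate_bps=128000, burst_bytes=8000)
    print([voip.offer(0.0, 1500) for _ in range(6)])
    print(voip.offer(1.0, 1500))
```

For the VOIP class (128000 bps) with an assumed 8000-byte burst, a back-to-back train of six 1500-byte packets gives five conforms and one exceed; one second later the bucket has refilled and the next packet conforms again.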
Platform QoS CLI
Switch# show platform qos policy target GigabitEthernet 1/0/48
Input policy :
--------------
Not attached
Output policy :
--------------
POLICY: defportangn Num Classes:1 PLC Map Targets:0 Queue LBL Targets uplink:0 downlink:0
PMAP:0x6345d778 NextPMAP:0x585d5518 PrevPMAP:0x57b02b98
UP Mask: 0, Lookup Type:0
COS Mask: 0, dscp mask:0
Filter flags: 0, Action Flags:0x14, num_classmaps 1 policy_type: MARKING/POLICING
nfl_req_pending_cnt:0 pmap_qsize:0
CLASS: class-default
CMAP:0x124b42a0 Next:(nil) Prev:(nil)
Masks:- UP:0, CoS: 0, Dscp:0
Filter flags 0
Not Supported
Negate: NO Next:(nil) . . . .
Switch# sh platform qos policy hw_state target GigabitEthernet 1/0/48
Input policy : Not attached
Output policy : defportangn
H/W programming State: INSTALLED IN HW
MLS QoS and MQC QoS Default behaviors
3750: with “mls qos” enabled globally, all ports are untrusted and the DSCP/precedence/CoS of incoming packets is reset to 0
3750: “mls qos trust” is needed at the interface level to change the trust mode
3850: ports are trusted by default; DSCP/precedence/CoS values are retained
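The default-trust difference can be summarized as a function of platform and configuration. This sketch covers only DSCP with no policy attached; trust cos and the CoS-to-DSCP map are out of scope. The behaviour with mls qos disabled (markings passed through unchanged) is standard 3750 behaviour but is not shown on this slide, and ingress_dscp is a hypothetical name.

```python
def ingress_dscp(platform, dscp_in, mls_qos=False, trust_dscp=False):
    """DSCP a packet carries after ingress when no QoS policy is attached (sketch)."""
    if platform == "3850":
        return dscp_in            # MQC platform: ports are trusted by default
    if platform == "3750":
        if not mls_qos:
            return dscp_in        # QoS globally disabled: markings passed through
        # "mls qos" on: ports are untrusted unless "mls qos trust dscp" is configured
        return dscp_in if trust_dscp else 0
    raise ValueError(f"unknown platform {platform!r}")

if __name__ == "__main__":
    print(ingress_dscp("3750", 46, mls_qos=True))   # EF re-marked to 0
    print(ingress_dscp("3850", 46))                 # EF retained
```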
3750 MLS QoS vs. 3850 MQC QoS
Area | 3750 | 3850
Basic structure | MLS | MQC
Global config | Supports mls qos; supports some MQC at ingress | No mls qos support; supports MQC (class-map, policy-map)
Interface config | Supports mls qos config and some MQC CLI at ingress | Attach the policy to the interface
Port ingress | Classification/Policing/Marking/Queuing | Classification/Policing/Marking
Port egress | Queuing | Classification/Policing/Marking/Queuing
SVI ingress | Classification/Policing/Marking | Classification/Marking
SVI egress | None | Classification/Marking
https://techzone.cisco.com/t5/NGWC-Switches-3650-3850/3750-MLS-to-3850-MQC-Conversion-of-QoS-Configuration/ta-p/697153
Additional Troubleshooting Commands
Topic | Command
Platform-specific feature information | show tech-support platform <feature> (e.g. wireless, acl, fnf, etc.)
Trace buffers for non-IOSd processes | show mgmt-infra trace messages <component> (e.g. fed-punject-detail, stack-mgr-events, etc.)
Generate a live core of a process (internal command) | resource process dump <process id obtained from show process> [ switch <switch number> ]
Generate a system report (internal command) | resource create_system_report
Identify memory leaks | show memory debug leaks detailed process <process name> summary
Core Dumps and System Reports
• The system generates a fullcore, crashinfo and System Report when a process terminates abnormally
• A System Report is generated each time the switch is rebooted
• The System Report contains a dump of all the trace buffers in the system
• When filing a TAC case, please attach the fullcore, crashinfo and System Report files (whichever are applicable) from the crashinfo: filesystem
Summary
• Provided a High Level Architectural overview of features on the 3x50
• Basic Troubleshooting functionality available on the 3x50
• Do you have a better understanding of:
• 3x50 as a platform
• Key differences between 3x50 and 3750X
• Basic troubleshooting on the 3x50
• Would you like to see:
• More/Less of any particular topic
• More topics
• Longer session
Complete Your Online Session Evaluation
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online
Thank you