NextGen Network: Cisco FabricPath - UNINETT-samling 2012 - Marius Holmsen, [email protected]
TRANSCRIPT
© 2009 Cisco Systems, Inc. All rights reserved. Cisco Confidential
Marius Holmsen – [email protected]
NextGen Network Cisco Fabricpath
Cisco FabricPath Technology and Design
Intelligent Layer 2 Domains Evolution

[Diagram: IP cloud, core, aggregation, access, and virtual access layers, with the L2/L3 boundary moving as the technology evolves]

- STP+ (STP enhancements, Bridge Assurance) with NIC teaming: simplified loop-free trees
- vPC / vPC+: 2x multi-pathing; shipping on Nexus 7k/5k
- FabricPath: 16x ECMP, low latency / lossless, MAC scaling; shipping on Nexus 7k, future on Nexus 5500
- OTV: inter-pod connectivity across L3 with failure boundary preservation; Nexus 7000
Layer 2 Tree

Spanning Tree Protocol is typically used to build this tree. A tree topology implies:
- Wasted bandwidth → increased oversubscription (11 physical links reduced to 5 logical links)
- Suboptimal paths
- Slow, conservative (timer-based) convergence; failures can be catastrophic (the network fails open)

Branches of the tree never interconnect (no loops). [Diagram: switches S1, S2, S3]
Why Layer 2 in the Data Center?

Because customers request it!
• Some protocols rely on the functionality
• Simple, almost plug-and-play
• No addressing
• Required for implementing subnets
• Allows easy server provisioning
• Allows virtual machine mobility
Current Data Center Design

L2 benefits are limited to a POD; the L3 boundary sits above it. [Diagram: three PODs, each an L2 island under a common L3 core]
Possible Solution for End-to-End L2?

Just extend STP to the whole network? [Diagram: a single STP domain spanning the entire L2 network under the L3 core]
Typical Limitations of L2
- Local STP problems have network-wide impact; troubleshooting is difficult
- STP provides limited bandwidth (no load balancing)
- STP convergence is disruptive
- Tree topologies introduce suboptimal paths
- MAC address tables don't scale
- Flooding impacts the whole network
Cisco FabricPath Goal

"FabricPath brings Layer 3 routing benefits to flexible Layer 2 bridged Ethernet networks"

Switching (easy configuration, plug-and-play provisioning, flexibility) + Routing (multi-pathing/ECMP, fast convergence, high scalability) = FabricPath
FabricPath: An Ethernet Fabric - Turn the Network into a Fabric

Connect a group of switches using an arbitrary topology and, with a simple CLI, aggregate them into a fabric:

N7K(config)# interface ethernet 1/1
N7K(config-if)# switchport mode fabricpath

No STP inside. An open protocol based on L3 technology provides fabric-wide intelligence and ties the elements together.
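A fuller bring-up can be sketched as follows (switch ID 100 and VLAN 10 are illustrative values; DRAP can also assign the switch ID automatically, so the explicit assignment is optional):

```
! Minimal FabricPath bring-up on NX-OS (illustrative values)
N7K(config)# install feature-set fabricpath
N7K(config)# feature-set fabricpath
N7K(config)# fabricpath switch-id 100        ! optional - DRAP assigns one otherwise
N7K(config)# vlan 10
N7K(config-vlan)# mode fabricpath            ! VLAN carried across the fabric
N7K(config-vlan)# exit
N7K(config)# interface ethernet 1/1
N7K(config-if)# switchport mode fabricpath   ! FP core port toward the fabric
```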
What Is a Fabric?

Externally, a fabric looks like a single switch. Internally, a protocol adds fabric-wide intelligence and ties the elements together. This protocol provides, in a plug-and-play fashion:
- Optimal, low-latency any-to-any connectivity
- High bandwidth and high resiliency
- Open management and troubleshooting

Cisco FabricPath provides additional capabilities in terms of scalability and L3 integration.
Optimal, Low Latency Switching: Shortest Path Any-to-Any
- A single address lookup at the ingress edge identifies the exit port across the fabric
- Traffic is then switched using the shortest path available
- Reliable any-to-any L2 and L3 connectivity (L2 behaves as if within the same switch; no STP inside)

[Diagram: host A on the ingress edge switch (port e1/1), host B on switch s8 (port e1/2), switch s3 in the fabric]

Ingress edge MAC table:
  MAC  IF
  A    e1/1
  …    …
  B    s8, e1/2
High Bandwidth, High Resiliency: Equal-Cost Multi-Pathing
- Multi-pathing (up to 256 links active between any two devices)
- Traffic is redistributed across the remaining links in case of failure, providing fast convergence

[Diagram: hosts A and B with multiple equal-cost paths through the fabric between s3 and s8]
Scalable: Conversational Learning
- The per-port MAC address table only needs to learn the peers that are reached across the fabric
- A virtually unlimited number of hosts can be attached to the fabric

[Diagram: host A on edge switch s1 (port e1/1), host B on edge switch s8 (port e1/2), core switch s5]

A's edge switch MAC table: A → e1/1; B → s8, e1/2
Core switch s5 MAC table: empty (no MAC learning needed in the core)
B's edge switch (s8) MAC table: A → s1, e1/1; B → e1/2
Layer 2 Integration: vPC+
- Allows extending VLANs with no limitation (no risk of loops)
- Devices can be attached active/active to the fabric using IEEE standard port channels, without resorting to STP

[Diagram: hosts A and B dual-attached via port channels to fabric switch pairs s3/s4 and s7/s8; VLANs X, Y, Z extended across the fabric]
Edge Device Integration: Hosts Can Leverage Multiple L3 Default Gateways
- Hosts see a single default gateway
- The fabric transparently provides them with multiple simultaneously active default gateways
- Extends the multi-pathing from the inside of the fabric to the L3 domain outside the fabric

[Diagram: host A attached at s3; several fabric switches act as default gateway (dg) toward the L3 domain]
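With vPC+, both peers of a gateway pair can forward traffic sent to the HSRP virtual MAC, so ordinary HSRP configuration on the edge switches yields active/active default gateways. A sketch (VLAN 10, addresses, and group number are illustrative):

```
! On each vPC+ gateway switch (illustrative values; peer uses .3)
S1(config)# feature hsrp
S1(config)# interface vlan 10
S1(config-if)# ip address 10.1.10.2/24
S1(config-if)# hsrp 10
S1(config-if-hsrp)# ip 10.1.10.1            ! single virtual gateway seen by hosts
```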
New Control Plane: Plug-and-Play L2 IS-IS Manages the Forwarding Topology
- IS-IS assigns addresses to all FabricPath switches automatically
- Computes shortest, pair-wise paths
- Supports equal-cost paths between any FabricPath switch pairs

[Diagram: leaf switches S100-S400 connected over links L1-L4 to spine switches S10-S40]

FabricPath routing table (e.g., on S100):
  Switch  IF
  S10     L1
  S20     L2
  S30     L3
  S40     L4
  S200    L1, L2, L3, L4
  …       …
  S400    L1, L2, L3, L4
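The control plane described above can be verified from the CLI; a sketch of common checks (exact output varies by NX-OS release, so only the commands are shown):

```
! Verifying the FabricPath control plane
S100# show fabricpath isis adjacency        ! L2 IS-IS neighbors on core ports
S100# show fabricpath switch-id             ! local and learned switch IDs (DRAP)
S100# show fabricpath isis interface brief  ! core ports participating in IS-IS
```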
New Data Plane

[Diagram: Classical Ethernet (CE) hosts A and B attach to edge switches S100 (port 1/1) and S300 (port 1/2); spines S10-S40 form the FabricPath (FP) core]

- Switch ID space: routing decisions are made based on the FabricPath routing table
- MAC address space: switching is based on MAC address tables
- The MAC-address-to-Switch-ID association is maintained at the edge
- Traffic is encapsulated across the fabric (outer header S100 → S300 carrying the original A → B frame)

S300: CE MAC Address Table
  MAC  IF
  B    1/2
  A    S100
  …    …

S300: FabricPath Routing Table
  Switch  IF
  …       …
  S100    L1, L2, L3, L4
FabricPath Terminology

[Diagram: spine switches S10-S40 and leaf switches S100-S300; hosts A and B on edge ports 1/1 and 1/2; spine switch and leaf switch are roles in the fabric topology]

CE edge ports:
- Interface connected to a traditional network device
- Sends/receives traffic in standard 802.3 Ethernet frame format
- Participates in the STP domain
- Forwarding based on the MAC table

FP core ports:
- Interface connected to another FabricPath device
- Sends/receives traffic with the FabricPath header
- Does not run Spanning Tree and does not perform MAC learning!
- Exchanges topology information through L2 IS-IS adjacencies
- Forwarding based on the Switch ID table
Unknown Unicast

[Diagram: host A on S100 port 1/1, host B on S300 port 1/2; S200 is another leaf on the fabric]

Host A sends a frame to B, which is still unknown:
1. S100 - lookup B: miss → flood; local source A is learned on 1/1
2. S200 - lookup B: miss → don't learn remote source A
3. S300 - lookup B: hit (local port 1/2) → learn remote source A behind S100

S100: CE MAC Address Table
  MAC  IF
  A    1/1
  …    …

S200: CE MAC Address Table
  MAC  IF
  …    …

S300: CE MAC Address Table
  MAC  IF
  B    1/2
  A    S100
  …    …
Known Unicast, Conversational Learning

[Diagram: host A on S100 port 1/1, host B on S300 port 1/2; S200 is another leaf on the fabric]

Host B replies to A, which is now known at S300:
1. S300 - lookup A: hit (A is behind S100) → encapsulate and send to S100; learn local source B on 1/2
2. S100 - lookup A: hit (local port 1/1) → learn remote source B behind S300

S300: FabricPath Routing Table
  Switch  IF
  …       …
  S100    L1, L2, L3, L4

S300: CE MAC Address Table
  MAC  IF
  B    1/2
  A    S100
  …    …

S200: CE MAC Address Table
  MAC  IF
  …    …

S100: CE MAC Address Table
  MAC  IF
  A    1/1
  B    S300
  …    …
New Data Plane Means ASIC Support: Nexus 7000 F1 Series I/O Module (N7K-F132XP-15)
- SFP+ 1/10G I/O module
- Layer 2 forwarding with L3/L4 services (ACL/QoS)
- Multi-protocol: Classic Ethernet/vPC, FabricPath, DCB, FCoE
- High performance: 230 Gbps fabric connectivity, 32 line-rate ports per slot with local switching
- System-on-Chip (SoC)† design
- Minimum software: NX-OS 5.1(1)

† sometimes called "switch-on-chip"

[Diagram: 32 front-panel ports feed sixteen 2 x 10G SoC forwarding engines, connected through two Fabric ASICs to the fabric modules]
New Data Plane Means ASIC Support: Nexus 7000 F2 Series I/O Module (N7K-F248XP-25)
- 48-port 1G/10G with SFP/SFP+ transceivers
- Layer 2/Layer 3 forwarding with L3/L4 services (ACL/QoS)
- Supports Nexus 2000 (FEX) connections
- High performance: 480G full-duplex fabric connectivity, 48 line-rate ports per slot with local switching
- System-on-Chip (SoC)† design
- Minimum software: NX-OS 6.0(1)

† sometimes called "switch-on-chip"

[Diagram: 48 front-panel ports feed twelve 4 x 10G SoCs; the SoCs connect to the Fabric 2 ASIC toward the fabric modules, to the LC CPU over the EOBC, and to the central arbiters through an arbitration aggregator]
New Data Plane Means ASIC Support: Nexus 5500
• FabricPath supported on Nexus 5500 platforms (N5548P, N5548UP, N5596UP)

N5K-C5548P-FA / N5K-C5548UP-FA:
- 32 fixed ports, 1/10G Ethernet or 1/2/4/8G FC
- Line-rate, non-blocking 10G FCoE / IEEE DCB
- 1 expansion module slot
- IEEE 1588, FabricPath and Layer 3 capable
- Redundant fans and power supplies

N5K-C5596UP-FA:
- 48 fixed ports, 1/10G Ethernet or 1/2/4/8G FC
- Line-rate, non-blocking 10G FCoE / IEEE DCB
- 3 expansion module slots
- IEEE 1588, FabricPath and Layer 3 capable
- Redundant fans and power supplies

Minimum software: NX-OS 5.1(3)N1(1)
Only F Modules Switch FabricPath Traffic
- The Nexus 7000 features two kinds of I/O modules: M Series and F Series
- M Series I/O modules cannot switch FabricPath traffic
- When running FabricPath, FP core and CE edge ports must be on an F Series I/O module
- New FabricPath/CE locally significant VLAN mode: FabricPath VLANs can only be enabled on F Series I/O modules or FEX host interfaces (FEX attached to an F2 parent I/O module)

S100(config)# vlan 10
S100(config-vlan)# mode ?
  ce          Classical Ethernet VLAN mode
  fabricpath  Fabricpath VLAN mode
S100(config-vlan)# mode fabricpath
S100(config-vlan)#

[Diagram: F-Series modules host both the FabricPath core ports and the Classical Ethernet edge ports]
Introducing vPC+

FabricPath allows dual-homed connections from edge ports into the FabricPath domain with active/active forwarding:
- Classic Ethernet switches, Layer 3 routers, load balancers, dual-homed servers, etc.
- The only requirement is that the device can form a port-channel interface
- Can also provide active/active HSRP
- Configuration is virtually identical to standard vPC

[Diagram: vPC+ pair S1/S2 at the FabricPath edge; a host and an STP device dual-attached via port channels (→ FabricPath, → CE)]
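As a sketch, a vPC+ pair differs from standard vPC mainly in the shared virtual switch ID and the FabricPath-mode peer-link (domain 10, switch ID 4, and port-channel numbers are illustrative; the peer-keepalive configuration is omitted for brevity):

```
! On both peers; peer-link and vPC+ member ports must be F-Series ports
S1(config)# feature vpc
S1(config)# vpc domain 10
S1(config-vpc-domain)# fabricpath switch-id 4     ! shared virtual switch ID
S1(config-vpc-domain)# exit
S1(config)# interface port-channel 1
S1(config-if)# switchport mode fabricpath         ! peer-link is an FP core port
S1(config-if)# vpc peer-link
S1(config)# interface port-channel 20
S1(config-if)# vpc 20                             ! dual-homed CE device
```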
vPC+ Details
- vPC+ peer switches share a "virtual" FabricPath switch ID
- MAC addresses behind vPC+ port channels appear as "connected" to the virtual switch, not to the vPC+ peer switches
- Allows load balancing within the FabricPath domain toward the vPC+ virtual switch
- vPC+ requires F modules with FabricPath enabled in the VDC
- The peer-link and all vPC+ connections must be to F ports

[Diagram: physical view - peers S1 and S2 with F1 modules, Host A dual-attached; logical view - virtual "Switch 4" becomes the egress switch for Host A in the FabricPath domain, so remote switch S3 learns Host A → S4 → L1, L2 (→ FabricPath, → CE)]
vPC vs. vPC+
- A given VDC can be part of a vPC domain or a vPC+ domain, but not both
- vPC+ only works on F modules with FabricPath enabled in the VDC
- Conversion between vPC and vPC+ is disruptive

                            vPC                  vPC+
  Peer-link                 M ports or F ports   F ports
  Member ports              M ports or F ports   F ports
  VLANs                     CE                   FabricPath VLANs only
  Peer-link switchport mode CE trunk port        FabricPath core port
Transparent Interconnection of Lots of Links (TRILL)
- IETF standard for Layer 2 multipathing
- Driven by multiple vendors, including Cisco
- TRILL has now officially moved from Draft to Proposed Standard in the IETF
- Proposed Standard status means vendors can confidently begin developing TRILL-compliant software implementations
- Cisco FabricPath-capable hardware is also TRILL-capable

http://datatracker.ietf.org/wg/trill/
FabricPath vs. TRILL: Overview

  Feature                               FabricPath            TRILL
  Frame routing (ECMP, TTL, RPFC, etc.) Yes                   Yes
  vPC+                                  Yes                   No
  FHRP active/active                    Yes                   No
  Multiple topologies                   Yes                   No
  Conversational learning               Yes                   No
  Inter-switch links                    Point-to-point only   Point-to-point or shared

- FabricPath will provide a TRILL mode with a software upgrade (the hardware is already TRILL-capable)
- Cisco will push FabricPath-specific enhancements to TRILL
FabricPath vs. TRILL: Encapsulation
- TRILL devices can communicate over a shared Ethernet segment with several peers
- FabricPath has a more compact frame format (simpler hardware, lower latency) and can only peer on point-to-point links

[Diagram: with TRILL, the outer MAC addresses are rewritten hop by hop (s1 → s2, s2 → s3, ...) while the inner A → B frame is carried end to end; with FabricPath, the outer s1 → s3 addresses are themselves carried end to end across the fabric]
Key FabricPath Unicast Processes

Running on the supervisor engine:
- FabricPath IS-IS: SPF routing protocol process that forms the core of the FabricPath control plane
- DRAP: Dynamic Resource Allocation Protocol, an extension to FabricPath IS-IS that ensures network-wide unique and consistent Switch IDs and Ftag values
- U2RIB: Unicast Layer 2 RIB, containing the "best" unicast Layer 2 routing information
- L2FM: Layer 2 Forwarding Manager, managing the MAC address table

Running on the I/O modules:
- U2FIB: Unicast Layer 2 FIB, managing the hardware unicast routing table
- MTM: MAC Table Manager, managing the hardware MAC address table

Hardware tables on I/O modules:
- Switch table: contains Switch IDs and next-hop interfaces
- MAC table: contains local and remote MAC addresses
- Other hardware: a variety of other table memories, hardware registers, etc. required for FabricPath forwarding
FabricPath Routing Table
- Describes the shortest (best) paths to each Switch ID based on link metrics
- Equal-cost paths are supported between FabricPath switches

[Diagram: leaves S100-S300 connected over links L1-L4 to spines S10-S40]

FabricPath routing table on S100:
  Switch  IF
  S10     L1
  S20     L2
  S30     L3
  S40     L4
  S200    L1, L2, L3, L4
  …       …
  S300    L1, L2, L3, L4

One 'best' path to S10 (via L1); four equal-cost paths to S200.
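The link metrics that drive these best-path computations default to values derived from interface bandwidth, but they can be tuned per core port to influence path selection. A sketch (the metric value is illustrative):

```
! Adjusting the FabricPath IS-IS metric on a core port (value illustrative)
S100(config)# interface port-channel 10
S100(config-if)# fabricpath isis metric 30    ! higher cost discourages this link
```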
Display IS-IS View of Routing Topology: show fabricpath isis route

[Diagram: hosts A, C, B on leaves S100, S300, S200; spines S10-S40 reached over po10-po40]

S100# sh fabricpath isis route
Fabricpath IS-IS domain: default MT-0
Topology 0, Tree 0, Swid routing table
10, L1
 via port-channel10, metric 20
20, L1
 via port-channel20, metric 20
30, L1
 via port-channel30, metric 20
40, L1
 via port-channel40, metric 20
200, L1
 via port-channel30, metric 40
 via port-channel40, metric 40
 via port-channel20, metric 40
 via port-channel10, metric 40
300, L1
 via port-channel30, metric 40
 via port-channel40, metric 40
 via port-channel20, metric 40
 via port-channel10, metric 40
S100#

Key fields: destination Switch ID, next-hop interface(s), routing metric.
Display U2RIB View of Routing Topology: show fabricpath route

[Diagram: hosts A, C, B on leaves S100, S300, S200; spines S10-S40 reached over po10-po40]

S100# sh fabricpath route
FabricPath Unicast Route Table
'a/b/c' denotes ftag/switch-id/subswitch-id
'[x/y]' denotes [admin distance/metric]
ftag 0 is local ftag
subswitch-id 0 is default subswitch-id

FabricPath Unicast Route Table for Topology-Default

0/100/0, number of next-hops: 0
 via ---- , [60/0], 0 day/s 04:43:51, local
1/10/0, number of next-hops: 1
 via Po10, [115/20], 0 day/s 02:24:02, isis_fabricpath-default
1/20/0, number of next-hops: 1
 via Po20, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/30/0, number of next-hops: 1
 via Po30, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/40/0, number of next-hops: 1
 via Po40, [115/20], 0 day/s 04:43:25, isis_fabricpath-default
1/200/0, number of next-hops: 4
 via Po10, [115/40], 0 day/s 02:24:02, isis_fabricpath-default
 via Po20, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
 via Po30, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
 via Po40, [115/40], 0 day/s 04:43:06, isis_fabricpath-default
1/300/0, number of next-hops: 4
 via Po10, [115/40], 0 day/s 02:24:02, isis_fabricpath-default
 via Po20, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
 via Po30, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
 via Po40, [115/40], 0 day/s 04:43:25, isis_fabricpath-default
S100#

Key fields: topology (ftag) / Switch ID / Sub-Switch ID, administrative distance and routing metric, client protocol, next-hop interface(s), route age.
FabricPath Multidestination Trees
- Multidestination traffic is constrained to loop-free trees touching all FabricPath switches
- A root switch is elected for each multidestination tree in the FabricPath domain
- A network-wide identifier (Ftag) is assigned to each loop-free tree
- Support for multiple multidestination trees provides multipathing for multidestination traffic
- Two multidestination trees are supported in NX-OS release 5.1

[Diagram: spines S10-S40 and leaves S100-S300; S10 is root for Logical Tree 1 (Ftag 1) and S40 is root for Logical Tree 2 (Ftag 2), each tree spanning all switches]
Multidestination Trees and the Role of the Ingress FabricPath Switch
- The ingress FabricPath switch determines which tree to use for each flow
- Other FabricPath switches forward based on the tree selected by the ingress switch
- Broadcast and unknown unicast typically use the first tree
- Hash-based tree selection for IP multicast, with several configurable hash options

[Diagram: S10 is root for Tree 1 and S40 for Tree 2; leaves S100-S300 attach over links L1-L12]

Multidestination trees on switch S100:
  Tree  IF
  1     L1
  2     L1, L2, L3, L4
Display IS-IS View of Multidestination Trees: show fabricpath isis trees

[Diagram: S100 connects to spines S10-S40 over po1-po4; S10 is the tree root; S200 attaches to S10; hosts A and B at the edge]

S100# sh fabricpath isis trees multidestination 1
Fabricpath IS-IS domain: default
Note: The metric mentioned for multidestination tree is from the root of that tree to that switch-id
MT-0
Topology 0, Tree 1, Swid routing table
10, L1
 via port-channel1, metric 0
20, L1
 via port-channel2, metric 20
30, L1
 via port-channel3, metric 20
40, L1
 via port-channel4, metric 20
200, L1
 via port-channel1, metric 10
S100#

S10# sh fabricpath isis trees multidestination 1
Fabricpath IS-IS domain: default
Note: The metric mentioned for multidestination tree is from the root of that tree to that switch-id
MT-0
Topology 0, Tree 1, Swid routing table
20, L1
 via port-channel1, metric 20
30, L1
 via port-channel1, metric 20
40, L1
 via port-channel1, metric 20
100, L1
 via port-channel1, metric 10
200, L1
 via port-channel2, metric 10
S10#

Key fields: multidestination tree, destination Switch ID, OIF to reach this switch on this tree, metric from the root to this switch.
How Are Multidestination Roots Selected?
- All FabricPath switches announce their root priority in the Router Capability TLV
- The FabricPath network elects a single root switch for the first (broadcast) multidestination tree in the topology
- The switch with the highest priority value becomes root for the tree; the highest system ID, then the highest Switch ID value, breaks ties
- The broadcast root determines the roots of additional multicast trees and announces them in the Router Capability TLV
- Multicast roots are spread among the available switches to balance load, with selection based on the same criteria as above
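The election can be influenced by raising a switch's root priority under the FabricPath domain configuration; a sketch (the priority value is illustrative, and exact command availability varies by NX-OS release):

```
! Prefer this switch as the multidestination tree root (value illustrative)
S10(config)# fabricpath domain default
S10(config-fabricpath-isis)# root-priority 255    ! highest priority wins election
```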
Identify Multidestination Tree Roots: show fabricpath isis topology summary

[Diagram: hosts A, C, B on leaves S100, S300, S200; spines S10-S40 reached over po10-po40]

S100# sh fabricpath isis topology summary
Fabricpath IS-IS domain: default FabricPath IS-IS Topology Summary
MT-0
Configured interfaces: port-channel10 port-channel20 port-channel30 port-channel40
Number of trees: 2
Tree id: 1, ftag: 1, root system: 0026.51cf.ae41, 10
Tree id: 2, ftag: 2, root system: 0024.f71f.5241, 40
S100#

Key fields: topology number, interfaces in the topology, number of multidestination trees in the topology, tree IDs and Ftags, system IDs and Switch IDs of the root switches.
FabricPath Encapsulation: 16-Byte MAC-in-MAC Header

Classical Ethernet frame:
  DMAC | SMAC | 802.1Q | Etype | Payload | CRC

Cisco FabricPath frame (original CE frame plus 16 bytes):
  Outer DA (48) | Outer SA (48) | FP Tag (32) | DMAC | SMAC | 802.1Q | Etype | Payload | CRC (new)

Outer DA/SA fields (48 bits each): Endnode ID (5:0, 6 bits), U/L (1), I/G (1), Endnode ID (7:6, 2 bits), RSVD (1), OOO/DL (1), Switch ID (12 bits), Sub-Switch ID (8 bits), LID (16 bits)

FP Tag (32 bits): Etype 0x8903 (16 bits), Ftag (10 bits), TTL (6 bits)

- Switch ID: unique number identifying each FabricPath switch
- Sub-Switch ID: identifies devices/hosts connected via vPC+
- LID: Local ID, identifies the destination or source interface
- Ftag (Forwarding tag): unique number identifying the topology and/or distribution tree
- TTL: decremented at each switch hop to prevent frames from looping indefinitely
Feel free to follow us:
Blog: mariusholmsen.typepad.com
Facebook group: Cisco DC Norge
Backup slides
IEEE 802.1BR
Nexus Fabric Extender (FEX): 802.1BR (VNTag) Port Extension
- The 802.1BR architecture provides the ability to extend the bridge (switch) interface to downstream devices
- 802.1BR associates the Logical Interface (LIF) with a Virtual Interface (VIF)
- Bridges that support Interface Virtualization (IV) ports must support VNTag and the VIC protocol
- NIV uplink ports must connect to an NIV-capable bridge or an NIV downlink
- NIV downlink ports may be connected to an NIV uplink port, bridge, or NIC
- NIV may be cascaded, extending the port extension one additional level
- NIV downlink ports are assigned a virtual identifier (VIF) that corresponds to a virtual interface on the bridge and is used to forward frames through NIVs
- NIV-capable adapters may extend the port extension into the hypervisor

Note: not all designs supported in the architecture are currently implemented.

[Diagram: bridge LIFs extended through cascaded NIVs down to host interfaces (HIF) and VIFs]
Nexus 2000 Fabric Extender (FEX): VN-Tag Port Extension
- The Nexus 2000 Fabric Extender operates as a remote line card and does not support local switching; all forwarding is performed on the Nexus 5000/5500 UPC
- VNTag is a Network Interface Virtualization (NIV) technology that 'extends' the Nexus 5000/5500 port (Logical Interface = LIF) down to the Nexus 2000 VIF, referred to as a Host Interface (HIF)
- The Logical Interface (LIF) on the ingress UPC is used to forward the packet
- The packet is forwarded over the fabric link using a specific VNTag; the N2K ASIC maps that VNTag to a HIF interface
- VNTag is added to the packet between the Fabric Extender and the Nexus 5000/5500, and stripped before the packet is sent to hosts
- VNTag allows the Fabric Extender to act as a data path of the Nexus 5000/5500/7000 for all policy and forwarding

VNTag frame format:
  DA (6) | SA (6) | VNTAG (6) | 802.1Q (4) | Frame Payload | CRC (4)

VNTag fields: Ethertype, destination virtual interface, source virtual interface, and the d / p / l flag bits.
Nexus 5500 Adapter FEX: Association of a vNIC to a veth
- Adapter FEX (A-FEX) is supported on UCS and Nexus 5500
- Virtual NIC (vNIC): a hardware partition of a physical NIC as seen by an operating system (Virtual Interface = VIF)
- Virtual Ethernet interface (veth): a virtual network port (vNIC) as seen by the Nexus 5500 (Logical Interface = LIF)

[Diagram: OS-side vNIC 1 / vNIC 2 (VIFs 1A, 2B) mapped to veth1 / veth2 (LIFs) on the Nexus 5500]
Nexus 5000/5500 and 2000 Packet Forwarding Overview

[Diagram: two Nexus 2000 FEX ASICs attached to a Nexus 5000/5500 with ingress/egress UPCs and the Unified Crossbar Fabric]

1. Frame received on N2K HIF port
2. Nexus 2000 appends the VNTag and forwards the frame to the fabric uplink
3. Nexus 5000 UPC performs ingress forwarding and queuing
4. If required, egress queuing and flow control
5. Nexus 5000 UPC appends the destination VNTag and forwards the frame on the fabric link
6. VNTag is stripped and the frame is forwarded out on the N2K HIF port
Nexus 5000/5500 and 2000 Virtual Switch Packet Forwarding Latency
- Nexus 2000 also supports cut-through switching
- 1GE-to-10GE on the first N2K ingress is store-and-forward
- All other stages are cut-through (a 10GE N2K port operates in end-to-end cut-through)
- Port-to-port latency depends on at most a single store-and-forward operation

[Chart: Nexus 5500/2232 port-to-port latency (usec, 0-4 on the y-axis) vs. packet size (64-9216 bytes on the x-axis), comparing 1G store-and-forward with 10G cut-through; cut-through switching in all subsequent stages]
Nexus 5000/5500 and 2000 Switching Morphology: Is This Really Different?

[Diagram: Nexus 5500 with ingress/egress UPCs and Unified Crossbar Fabric plus two Nexus 2000 FEX ASICs, compared with a Catalyst Sup720 (PFC), X-bar fabric, and 67xx line cards (DFC/CFC) with distributed forwarding ASICs, port ASICs, buffers, and egress multicast replication]

- An internal packet header is used across the fabric (Constellation header / VNTag)
- The Nexus 2000 architecture localizes the forwarding ASIC in the parent switch (supervisor)
- Minimal latency due to the cut-through architecture
- Decoupled lifecycle management (upgrade the supervisor without worrying about the line card)
- TCO advantages; reduced SW/HW complexity
- The key design consideration is oversubscription
Nexus 2000 Port Channels (Nexus 2248/2232)
Nexus 2200 series FEX support local port channels
All FEX ports are extended ports (Logical Interfaces = LIF)
A local port channel on the N2K is still seen as a single extended port
Extended ports are each mapped to a specific VNTag
HW hashing occurs on the N2K ASIC
Number of ‘local’ port channels on each N2K is based on the local ASIC
21xx – Do not support local port channels (2 port vPC only)
22xx – Support up to 24 local port channels of up to 8 interfaces each, as well as vPC (two port channels of up to 8 ports each, 2 x 8 = 16 ports total)
1. Packet is received and the lookup forwards it out a LIF (N2K) interface
2. Packet is forwarded over the fabric link using the specific VNTag for the destination N2K LIF (port channel)
3. N2K ASIC hashes locally and transmits the packet on one HIF interface
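The local hashing in the last step can be sketched as a flow-consistent member selection: the same flow always lands on the same HIF member. This is a generic sketch; the actual N2K ASIC hash inputs and polynomial are not specified here:

```python
import hashlib

def select_member(flow: tuple, members: list) -> str:
    """Pick one port-channel member for a flow; same flow -> same member."""
    digest = hashlib.md5(repr(flow).encode()).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]

# Hypothetical HIF interface names for illustration.
hifs = ["Eth111/1/1", "Eth111/1/2"]
flow = ("10.0.0.1", "10.0.0.2", 49152, 80)
# Deterministic: repeated lookups for the same flow pick the same HIF,
# preserving per-flow packet ordering.
assert select_member(flow, hifs) == select_member(flow, hifs)
```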
FEX Deployment Models
The Fabric Extenders can be deployed using three different models:
Straight-through FEX using static pinning
Straight-through FEX using dynamic pinning
Active-active FEX using vPC
vPC vPC
Straight-through Static Pinning
Straight-through Dynamic Pinning
Active-active
Nexus 5000/5500 Nexus 5000/5500
Nexus 7000/5000/5500
Static Pinning
Static pinning statically maps server downlink ports on the FEX to the uplink ports that connect the FEX to the Cisco Nexus 5000 switch
Port mapping depends on the number of uplink ports that are used and the number of downlinks on the FEX
Example: A Cisco Nexus 2248TP GE using 4 uplinks pins downlink ports 1-12 to uplink 1, ports 13-24 to uplink 2, ports 25-36 to uplink 3, and ports 37-48 to uplink 4.
When an uplink port fails all downlink ports pinned to that uplink are disabled
Oversubscription ratio is preserved
Single-homed servers lose connectivity
Dual-homed servers fail over to the other NIC
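The 2248TP mapping in the example above can be reproduced with a small sketch, assuming the downlinks are divided evenly and in order across the configured max-links (which matches the stated 1-12/13-24/25-36/37-48 split):

```python
def static_pinning(downlinks: int, max_links: int) -> dict:
    """Map each downlink port number (1-based) to its pinned uplink (1-based)."""
    per_uplink = downlinks // max_links
    return {port: (port - 1) // per_uplink + 1 for port in range(1, downlinks + 1)}

pinning = static_pinning(48, 4)   # Nexus 2248TP with 4 uplinks
assert pinning[1] == 1 and pinning[12] == 1    # ports 1-12 -> uplink 1
assert pinning[13] == 2 and pinning[37] == 4   # ports 13-24 -> 2, 37-48 -> 4
```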
Configuring Static Pinning
Enable the FEX feature and define the FEX instance number:
Define the number of uplinks used for static pinning:
Change the mode of the ports connecting to the FEX to FEX-Fabric and associate the ports with the FEX:
Note: All FEX configuration is performed on the Cisco Nexus 5000 or 7000 Switch
N5K-1(config)# feature fex
N5K-1(config)# fex 111
N5K-1(config-fex)# description "FEX 111, rack 1, top"
N5K-1(config-fex)# pinning max-links 4
Change in Max-links will cause traffic disruption.
N5K-1(config)# interface ethernet 1/1-4
N5K-1(config-if-range)# switchport mode fex-fabric
N5K-1(config-if-range)# fex associate 111
Verifying Static Pinning
Once the FEX has been associated with the Cisco Nexus switch, the ports on the FEX can be seen as ports on the switch using the FEX number as a “virtual slot” number
N5K-1# show fex 111
FEX: 111 Description: FEX 111, rack 1, top state: Online
FEX version: 5.0(2)N2(1) [Switch version: 5.0(2)N2(1)]
Extender Model: N2K-C2248TP-1GE, Extender Serial: JAF1420AHPE
Part No: 73-12748-05
pinning-mode: static Max-links: 4
Fabric port for control traffic: Eth1/1
Fabric interface state:
Eth1/1 - Interface Up. State: Active
Eth1/2 - Interface Up. State: Active
Eth1/3 - Interface Up. State: Active
Eth1/4 - Interface Up. State: Active
N5K-1# show running-config | begin "interface Ethernet111"
interface Ethernet111/1/1
interface Ethernet111/1/2
interface Ethernet111/1/3
<…further output omitted…>
Verifying Static Pinning (Cont.)
To display detailed FEX status information including the port pinning use the show fex detail command:
N5K-1# show fex detail
FEX: 111 Description: FEX 111, rack 1, top state: Online
FEX version: 5.0(2)N2(1) [Switch version: 5.0(2)N2(1)]
FEX Interim version: 5.0(2)N2(1)
Switch Interim version: 5.0(2)N2(1)
Extender Model: N2K-C2248TP-1GE, Extender Serial: JAF1420AHPE
Part No: 73-12748-05
Card Id: 99, Mac Addr: 54:75:d0:ed:73:42, Num Macs: 64
Module Sw Gen: 12594 [Switch Sw Gen: 21]
post level: complete
pinning-mode: static Max-links: 4
Fabric port for control traffic: Eth1/1
Fabric interface state:
Eth1/1 - Interface Up. State: Active
Eth1/2 - Interface Up. State: Active
Eth1/3 - Interface Up. State: Active
Eth1/4 - Interface Up. State: Active
Fex Port State Fabric Port
Eth111/1/1 Down Eth1/1
Eth111/1/2 Up Eth1/1
Eth111/1/3 Down Eth1/1
Eth111/1/4 Down Eth1/1
Eth111/1/5 Down Eth1/1
<…further output omitted…>
Dynamic Pinning
Dynamic pinning uses a port channel between the Cisco Nexus switch and the FEX
Traffic distribution across the uplinks is determined through port-channel hashing
When an uplink fails, traffic is rehashed onto the remaining links
Server downlinks are not disabled when a single uplink fails
Oversubscription ratio between FEX and switch changes on uplink failure
Single-homed servers retain connectivity
Dual-homed servers do not fail over to the other NIC
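The behavioral difference from static pinning can be sketched: when an uplink fails, flows are simply rehashed over the surviving members, so no downlinks are disabled, but the spread changes from 4:1 to 3:1. The hash below is a generic stand-in, not the actual port-channel hash:

```python
import zlib

def distribute(flows, uplinks):
    """Hash each flow onto one of the currently-up uplinks."""
    return {f: uplinks[zlib.crc32(f.encode()) % len(uplinks)] for f in flows}

flows = [f"flow{i}" for i in range(8)]
before = distribute(flows, ["Eth1/9", "Eth1/10", "Eth1/11", "Eth1/12"])
after = distribute(flows, ["Eth1/9", "Eth1/10", "Eth1/11"])  # Eth1/12 failed
# Every flow still has a path (no downlinks disabled); the
# oversubscription ratio simply increases on the survivors.
assert all(port != "Eth1/12" for port in after.values())
```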
Configuring Dynamic Pinning
Enable the FEX feature and define the FEX instance number:
For dynamic pinning, the number of uplinks (max-links) is set to 1: the single port channel interface:
Change the mode of the ports connecting to the FEX to FEX-Fabric and associate the ports with the channel group:
Associate the port channel interface with the FEX:
N5K-1(config)# feature fex
N5K-1(config)# fex 121
N5K-1(config-fex)# description "FEX 121, rack 2, top"
N5K-1(config-fex)# pinning max-links 1
Change in Max-links will cause traffic disruption.
N5K-1(config)# interface ethernet 1/9-12
N5K-1(config-if-range)# switchport mode fex-fabric
N5K-1(config-if-range)# channel-group 21
N5K-1(config)# interface port-channel 21
N5K-1(config-if)# fex associate 121
Active-Active FEX
The active-active FEX allows a single FEX to be dual-homed to two Cisco Nexus 5000 switches
vPC is used to connect the FEX to the pair of Nexus switches
Ports on the FEX are associated with both switches and need to be configured consistently on both switches
Effectively, ports on the FEX are vPCs and the same consistency checks are performed as for regular vPCs
On the Cisco Nexus 5000 switches the configurations can be synchronized automatically using the configuration synchronization feature
Active-active FEX improves the availability of FEX-based solutions
Server traffic is protected against FEX uplink failures and switch failures
Configuring Active-Active FEX
Enable the FEX feature and define the FEX instance number on both switches:
For active-active FEX, the number of uplinks (max-links) is set to 1: the vPC between the FEX and the switches:
Change the mode of the ports connecting to the FEX to FEX-Fabric and associate the ports with the channel group on both switches:
N5K-1(config)# feature fex
N5K-1(config)# fex 131
N5K-1(config-fex)# description "FEX 131, rack 3, top"
N5K-1(config-fex)# pinning max-links 1
Change in Max-links will cause traffic disruption.
N5K-1(config)# interface ethernet 1/17-20
N5K-1(config-if-range)# switchport mode fex-fabric
N5K-1(config-if-range)# channel-group 31
Configuring Active-Active FEX (Cont.)
Enable vPC on both switches:
N5K-1(config)# feature vpc
N5K-1(config)# vpc domain 37
N5K-1(config-vpc-domain)# peer-keepalive destination 192.168.1.2
N5K-1(config)# interface ethernet 1/39-40
N5K-1(config-if-range)# channel-group 1
N5K-1(config)# interface port-channel 1
N5K-1(config-if)# switchport mode trunk
N5K-1(config-if)# vpc peer-link
N5K-1(config)# interface port-channel 31
N5K-1(config-if)# vpc 31
N5K-1(config-if)# fex associate 131
Configuring Active-Active FEX (Cont.)
Now the ports on the FEX can be configured:
Note: Ports should be configured identically on both vPC peer switches.
Effectively, ports on the FEX are seen as vPCs by the switch:
N5K-1(config)# interface ethernet 131/1/1
N5K-1(config-if)# switchport access vlan 10
N5K-2(config)# interface ethernet 131/1/1
N5K-2(config-if)# switchport access vlan 10
N5K-1# show vpc consistency-parameters interface ethernet 131/1/1
Legend:
Type 1 : vPC will be suspended in case of mismatch
Name Type Local Value Peer Value
------------- ---- ---------------------- -----------------------
Speed 1 1000 Mb/s 1000 Mb/s
Duplex 1 full full
Port Mode 1 access access
MTU 1 1500 1500
Shut Lan 1 No No
Allowed VLANs - 10 10
Local suspended VLANs - - -
FEX Configuration on the Cisco Nexus 7000 Switches
FEX configuration on the Cisco Nexus 7000 Switches is slightly different from the configuration on the Cisco Nexus 5000 Switches
The FEX feature set needs to be installed in the default VDC before the feature set can be used in non-default VDCs:
Use of the FEX feature set can be allowed or disallowed per VDC. Default is allowed.
The Cisco Nexus 7000 Switch only supports dynamic pinning and therefore the FEX fabric interfaces must be members of a port channel
N7K-1(config)# install feature-set fex
N7K-1(config-vdc)# no allow feature-set fex
Cisco Nexus 7000 FEX Example
The example shows how to configure a FEX in a non-default VDC on a Cisco Nexus 7000 Switch.
In the default VDC configure:
N7K-1(config)# install feature-set fex
In the non-default VDC configure:
N7K-1-RED(config)# feature-set fex
N7K-1-RED(config)# fex 141
N7K-1-RED(config-fex)# description "FEX 141, rack 4, top"
N7K-1-RED(config)# interface ethernet 1/1-2, ethernet 1/9-10
N7K-1-RED(config-if-range)# switchport
N7K-1-RED(config-if-range)# switchport mode fex-fabric
N7K-1-RED(config-if-range)# channel-group 41
N7K-1-RED(config-if-range)# no shutdown
N7K-1-RED(config)# interface port-channel 41
N7K-1-RED(config-if)# fex associate 141
Understanding Overlay Transport Virtualization
Enabling the Agile Data Center
L2 domain elasticity: vPC, FabricPath, OTV LAN extensions
IP localization: LISP
VM-awareness: VN-Link, port profiles, VN-Link notifications
Storage elasticity: FCIP, I/O acceleration, FCoE, Inter-VSAN routing
Device virtualization: VDCs, VRF enhancements
Compute resources are part of the cloud; location is transparent to the user
Distributed Data Centers
Goals when building the data center cloud include the following:
Seamless workload mobility between multiple data centers
Distributed applications closer to end users
Pool and maximize global compute resources
Ensure business continuity with workload mobility and distributed deployments
Traditional Layer 2 DCI Solutions
Traditional Layer 2 data center interconnects (DCI) are commonly built using:
Ethernet over MPLS (EoMPLS)
Virtual Private LAN Services (VPLS)
Dark fiber
Inherent challenges in these solutions are:
Complex operations: Traditional solutions are complex to deploy and manage
Transport dependent: Traditional solutions require the provisioning of specific transport
Bandwidth management: Inefficient use of bandwidth
Failure containment: Failures from one data center can affect all data centers
Overlay Transport Virtualization
OTV delivers a virtual Layer 2 transport over any Layer 3 infrastructure
Overlay: A solution that is independent of the infrastructure technology and services, flexible over various interconnect facilities
Transport: Transporting services for Layer 2 Ethernet and IP traffic
Virtualization: Provides virtual stateless multiaccess connections
Traditional DCI Technologies vs. OTV
Traditional DCI technologies
MAC address learning based on flooding
Failures propagate to every site
Pseudo wires and tunnels
Maintenance of static tunnel configuration limits scalability
Inefficient head-end replication of multicast traffic
Complex dual-homing
Requires additional protocols
STP extension is difficult to manage
OTV
Control plane-based MAC learning
Contains failures by restricting the reach of unknown unicast flooding
Dynamic encapsulation
Optimized multicast replication in the core
Native automated multihoming
Allows load balancing of flows within a single VLAN across the active devices in the same site
STP confined within each site
OTV Technology Pillars
OTV is a “MAC in IP” technique to extend Layer 2 domains over any transport
The two technology pillars of OTV are:
Dynamic encapsulation
No pseudo-wire maintenance
Optimal multicast replication
Multipoint connectivity
Point-to-cloud model
Protocol learning
Preserve failure boundary
Built-in loop prevention
Automated multihoming
Site independence
OTV Terminology
OTV uses the following terms:
Edge device
The edge device is responsible for all OTV functionality
The edge device can be at the core or aggregation layer
A given site can have multiple edge devices for redundancy. This is referred to as site multi-homing
Internal interfaces
The internal interfaces are those interfaces on an edge device that face the site and carry at least one of the VLANs that are extended through OTV
Internal interfaces are regular Layer 2 interfaces
No OTV configuration is required on internal interfaces
OTV Terminology (Cont.)
Join interface
The join interface is one of the uplinks of the edge device
The join interface is a routed point-to-point link
Can be a single routed port, a routed port channel, or a subinterface of a routed port or port channel
The join interface is used to join the overlay network
Overlay interface
The overlay interface is a new virtual interface that contains all the OTV configuration
The overlay interface is a logical multiaccess, multicast capable interface
The overlay interface encapsulates the site Layer 2 frames in IP unicast or multicast packets
OTV Terminology (Cont.)
The figure below shows the OTV components:
(Diagram: two OTV sites, each with an edge device; internal interfaces face the site, join interfaces face the transport, and overlay interfaces carry the OTV encapsulation)
OTV Data Plane Encapsulation
OTV encapsulates Ethernet frames in IP packets
An OTV shim header is added that contains a VLAN ID, overlay number, and CoS value
Original 802.1Q header is removed
OTV adds 42 bytes of encapsulation headers
OTV edge devices do not perform fragmentation
Encapsulated frame layout (42 bytes of headers added): outer DMAC (6B), outer SMAC (6B), EtherType (2B), IP header (20B, carrying ToS), OTV shim (8B, carrying the VLAN ID and CoS), followed by the original frame (DMAC, SMAC, payload, with the 802.1Q tag removed) and CRC (4B)
OTV Data Plane Forwarding
(Diagram: MAC 1 in the west site sends a frame to MAC 3 in the east site. The west edge device (IP A) encapsulates the frame MAC 1 → MAC 3 in an IP packet IP A → IP B, which crosses the transport infrastructure and is decapsulated by the east edge device (IP B).)

West edge device MAC table:
VLAN 100: MAC 1 → Eth 2, MAC 2 → Eth 1, MAC 3 → IP B, MAC 4 → IP B

East edge device MAC table:
VLAN 100: MAC 1 → IP A, MAC 2 → IP A, MAC 3 → Eth 3, MAC 4 → Eth 4
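The forwarding decision amounts to a plain table lookup: a destination that maps to a local interface is bridged normally, while one that maps to a remote edge device IP is encapsulated. A sketch using the west-side table contents from the diagram:

```python
mac_table = {  # west-side edge device (IP A), VLAN 100
    "MAC 1": "Eth 2",
    "MAC 2": "Eth 1",
    "MAC 3": "IP B",
    "MAC 4": "IP B",
}

def forward(dst_mac: str) -> str:
    """Bridge locally or encapsulate toward the remote edge device."""
    nexthop = mac_table[dst_mac]
    if nexthop.startswith("IP"):
        return f"encapsulate to {nexthop}"   # sent out the join interface
    return f"bridge out {nexthop}"           # normal local L2 forwarding

assert forward("MAC 3") == "encapsulate to IP B"
assert forward("MAC 1") == "bridge out Eth 2"
```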
Building the MAC Tables
The OTV control plane proactively advertises MAC address reachability
The MAC addresses are advertised in the background once OTV has been configured
No specific configuration is required
IS-IS is used by OTV as the control protocol between the edge devices
No need to configure or understand the operation of IS-IS
(Diagram: OTV edge devices advertising MAC address reachability to each other across the overlay)
OTV Neighbor Discovery
Before any MAC address can be advertised the OTV edge devices must:
Discover each other
Build a neighbor relationship with each other
The neighbor relationship can be built over a transport infrastructure that is:
Multicast enabled
Uses multicast for neighbor discovery
Unicast only
Uses an adjacency server for neighbor discovery (future)
OTV can leverage any capabilities of the underlying transport infrastructure, including multicast, fast rerouting, and ECMP
Neighbor Discovery over Multicast Transport
When the transport network is multicast enabled, OTV uses a multicast group for neighbor discovery
Edge devices join the multicast group using IGMP
PIM configuration is not required on edge devices
OTV hellos and updates are encapsulated in the multicast group
Core multicast replication is used to send updates to all OTV neighbors
(Diagram: four OTV edge devices joined to multicast group 239.1.1.1; one edge device sends an update to 239.1.1.1 and the core replicates it to all neighbors)
Spanning Tree and OTV
OTV is site transparent for STP:
Each site maintains its own STP topology with its own root bridges even if the Layer 2 domain is extended across the sites
An OTV edge device only sends and receives BPDUs on internal interfaces
This mechanism is built into OTV and requires no additional configuration
(Diagram: BPDUs are exchanged only within each site, each with its own STP root for VLAN 10; no BPDUs cross the overlay)
Unknown Unicast Flooding and ARP Caching
OTV does not forward unknown unicasts across the overlay
Flooding is not required for MAC address learning
Unknown unicast flooding suppression is enabled by default and does not need to be configured
Assumption is that hosts are not unidirectional or silent
Each OTV edge device maintains an ARP cache to reduce ARP traffic on the overlay
Initial ARPs are flooded across the overlay to all edge devices using multicast
When the ARP response comes back, the IP to MAC mapping is snooped and added to the ARP cache
Subsequent ARP requests for the same IP address are answered locally based on the cached entry
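The snooping behavior above can be sketched as a simple cache keyed by IP address (illustrative only; aging timers and the actual cache implementation are omitted):

```python
arp_cache: dict = {}  # IP -> MAC, learned by snooping ARP replies

def on_arp_reply(ip: str, mac: str) -> None:
    """Snoop a reply crossing the overlay and cache the IP-to-MAC binding."""
    arp_cache[ip] = mac

def on_arp_request(ip: str):
    """Answer locally if cached; otherwise flood across the overlay."""
    if ip in arp_cache:
        return ("local-reply", arp_cache[ip])
    return ("flood-overlay", None)

# First request is flooded; after the reply is snooped,
# subsequent requests are answered locally.
assert on_arp_request("10.1.1.10") == ("flood-overlay", None)
on_arp_reply("10.1.1.10", "0000.1111.2222")
assert on_arp_request("10.1.1.10") == ("local-reply", "0000.1111.2222")
```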
Automated Multihoming
Multihoming sites to an overlay is completely automated
Edge devices within a site discover each other over the OTV site VLAN
The site VLAN is a local VLAN and should not be extended across the overlay
OTV elects one of the edge devices to be the authoritative edge device (AED) for a subset of the extended VLANs
One edge device in the site will be authoritative for the even VLANs, the other for the odd VLANs
The AED is responsible for advertising the MAC addresses and forwarding traffic to and from the overlay for its set of VLANs
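With two edge devices in a site, the even/odd split described above amounts to assigning VLANs by parity. This sketch mirrors the stated behavior, not the actual AED election algorithm:

```python
def aed_for_vlan(vlan: int, edge_devices: list) -> str:
    """Two edge devices per site: one owns even VLANs, the other odd VLANs."""
    return edge_devices[vlan % 2]

edges = ["edge-even", "edge-odd"]  # hypothetical device names
assert aed_for_vlan(100, edges) == "edge-even"
assert aed_for_vlan(101, edges) == "edge-odd"
```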
Configuring Basic OTV
OTV configuration consists of several steps:
Configuring the OTV join interface
Configuring the internal interfaces
Enabling OTV and configuring the site VLAN
Configuring the overlay interface
(Diagram: the OTV components again: edge devices with internal interfaces, join interfaces, and overlay interfaces)
Configuring the Join Interface
The OTV join interface should be a Layer 3 point-to-point interface
The join interface cannot be an SVI or loopback interface
The join interface can be a routed port channel to improve resiliency
IGMPv3 should be enabled on the join interface to enable the use of SSM groups for multicast forwarding
PIM should not be enabled on the join interface. The OTV edge device acts as a multicast endpoint
Any routing protocol including static routing can be used on the transport network
This example shows how to configure the join interface:
N7K-1(config)# interface ethernet1/25
N7K-1(config-if)# ip address 10.1.1.1/24
N7K-1(config-if)# ip igmp version 3
Configuring Internal Interfaces
The internal interfaces do not require any OTV-specific configuration
Internal interfaces are typically configured as 802.1Q trunks
Internal interfaces carry at least one of the VLANs that are extended across the overlay
Internal interfaces should carry the OTV site VLAN
Note: It is important that the site VLAN always has an active port on the edge device, because the OTV encapsulation will not work if the site VLAN is down
This example shows how to configure an internal interface:
N7K-1(config)# interface ethernet1/9
N7K-1(config-if)# switchport
N7K-1(config-if)# switchport mode trunk
N7K-1(config-if)# switchport trunk allowed vlan 100-200
Enabling OTV and the Site VLAN
OTV requires the Transport Services License
The site VLAN is a local VLAN and should not be extended across the overlay
The default site VLAN is VLAN 1
The site VLAN is used by edge devices to discover each other and should be configured on all edge devices in a site
This example shows how to enable OTV and configure the site VLAN:
N7K-1(config)# feature otv
N7K-1(config)# otv site-vlan 200
N7K-1(config-site-vlan)#
Configuring the Overlay Interface
On the overlay interface the following parameters need to be configured:
The join interface
There can only be one join interface per overlay
The control group
A single PIM ASM or bidirectional group for OTV control traffic
Data groups
A range of PIM SSM groups that is used to carry multicast traffic between the sites
The extended VLANs
The range of VLANs that are to be extended across the overlay
Configuring the Overlay Interface (Cont.)
The following example shows how to configure the overlay interface:
Note: Do not include the site VLAN in the list of extended VLANs
N7K-1(config)# interface overlay 1
N7K-1(config-if-overlay)# otv join-interface Ethernet1/25
N7K-1(config-if-overlay)# otv control-group 239.1.1.1
N7K-1(config-if-overlay)# otv data-group 232.1.1.0/24
N7K-1(config-if-overlay)# otv extend-vlan 100-199
N7K-1(config-if-overlay)# no shutdown
FHRP and VLAN Extension
Extended VLANs commonly use a first-hop redundancy protocol (FHRP), such as HSRP or VRRP
Only one router is active for an FHRP group
All traffic is forwarded to the virtual IP and MAC of the active router
Result: Suboptimal routing
(Diagram: VLANs 10 and 11 extended across the overlay; only one router is HSRP active and one HSRP standby across both sites, while the routers in the remote site remain in HSRP listen state)
FHRP Filtering
To solve the FHRP suboptimal routing problem the FHRP traffic must be filtered from the overlay
One active FHRP router per site
A VLAN access list filters the FHRP control packets
OTV MAC route filter stops the announcement of the FHRP virtual MAC address
(Diagram: with HSRP filtered from the overlay, each site has its own HSRP active and standby routers for VLANs 10 and 11)
HSRP Filtering Example
This example shows the configuration of a VLAN access list that filters HSRP traffic:
N7K-1(config)# ip access-list HSRP
N7K-1(config-acl)# permit udp any 224.0.0.2/32 eq 1985
N7K-1(config)# ip access-list ANY-IP
N7K-1(config-acl)# permit ip any any
N7K-1(config)# vlan access-map FILTER-HSRP 10
N7K-1(config-access-map)# match ip address HSRP
N7K-1(config-access-map)# action drop
N7K-1(config)# vlan access-map FILTER-HSRP 20
N7K-1(config-access-map)# match ip address ANY-IP
N7K-1(config-access-map)# action forward
N7K-1(config)# vlan filter FILTER-HSRP vlan-list 100-199
HSRP Filtering Example (Cont.)
This example shows the configuration of an OTV route filter that suppresses the announcement of the HSRP virtual MAC address:
N7K-1(config)# mac-list NOT-HSRP-VMAC deny 0000.0c07.ac00 0000.0000.00ff
N7K-1(config)# mac-list NOT-HSRP-VMAC permit 0000.0000.0000 ffff.ffff.ffff
N7K-1(config)# route-map NO-HSRP-ANNOUNCE permit 10
N7K-1(config-route-map)# match mac-list NOT-HSRP-VMAC
N7K-1(config)# otv-isis default
N7K-1(config-router)# vpn Overlay1
N7K-1(config-router-vrf)# redistribute filter route-map NO-HSRP-ANNOUNCE
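The mac-list above matches the HSRPv1 virtual MAC range 0000.0c07.acXX: the wildcard 0000.0000.00ff marks the last byte (the HSRP group number) as "don't care". The match logic can be checked with a short sketch:

```python
def mac_to_int(mac: str) -> int:
    """Convert dotted-hex MAC notation (xxxx.xxxx.xxxx) to an integer."""
    return int(mac.replace(".", ""), 16)

def matches(mac: str, value: str, wildcard: str) -> bool:
    """Wildcard bits set to 1 are ignored in the comparison."""
    care = ~mac_to_int(wildcard) & 0xFFFFFFFFFFFF
    return (mac_to_int(mac) & care) == (mac_to_int(value) & care)

# Any HSRP group (last byte) matches; a different OUI/function code does not.
assert matches("0000.0c07.ac05", "0000.0c07.ac00", "0000.0000.00ff")
assert not matches("0000.0c07.ad05", "0000.0c07.ac00", "0000.0000.00ff")
```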
Data Center Bridging Enhancements
A collection of IEEE-based enhancements to classical Ethernet that provide end-to-end QoS
Does not disrupt existing infrastructure
Ethernet enhancements:
Priority groups: Virtualizes links and allocates resources per traffic classes
Priority flow control by traffic class
End-to-end congestion management and notification
Shortest-path bridging: Layer 2 multi-pathing
Benefits of Ethernet enhancements:
Eliminates transient and persistent congestion
Lossless fabric: No drop storage links
Deterministic latency for HPC clusters
Enables a converged Ethernet fabric for reduced cost and complexity
Intel is developing products for Ethernet convergence in virtualized data centers and driving IEEE standards.