Ethernet Routing for Large Scale Distributed Data Center Fabrics Dave Allan, János Farkas, Panagiotis Saltsidis, Jeff Tantsura Ericsson


Ethernet Routing for Large Scale Distributed Data Center Fabrics | 2013-11-13 | Page 2

Introduction

› This is a concept and architecture for a distributed Cloud
› One purpose is to illustrate the capabilities and the scalability of “state of the art” Ethernet
› The components of the proposed architecture are progressing in standards, either complete or in progress
› The architecture is built on
– IEEE Shortest Path Bridging – MAC mode (SPBM)
  › As standardized in IEEE 802.1aq-2012
– IETF Ethernet Virtual Private Network (EVPN) as extended for SPBM interworking
  › This is being standardized in draft-ietf-l2vpn-spbm-evp


A Bit of History

› Key antecedents to SPB
1. Provider Backbone Bridges (PBB) [802.1ah]
– Full MAC-in-MAC encapsulation
– 24-bit I-SID, which is a 24-bit L2 Virtual Network ID
2. PBB Traffic Engineering (PBB-TE) [802.1Qay]
– Enabled external control of bridge forwarding with complete route freedom, i.e. Software Defined Networking (SDN) with geographical separation

[Figure: evolution of Ethernet frame formats.
– 802.1D-1990: Dst Addr, Src Addr, Ethertype, Payload
– 802.1Q-1998: DA, SA, VID, Ethertype, Payload
– Provider Bridges (PB), 802.1ad-2005: C-DA, C-SA, S-VID, C-VID, Ethertype, Payload
– Provider Backbone Bridges (PBB), 802.1ah-2008: B-DA, B-SA (B-MAC), B-VID (B-tag), I-SID (I-tag), C-DA, C-SA, S-VID (S-tag), C-VID (C-tag), Ethertype, Payload
Some tags are marked optional.]


What is Shortest Path Bridging (802.1aq SPB)?

› SPB is a routed Ethernet solution that has been specified by the IEEE: link state for bridges
– IS-IS aspects documented in IETF RFC 6329
› All control functionality has been collapsed into a single protocol (IS-IS)
– Unicast and multicast tree construction, VLAN registration etc.
› Two SPB modes are defined:
› SPBV: SPB VID
– VID based
– Applicable to all types of VLANs
– Flooding and learning
– Plug & play
› SPBM: SPB MAC
– MAC based
– Designed to leverage the scalability provided by PBB MAC-in-MAC
– No flooding and learning
– Managed environments


What is important to understand about SPBM?

› It is compute based: computation instead of signaling
› It uses multiple shortest path trees instead of shared spanning trees
– Unicast and multicast frames follow the same path between any two points in a given VLAN
  › So no frame misordering, and you get meaningful OAM support
› It uses loop mitigation AND loop prevention
› It uses edge based load spreading
› It is backwards compatible and consistent with the full body of Ethernet standardization (IEEE 802.1)
– CFM, EVB, lossless Ethernet etc.
› It implements the full MEF 12.1 set of service constructs
– E-LINE, E-LAN, E-TREE


Problems Already Solved

› Ability to utilize more richly connected topologies
– SPBM supports up to 16 way multi-pathing and is extensible to go further
– Each multipath instance is a full mesh of the network
› Large scale virtualization
– PBB data plane scales to billions of virtual networks (24-bit I-SID over 12-bit B-VID: 2^24 * 2^12)
› Operational simplicity
– All information contained in a single control protocol, IS-IS
– Single touch adds/moves and changes
– Computed multicast
– Reduced CP messaging combined with a computation driven convergence of unicast & multicast is a virtuous circle…
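
The "billions of virtual networks" figure can be checked with a few lines of arithmetic. This is a minimal sketch that treats both identifier fields as fully usable, which is an assumption: reserved I-SID and VID values are ignored, so the result is an upper bound. The names `id_space`, `I_SID_BITS` and `B_VID_BITS` are illustrative only.

```python
# Upper bound on the PBB identifier space: a 24-bit I-SID carried over a
# 12-bit B-VID. Reserved values are ignored (assumption), so real usable
# counts are slightly lower.
I_SID_BITS = 24
B_VID_BITS = 12


def id_space(bits: int) -> int:
    """Number of distinct values a field of the given width can hold."""
    return 1 << bits


i_sids = id_space(I_SID_BITS)   # 2^24 service instances
b_vids = id_space(B_VID_BITS)   # 2^12 backbone VLANs
combined = i_sids * b_vids      # 2^36 I-SID/B-VID combinations

print(f"I-SIDs: {i_sids:,}  B-VIDs: {b_vids:,}  combined: {combined:,}")
```

The I-SID space alone is about 16.8 million; the product with the B-VID space is about 6.9 × 10^10.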


Solution Objectives

› Ubiquity and reach
– Interconnect different flavors of “Ethernet” across the dominant WAN technology (MPLS)
› Preserve operational simplicity
– Preserve “single touch” add/move/delete automation
– Minimal configuration
› Alignment of BGP and IS-IS control plane paradigms
› Break the scaling barriers of a single routing domain
– Combined SPBM-EVPN allows much larger topologies
– Domain isolation to “divide and conquer” state
– Operate each SPBM domain on a “need to know” basis
– Non-relevant information is excluded from routing advertisements
› Minimize Filtering Database (FDB) state


Solution Overview

› There are a number of aspects to the solution
1. Topology hiding and abstraction
2. “Need to know” filtering
3. Independence of local multi-pathing
4. Multicast summarization

[Figure: two data center networks (DCN1, DCN2), each an SPBM domain with BEBs, attached through PEs to an MPLS EVPN core. DCN1 uses B-VID1 (B-VLAN1), DCN2 uses B-VID2 (B-VLAN2), and an LSP crosses the core. The tenant virtual network I-SID1 spans all three; the tenant’s overlay is e.g. an IP subnet or VLAN.]


SPBM and EVPN

› Shortest Path Trees (SPT) are the basic connectivity construct for SPBM
– They are edge rooted shortest path trees, and much finer grained than the shared spanning trees, but they are still TREEs
  › Which constrains the set of network interconnect mechanisms
– The set of fine grained MAC based trees is aggregated into Backbone VLANs (B-VLAN), where each B-VLAN delineates full mesh connectivity
› EVPN is IP/MPLS based, and uses BGP to sort out mirroring of attached Ethernet networks
› But once in EVPN we can map SPBM connectivity to any paradigm
› The trick is interconnecting them


Mapping between SPBM & EVPN

› Trees have ROOTs…
– Which means interworking needs to pin waypoints, which then permit the required design strategies to work
› For SPBM-EVPN interworking, we make the interworking function on the EVPN-PE into a “pinned waypoint”
– This has the desirable effect of keeping “churn” in subtending SPBM networks out of BGP
› An EVPN-PE that is a “pinned waypoint” for a set of VLANs is known as a “designated forwarder”


Designated Forwarder

› The set of EVPN-PEs attached to an SPBM network self-elect which subset of VLANs each will act as Designated Forwarder (DF) for
– This is based on local B-VID
› The DF is then responsible for relaying all required state associated with the subset of VLANs it owns between the two control planes, and for the interworking of data plane traffic between the SPBM and EVPN networks
– This is simply in the form of a list of I-SID/B-MAC tuples
– No topology information is leaked; the DF condenses all topology behind it down to a single node representation into the peer network
– The DF also “re-roots” all (S,G) multicast trees that transit it by “blindly” rewriting “S” (Source)
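
One way the self-election could work is sketched below. The slide only says the election is based on the local B-VID; the modulo rule over an ordered list of PE addresses used here is an assumption chosen for illustration, and `elect_df` and the addresses are hypothetical. The point is that every PE can run the same computation on the same inputs and reach the same answer without extra signaling.

```python
from ipaddress import ip_address


def elect_df(pe_addresses, b_vids):
    """Map each local B-VID to the PE that acts as its DF.

    Assumed rule (not taken from the slides): order the attached PEs by IP
    address and pick the PE at index (B-VID mod number_of_PEs).
    """
    ordered = sorted(pe_addresses, key=ip_address)
    return {vid: ordered[vid % len(ordered)] for vid in b_vids}


# Two hypothetical PEs attached to the same SPBM network.
assignment = elect_df(["192.0.2.2", "192.0.2.1"], [10, 11])
print(assignment)
```

Because the input list is sorted before use, the order in which a PE learns about its peers does not change the outcome.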


DF Control Plane Interworking

› DF has a Control Plane Interworking function
1. It proxies B-MAC/I-SID announcements from ISIS-SPB into BGP for the set of I-SIDs it is DF for
2. It will only proxy B-MAC/I-SID announcements from EVPN into ISIS-SPB if there is already locally registered interest in the I-SID
› BGP has the whole picture, IS-IS is “need to know”

[Figure: a PE between the DC (PBBN) and the WAN (MPLS). On the DC side, IS-IS exchanges IS-IS PDUs and maintains an IS-IS database; on the WAN side, BGP exchanges BGP PDUs and maintains a BGP database. The Control Plane Interworking Function sits between the two databases.]
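
The two proxy rules above amount to a pair of filters over B-MAC/I-SID announcements. The sketch below is illustrative only: the `Announcement` type, the function names and the sample values are assumptions, and real implementations operate on IS-IS TLVs and BGP routes rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    b_mac: str   # backbone MAC the I-SID is reachable through
    i_sid: int   # service instance identifier


def isis_to_bgp(isis_announcements, df_i_sids):
    """Rule 1: proxy into BGP only the I-SIDs this PE is DF for."""
    return [a for a in isis_announcements if a.i_sid in df_i_sids]


def bgp_to_isis(bgp_announcements, locally_registered_i_sids):
    """Rule 2: proxy into IS-IS only I-SIDs with local registered interest."""
    return [a for a in bgp_announcements if a.i_sid in locally_registered_i_sids]


local = [Announcement("00:00:5e:00:53:01", 100), Announcement("00:00:5e:00:53:02", 200)]
remote = [Announcement("00:00:5e:00:53:aa", 100), Announcement("00:00:5e:00:53:bb", 300)]

to_bgp = isis_to_bgp(local, df_i_sids={100})
to_isis = bgp_to_isis(remote, locally_registered_i_sids={100, 200})
```

In this example I-SID 300 exists only in BGP and never reaches the local IS-IS domain, which is the "need to know" behaviour the slide describes.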


EVPN-SPBM data plane

[Figure: VM1 behind BEB1 in DCN1 (SPBM, B-VID1) and VM2 behind BEB2 in DCN2 (SPBM, B-VID2), joined by DF1 and DF2 across the MPLS EVPN core over an LSP. I-SID1 is carried end to end. Frame headers per segment:]

VM1 to VM2 (C-DA: VM2, C-SA: VM1, I-SID1 throughout)
– DCN1: B-DA: DF1, B-SA: BEB1, B-VID1
– MPLS: B-DA: DF2, B-SA: DF1, no B-VID
– DCN2: B-DA: BEB2, B-SA: DF2, B-VID2

VM2 to VM1 (C-DA: VM1, C-SA: VM2, I-SID1 throughout)
– DCN2: B-DA: DF2, B-SA: BEB2, B-VID2
– MPLS: B-DA: DF1, B-SA: DF2, no B-VID
– DCN1: B-DA: BEB1, B-SA: DF1, B-VID1


DF Data Plane Procedures

› Islands are decoupled by keeping B-Tags out of the EVPN core
– What the core sees is MPLS encapsulated B-MACs and I-SIDs
  › B-Tags are stripped by PEs on ingress to EVPN
  › B-Tags are locally added by PEs on egress from EVPN
– So the core is independent of however multi-pathing is implemented in each subtending island, or whether a PBBN exists at all (e.g. PBB-PEs)
› Multicast MACs are aggregated at SPBM ingress

[Figure: unicast interworking at the DF between the PBBN and MPLS. PBBN to MPLS: Ethernet frames, strip tags, B-MAC lookup, add label stack, MPLS packets. MPLS to PBBN: MPLS packets, strip label stack, B-MAC lookup, add tags, Ethernet frames.]
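
The unicast procedure can be sketched as two small functions, one per direction. This is a simplified model under stated assumptions: `PbbFrame`, the table shapes, the label values and the idea that the egress PE picks the local B-VID from the I-SID are all hypothetical, and any B-MAC rewriting is left out.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PbbFrame:
    b_da: str              # backbone destination MAC
    b_sa: str              # backbone source MAC
    b_vid: Optional[int]   # B-tag; None while crossing the EVPN core
    i_sid: int             # service instance, carried end to end
    payload: bytes         # customer frame (C-DA, C-SA, data)


def pbbn_to_mpls(frame: PbbFrame, labels_by_b_mac: Dict[str, Tuple[int, ...]]):
    """Ingress to EVPN: strip the B-tag, look up the B-DA, add a label stack."""
    stripped = replace(frame, b_vid=None)
    return labels_by_b_mac[frame.b_da], stripped


def mpls_to_pbbn(frame: PbbFrame, local_b_vid_by_i_sid: Dict[int, int]) -> PbbFrame:
    """Egress from EVPN: the label stack is gone; add the locally chosen B-tag."""
    return replace(frame, b_vid=local_b_vid_by_i_sid[frame.i_sid])


frame = PbbFrame("DF2", "DF1", b_vid=1, i_sid=1, payload=b"customer frame")
label_stack, in_core = pbbn_to_mpls(frame, {"DF2": (1001, 2002)})
delivered = mpls_to_pbbn(in_core, {1: 2})
```

The frame enters the core with no B-VID and leaves with the B-VID of the receiving island, so the two islands never need to agree on B-VID values.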


Add Multicast in the MPLS Core

› Objective is to get away from the inefficiencies of edge based replication in the PEs while minimizing the multicast state impact in the core
› VLAN emulation can use lots of Multicast Distribution Trees (MDTs)
› These can be aggregated into shared MDTs between larger sites
– Shared MDTs can substantially reduce the amount of multicast state in the MPLS core to service large sites
– Smaller sites are more likely to benefit from service specific MDTs
  › So we will support both


Shared Multicast Distribution Trees

› Issue is how to resolve VLANs to shared trees without getting into resolution servers or provisioning
› One way to do this is to algorithmically “name” the tree
– (*,G) or (S,G) where G is a sorted list of leaf node IDs
› Via BGP every PE has sufficient information to construct the names of the MDTs
› mLDP permits arbitrary opaque identifiers for MDTs to be used as a multicast FEC, so the algorithmically constructed names can be used directly in signaling
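
The naming idea is small enough to show directly. In this sketch the string format, the `+` separator and the function name `mdt_name` are assumptions for illustration; the only property taken from the slide is that G is a sorted list of leaf node IDs, so every PE derives the same name from the same membership.

```python
def mdt_name(leaves, source=None):
    """Build a deterministic tree name from its leaf set.

    G is the sorted, de-duplicated list of leaf node IDs. With no source the
    name is (*,G); with a source it is (S,G).
    """
    group = "+".join(sorted(set(leaves)))
    return f"({source if source is not None else '*'},{group})"


# Each PE lists the leaves in whatever order it learned them via BGP,
# yet all three arrive at the same name.
print(mdt_name(["PE3", "PE2", "PE5"]))
print(mdt_name(["PE5", "PE3", "PE2"]))
print(mdt_name(["PE2", "PE5", "PE3"], source="PE3"))
```

Plain string sorting is enough here because the name only has to be consistent across PEs, not numerically ordered.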


Example

[Figure: an EVPN + mLDP core running BGP among PE1 to PE7. Attached are three 802.1aq SPBM networks running IS-IS, an 802.1ad PBN running RSTP, and PE7 labelled PBB; CEs attach at the edge. PE2, PE3 and PE5 are marked DF.]

PE2, PE3 and PE5 are DFs for a common set of VLANs


Example

[Figure: the same topology, with mLDP shown in the EVPN core and PE2, PE3 and PE5 marked DF.]


Example

[Figure: the same topology, with PE3 signaling via mLDP.]

“I am PE 3, and I have 10 VLANs that need (*,G) multicast to myself and PEs 2 and 5, so the FEC is PE2+PE3+PE5”


Example

[Figure: the same topology, with PE2 signaling via mLDP.]

“I am PE 2, and I have 10 VLANs that need (*,G) multicast to myself and PEs 3 and 5, so the FEC is PE2+PE3+PE5”


Example

[Figure: the same topology, with PE5 joining.]

“I am PE 5, and I have 10 VLANs that need (*,G) multicast to myself and PEs 2 and 3, so the FEC is PE2+PE3+PE5”


Example

[Figure: the same topology, showing the resulting MDT among the DFs PE2, PE3 and PE5.]


What does this get me?

› mLDP, like PIM, is rather chatty and based on transactional convergence
› If I had 10000 VLANs spread across the 3 sites in the example, I WOULD have 10000 (*,G) or 30000 (S,G) trees
› For 3 dual homed sites, there are ONLY 8 possible (*,G) and 24 possible (S,G) shared trees
– It becomes practical to simply “nail them up” and modify the membership set of each tree at the ingress
› Result is both scalable and stable
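
One reading of where 8 and 24 come from: each dual homed site contributes exactly one of its two PEs as DF to a given tree, so there are 2 × 2 × 2 possible leaf sets, and each leaf set can be rooted at any of its 3 members. The sketch below enumerates them. The pairing of PEs into sites is an assumption made for illustration; the slides do not state which PEs share a site.

```python
from itertools import product


def shared_trees(sites):
    """Enumerate possible shared trees when each site has one DF per tree.

    Returns the (*,G) leaf sets and the (S,G) trees as (source, leaf_set).
    """
    star_g = [tuple(sorted(choice)) for choice in product(*sites)]
    s_g = [(source, leaves) for leaves in star_g for source in leaves]
    return star_g, s_g


# Hypothetical pairing of the example's PEs into three dual homed sites.
sites = [("PE1", "PE2"), ("PE3", "PE4"), ("PE5", "PE6")]
star_g, s_g = shared_trees(sites)
print(len(star_g), "possible (*,G) trees;", len(s_g), "possible (S,G) trees")
```

The counts depend only on the number of sites and PEs per site, not on the number of VLANs, which is why 10000 VLANs can share the same handful of trees.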


Key Insights & Next steps

› Assumption of rich mesh hidden from SPBM in the first place
– Exposing a large, highly regular CLOS topology in link state simply burdens the control plane
› Some topological summarization is required in the first place to usefully scale individual sites to 100,000+ servers with existing technology
› There is a lot that can be done to engineer an SPBM network, both with the vanilla standard and with techniques currently under research
– Deterministic aggregated trees lend themselves to “demand engineering” with automation
› Work needs to be done to seamlessly extend this into the EVPN realm


Summary

› The totality, completeness and self-consistency of IEEE data center networking solutions are impressive
– From OAM to Edge Virtual Bridging
› SPB permits this to scale to orders of magnitude beyond what Ethernet was previously capable of
› Adding EVPN as a form of “multi-area” solution adds orders of magnitude beyond what SPB alone can do…
