
High Performance Networks
Chapter 6  Multi-Protocol Label Switching

Introduction

MPLS is an important tool for backbone service providers, e.g. carrier service providers and Internet Service Providers (ISPs), to solve network problems including scalability, speed, QoS management and traffic engineering.

MPLS represents the convergence of connection-oriented forwarding techniques and the Internet's routing protocols.

MPLS leveraged the high-performance cell switching capability of ATM switch hardware and melded it into a network using existing IP routing protocols.

MPLS:

1. supports traffic engineering, i.e. putting the traffic where the bandwidth is. MPLS provides the ability to explicitly set single or multiple paths that the traffic will take through the network. This feature allows optimizing the utilization of underutilized paths and relieving congestion;

2. allows service providers to create layer 3 (L3) "virtual private networks" (VPNs) across their backbone network for multiple customers; and

3. supports QoS. Service providers can offer multiple classes of service with hard QoS guarantees to their VPN customers.

MPLS was based on and evolved from Ipsilon/Nokia's IP Switching and Cisco's Tag Switching.

MPLS Elements

Terminology:

Label Switched Path (LSP). This is a unidirectional logical path that an MPLS frame travels through the network.

Forwarding Equivalence Class (FEC). The FEC represents a group of packets that share the same requirements for their transportation. All packets in such a group are provided the same treatment en route to the destination. The assignment of a particular packet to a particular FEC is done once, when the packet enters the MPLS network. FECs are based on service requirements (e.g. QoS requirements) for a given set of packets or simply on an address prefix, e.g. an IP address prefix.

Label Switch Router (LSR). An LSR:
1. is a core router in an MPLS network;
2. participates in the establishment of LSPs using label signaling protocols, e.g. RSVP-TE (RSVP with traffic engineering/tunnel extensions) or CR-LDP (constraint-routed label distribution protocol); and
3. performs high speed switching of labeled traffic on established LSPs.

Label Edge Router (LER).
1. An LER is a device at the edge of an MPLS network. It can be an Ingress LSR or an Egress LSR;
2. The Ingress LSR assigns an FEC to an incoming packet once;
3. The packets in an FEC are then assigned to an LSP based on traffic criteria; and
4. The Egress LSR removes labels from traffic coming in from an incoming LSP.

Note: An LSP extends from an Ingress LSR to an Egress LSR.

Label Information Base (LIB). An LIB is a table created in each LSR or LER that relates incoming labels and interfaces to outgoing labels and interfaces. An LIB also contains FEC-to-label bindings.

Packet-based MPLS networks can have IP router-based LERs with either 1. ATM-based Core LSRs or 2. IP router-based Core LSRs, and use packet-based transport technologies to link the LSRs. ATM-based Core LSRs are ATM switches with their control plane replaced by an IP control plane (e.g. running an instance of the network's Interior Gateway Protocol (IGP)).


Figure 6.1 shows a typical MPLS network. On each physical link, a particular label, specific within the context of that link, represents a segment of an LSP. The association between actual label values and an LSP at any hop can be created on demand by RSVP-TE or CR-LDP.

Figure 6.1 MPLS Consists of Edge (LERs), Core (LSRs) and Label Switched Paths (LSPs) across an MPLS Backbone/Domain

Label Switched Paths and Per-hop Processing

An LSP need not follow the shortest path between any two edge LSRs. External routing algorithms can be used to determine new (non-shortest) routes for LSPs. This can result in a more optimal distribution of load around a network.

Multi-protocol Label Switching Label Encoding

o A label can be mapped to an ATM VPI/VCI or to a Frame Relay DLCI.

o For a layer 2 (L2) protocol that does not offer a label-type field (e.g. Ethernet or the Point-to-Point Protocol), the 32-bit MPLS label (or label stack) forms a shim layer between layer 2 and the network layer (see figure 6.3a). Figure 6.2 shows the structure of the generic MPLS frame and the format of each label:

The 20-bit label indicates the LSP to which the packet belongs.

The 3-bit experimental field indicates, e.g., additional queuing and scheduling disciplines independent of the LSP.

The 8-bit Time to Live (TTL) field is defined to assist in the detection and discard of looping MPLS packets.


The S bit is set to 1 to indicate the final stack entry before the original packet.

    o The stacking scheme allows for LSPs to be tunneled through other LSPs.
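The label stack entry layout above can be sketched in a few lines of code. This is a minimal illustration (not from the notes) of packing the four fields of figure 6.2 into a 32-bit entry and building a two-level stack for a tunneled LSP; the field widths follow the description above.

```python
# A minimal sketch of the 32-bit MPLS label stack entry described above:
# 20-bit label, 3-bit experimental (EXP) field, 1-bit bottom-of-stack (S)
# flag, 8-bit TTL.

def pack_entry(label: int, exp: int, s: int, ttl: int) -> int:
    """Pack the four fields into one 32-bit stack entry."""
    assert 0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_entry(entry: int) -> tuple[int, int, int, int]:
    """Recover (label, exp, s, ttl) from a 32-bit stack entry."""
    return (entry >> 12) & 0xFFFFF, (entry >> 9) & 0x7, (entry >> 8) & 0x1, entry & 0xFF

# A two-level stack (an LSP tunneled through another LSP): only the
# innermost entry has S = 1. The label values are made up for illustration.
stack = [pack_entry(1000, 0, 0, 64),   # outer tunnel label, S = 0
         pack_entry(46, 5, 1, 64)]     # inner label, bottom of stack
```

A receiver walks the stack until it finds the entry with S = 1, which marks the boundary before the original packet.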

    Figure 6.2 MPLS Label Stack Encoding for Packet Oriented Transport

o Encoding Labels on Specific Links

For packet-based link layers, the MPLS frame is simply placed within the link's native frame format. Figure 6.3a shows the frame format for an MPLS frame carried by PoS.

For MPLS frames carried by ATM (see figure 6.3b), packet-to-cell conversion is required at the ingress node of the LSP and vice versa at the egress node.

Figure 6.3a MPLS Encoding for Point-to-Point Protocol (PPP) over SONET Links (the label stack forms a shim layer)

    Figure 6.3b MPLS Encoding for ATM Links

    Label Creation and Binding

For an L2 protocol that does not offer a label-type field, an LSR or LER can create a label or select one from a pool of labels and then bind it to an FEC as a result of some event that indicates a need for such label creation and binding. The events that trigger label creation and FEC-label binding can be:


1. the reception of a signaling message, e.g. RSVP-TE messages; or

2. the reception of a data packet.

Data Packet Processing in a Core LSR. It is not necessary for a core LSR to classify (i.e. separate) incoming packets based on their IP header contents (e.g. destination address). The MPLS label itself, together with the identity of the arrival interface, provides all the necessary context to determine a packet's next hop and the associated metering, policing (i.e. packet dropping) or marking, queuing, and scheduling rules, see figure 6.4.

Note the separation of the control plane and the forwarding plane in figure 6.4. The forwarding plane is used to forward user data traffic. The control plane is responsible for the distribution of network topology and traffic engineering attributes, and for the setting up of LSPs.

The switching table (also known as the forwarding table) contains one or more LIB(s) for labels the LSR knows about, including a new label to apply when the packet is forwarded. Figure 6.5 shows an example of the format of a switching table. Switching table entries are modified whenever a new label needs to be activated or an old label needs to be removed.

Figure 6.4 Simplified diagram of a core LSR. The forwarding plane shows the components of the forwarding engine: input ports, the switching table (containing the LIBs), policing & marking, the switching fabric, and queuing & scheduling at the output ports. The MPLS label of an arriving frame provides the context that influences subsequent processing. The control plane contains the routing (OSPF-TE, IS-IS-TE) and signaling (RSVP-TE, CR-LDP) components.


Figure 6.5 An Example of a Switching/Forwarding/LIB Table

In port | In label | Out port | Out label | Action
      1 |       46 |       10 |        48 | Swap
      2 |      102 |        4 |       201 | Push
     11 |      111 |        5 |         - | Pop
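The per-packet behavior driven by a table like figure 6.5 can be sketched directly. The following is an illustrative model (names and structures are made up, not from any real LSR implementation): the (in port, in label) pair selects an entry, and the action rewrites the label stack.

```python
# A hypothetical sketch of the switching/LIB table in figure 6.5 and the
# per-packet label operations (swap, push, pop).

# (in_port, in_label) -> (out_port, out_label, action), entries as in figure 6.5
switching_table = {
    (1, 46):   (10, 48,  "swap"),
    (2, 102):  (4, 201,  "push"),
    (11, 111): (5, None, "pop"),
}

def forward(in_port: int, stack: list[int]) -> tuple[int, list[int]]:
    """Apply the table entry for the top label; return (out_port, new stack)."""
    out_port, out_label, action = switching_table[(in_port, stack[0])]
    if action == "swap":                # replace the top label
        stack = [out_label] + stack[1:]
    elif action == "push":              # add a new top label (tunnel entry)
        stack = [out_label] + stack
    elif action == "pop":               # remove the top label (tunnel exit)
        stack = stack[1:]
    return out_port, stack
```

Note that the lookup never touches the IP header, which is the point made above: the label and arrival interface alone provide the forwarding context.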

Data Packet Processing in an LER (both ingress & egress)

o An LER is located at the edge of an MPLS network, see figure 6.1. An LER originates and/or terminates LSPs and performs both label-based forwarding and conventional IP routing functions.

    o At ingress, an LER accepts unlabeled packets and creates an initial MPLS Frame by pushing one or more MPLS Label entries.

    o On egress, the LER terminates an LSP by popping the top MPLS stack entry and forwarding the remaining packet based on rules indicated by the popped label, e.g. if the payload is an IPv4 packet then it should be forwarded according to IP routing rules.

o Figure 6.6 shows a simplified diagram of the components in the data forwarding plane of an Ingress LSR. It shows how an incoming IP packet is labeled for transmission out of an MPLS interface.

    o Conventional IP packet processing/classification determines the FEC (e.g. based on the destination IP address prefix, QoS requirements, etc.). The forwarding table provides the FEC-to-label binding which in turn will determine the LSP for this packet.

    o Once labeled, packets are transmitted into the Core along the chosen LSP.
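The ingress classification described above can be sketched for the simple case where the FEC is determined by the destination IP address prefix. The prefixes and label values below are made up for illustration; a real LER would also consult QoS requirements and other header fields.

```python
# An illustrative sketch of ingress LER classification: map the destination
# address to an FEC by longest prefix match, then return the label bound to
# that FEC (the binding an ingress LSR would push onto the packet).
import ipaddress

# Hypothetical FEC-to-label bindings, keyed by destination IP prefix
fec_bindings = {
    ipaddress.ip_network("10.0.0.0/8"):     100,
    ipaddress.ip_network("10.1.0.0/16"):    200,   # a more specific FEC
    ipaddress.ip_network("192.168.0.0/16"): 300,
}

def classify_and_label(dst: str) -> int:
    """Return the label for the longest matching prefix (the packet's FEC)."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fec_bindings if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return fec_bindings[best]
```

All packets matching the same prefix (FEC) receive the same label, and therefore the same treatment along the LSP.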

Figure 6.6 Forwarding Plane of an Ingress Label Edge Router. Note: The MPLS output technology can be either packet or cell based. When cell based, the MPLS frame is further segmented and the VPI/VCI is set to the value of the top label in the MPLS label stack.

o An ingress LSR classifies incoming IP packets, using as much header information as necessary to map packets to the correct LSP and to correctly set the Experimental bits.

o In general, the number of per-hop behaviors (e.g. the number of queues at each outgoing interface port) in LERs is greater than that in core LSRs. So packets that are classified into different per-hop behaviors at the ingress LSR may end up being mapped to the same per-hop behavior at the core LSRs. This is referred to as an MPLS Behavior Aggregate. The label field, the experimental field or some combination of both fields may define an MPLS Behavior Aggregate.

o Besides traffic classification, ingress LSRs are also responsible for performing other traffic conditioning functions, e.g. rate shaping and/or policing/marking of the traffic going onto particular LSPs, to maintain overall service goals.

    o Ingress rate shaping is required on traffic that is destined to be combined with other traffic to form a behavior aggregate further downstream. For behavior aggregates encompassing multiple micro-flows, core LSRs are unable to mediate between aggressive and non-aggressive micro-flows within the behavior aggregate. Ingress rate shaping bounds the interference between micro-flows making up a behavior aggregate.

    Driving Per-hop Behavior for QoS Requirements

o Each LSP is associated with a particular FEC. A packet is assigned to an FEC at the Ingress LSR when it enters an MPLS domain. The LIB provides the FEC-to-label binding, which in turn maps to an LSP. There are 3 approaches for establishing edge-to-edge QoS requirements:

1. Use only the label field to encode both the next hop (i.e. path information) and each distinct queuing and scheduling behavior as a new FEC (LSP), i.e. LSPs following the same path can have different QoS behavior. See figure 6.7a. This approach will create a huge number of active LSPs.

2. The Experimental field encodes up to 8 additional queuing and scheduling behaviors for the same FEC (LSP). See figure 6.7b.

3. The Experimental field encodes up to 8 queuing and scheduling behaviors independent of the FEC (LSP). Figure 6.7c shows that the label field encodes only the next hop (i.e. path) information. Destination IP address prefixes can be used as the only attribute to create LSPs for this approach. Moreover, this approach will create the fewest active LSPs.

    Figure 6.7a The Label alone can provide Per-Hop Context

    Figure 6.7b The Label and Experimental Bits together provide Per-hop Behavior Context

    Figure 6.7c Experimental Bits alone provide Per-hop Behavior Context

o An MPLS network must provide appropriate policing and traffic rate shaping (smoothing of traffic bursts) at the edges when the core LSRs are queuing and scheduling on limited information. An MPLS network that takes path information (the Label value) as part of the packet's context can have more finely grained control over per-hop resource sharing than a DiffServ network can have. This is because the MPLS Label value is essentially a compressed version of the information derived from multi-field (MF) classification (e.g. using the destination address and the type of service fields of the IPv4 header for traffic separation) at the ingress to the MPLS network. If the Edge LSRs classify individual flows onto their own LSPs, the Label value at any hop gives a Core LSR enough context to differentiate packets at the flow level.

Traffic Engineering over MPLS

The task of mapping traffic flows onto an existing physical topology is called traffic engineering. Specifically, traffic engineering provides the ability to move traffic flows away from the shortest path selected by the IGP (e.g. OSPF or IS-IS) and onto a potentially less congested physical path across the service provider's network, see figure 6.8.

Figure 6.8 Traffic Engineering Path versus IGP Shortest Path across an MPLS Network (from Hong Kong to Peking: the IGP shortest path and a traffic engineering path traverse different LSRs between the two LERs)

    Traffic engineering allows service providers to balance the traffic load on the various links, routers and switches in the network so that none of these components is over-utilized or under-utilized. In this way, service providers can exploit the economies of the bandwidth that has been provisioned across the entire network. This helps to cut down the operational cost and capital investment.

    Applications for Traffic Engineering

Existing IGPs can actually contribute to network congestion because they do not take bandwidth availability and traffic characteristics into account when building their routing tables. Service providers understand that traffic engineering can be used to significantly enhance the operation and performance of their networks. They intend to use the capabilities of traffic engineering to:


1. route primary paths around known bottlenecks or points of congestion in the network;

    2. provide precise control over how traffic is rerouted when the primary path is faced with single or multiple failures;

3. provide more efficient use of available bandwidth and long-haul fiber, i.e. make sure there are no over-utilized or under-utilized components;

    4. make themselves more competitive within their markets by maximizing operational efficiency, resulting in lower operational costs;

    5. enhance the traffic oriented performance characteristics of the network by minimizing packet loss, minimizing prolonged periods of congestion and maximizing throughput;

    6. enhance statistically bounded performance characteristics of the network (such as loss ratio, delay and jitter) that are required to support multi-services; and

    7. provide more options, lower costs and better services to their customers.

    History

In the early 1990s, service providers' (e.g. ISPs') networks were composed of routers interconnected by leased T1 and T3 lines. When the demand for bandwidth increased faster than the speed of individual network links, as in the case of the Internet growth spurt in those days, the service providers responded by simply provisioning more links to provide additional bandwidth. The traffic-engineering tool available in those days for router-based networks was the manipulation of routing metrics.

Referring to figure 6.9, assume Network A sends a large amount of traffic to Network C and Network D. With the metrics in this figure, links 1 and 2 might be congested. This is because both the Network A-to-Network C and the Network A-to-Network D traffic flow over links 1 and 2. However, if the metric for link 4 were changed to 20, then the Network A-to-Network D traffic would be moved to link 4.

Figure 6.9 Metric Based Traffic Control (Routers A-D interconnecting Networks A-D; links 1, 2 and 3 have metric 10, link 4 has metric 40)
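The metric manipulation above can be sketched with a shortest-path calculation. The exact router-to-link mapping in figure 6.9 is not fully recoverable here, so the topology below is a hypothetical reconstruction chosen so the described behavior holds: with link 4 at metric 40, A-to-D traffic follows links 1-3; lowering it to 20 moves that traffic onto link 4.

```python
# A sketch of metric-based traffic control on an assumed reading of the
# figure 6.9 topology (the graph below is an illustration, not the figure).
import heapq

def shortest_path(graph, src, dst):
    """Plain Dijkstra: returns the list of nodes on the cheapest path."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, metric in graph[node]:
            heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))

def build(link4_metric):
    # links 1-3 have metric 10; link 4 is an alternate path toward D
    edges = [("A", "B", 10), ("B", "C", 10), ("C", "D", 10), ("A", "D", link4_metric)]
    graph = {n: [] for n in "ABCD"}
    for u, v, m in edges:
        graph[u].append((v, m))
        graph[v].append((u, m))
    return graph
```

With `link4_metric=40` the A-to-D path costs 30 via the three metric-10 links; changing the metric to 20 makes the direct link cheaper, rerouting the traffic, which is exactly the trial-and-error lever the text describes.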


There are quite a few limitations to metric-based traffic control and router-based core networks:

1. Traffic engineering based on metric manipulation is not scalable. A metric adjustment in one part of a large network is very likely to cause problems (i.e. hot spots) in another part of the network. Metric adjustment is a trial-and-error approach, not a scientific solution;

2. Traditional software-based routers have limited packet processing power and aggregate bandwidth; and

3. IGP (e.g. OSPF, IS-IS & RIP) route calculation is topology driven and is based on a simple additive metric such as the hop count or an administrative value. Bandwidth availability and traffic characteristics, i.e. the traffic load in the network, are not taken into account when the routes are calculated. This results in an uneven distribution of traffic across the network, causing inefficient use of expensive resources as well as congestion.

Around the mid 1990s, the volume of Internet traffic reached a point where ISPs and Internet backbone service providers were required to migrate their networks to support trunks larger than T3 (45 Mbps). At that time, OC-3 (155 Mbps) ATM interfaces were available. Service providers then moved their router-based networks to the IP overlay model (i.e. IP over ATM), see figure 6.10.

Operation of IP over ATM:

Figure 6.10 The IP over ATM (IP Overlay) Model (an ATM core of four ATM switches connecting routers at POPs 1-4; OC-3 and OC-12 links)


PVCs traversing the ATM core are used as logical circuits to provide connectivity between edge routers. A set of PVCs is configured to fully connect the routers at the edge of the ATM core. This maps the physical topology in figure 6.10 to a logical topology as shown in figure 6.11.

The physical paths for the PVC overlay are usually calculated by an offline configuration utility. The PVC paths and attributes are globally optimized by the offline configuration utility based on link capacity and historical traffic patterns (i.e. traffic engineered). The offline configuration utility can also calculate the set of secondary backup PVCs that is ready to respond to failure conditions.

Finally, after the globally optimized PVC mesh has been calculated, the supporting configurations are downloaded to the routers and the ATM switches to implement the single or double full-mesh logical topology, see figure 6.11. When congestion occurs, a new trunk is added or a new POP is deployed.

The edge routers have knowledge only of the individual PVCs, which appear to them as simple P2P circuits between two routers. They do not have any knowledge of the ATM infrastructure. The mapping of IP prefixes (routes) to PVCs at each edge router is also determined by and downloaded from the offline configuration utility. Finally, ATM PVCs are integrated into the IP network by running the IGP across each of the PVCs to establish peer relationships and exchange routing information.

Advantages of the IPoATM Model:
1. ATM offered the bandwidth that service providers needed in the mid 1990s.
2. ATM supports traffic engineering via the manipulation of PVCs.
3. ATM provides deterministic performance.

Figure 6.11 Logical IP Topology over an ATM Core (a full mesh of PVCs between the routers at POPs 1-4)


4. Compared with software-based routers, ATM switches forward packets/cells much faster, provide higher-speed interfaces and offer significantly greater aggregate bandwidth.

Disadvantages of the IPoATM Model:
1. It is complicated and expensive to co-ordinate, operate and manage two different networks.
2. Currently, the maximum speed supported by ATM interfaces is only up to OC-48 (2.488 Gbps). It is complicated and expensive for ATM to support interfaces beyond this speed.
3. A cell tax (i.e. overhead) of around 20% is paid when IP packets are carried over an ATM infrastructure.
4. ATM suffers from the n-squared PVC problem.
5. A large number of fully meshed routers creates IGP stress.
6. It is not possible to seamlessly integrate L2 and L3.

Components of MPLS Traffic Engineering

MPLS provides a router-based traffic engineering solution. There are four functional components: 1. information distribution; 2. path selection; 3. signaling; and 4. packet forwarding.

Information Distribution

Traffic engineering requires detailed knowledge about 1. the network topology; and 2. dynamic information about network loading. Distribution of this information for traffic engineering can be achieved by simple extensions to OSPF or IS-IS that include link attributes as part of each router's link state advertisement. The extensions to OSPF and IS-IS for traffic engineering are known as OSPF-TE and IS-IS-TE respectively. IS-IS-TE is achieved by defining new Type-Length-Values (TLVs)/objects. OSPF-TE is implemented with Opaque LSAs (link state advertisements). The standard flooding algorithm used by a link state IGP ensures that link attributes are distributed to all routers in the service provider's routing domain. Some of the traffic engineering extensions added to the IGP link state advertisement are:


1. maximum link bandwidth;
2. maximum reservable link bandwidth;
3. current bandwidth reservation;
4. current bandwidth usage; and
5. link color. Links or resources can be classified into different classes. Links or resources indicated by the same color are said to belong to the same class. For example, if OC-48 links are indicated with a color, then all links of OC-48 capacity have the same color. The link or resource color attribute can be used to implement policies to optimize network performance, e.g. a generalized inclusion/exclusion policy that restricts the placement of traffic to a specific subset of links or resources.

Figure 6.12 shows the components in the control plane of an LSR. The components involved in information distribution are the blocks highlighted with bold lines. As shown in the figure, every LSR in an MPLS domain maintains the network link attributes and network topology in a specialized Traffic Engineering Database (TED). The TED is used exclusively for calculating explicit paths for the placement of LSPs across the physical topology. The IGP continues the calculation of the traditional shortest path based on the information contained in the router's link state database.

Figure 6.12 Information Distribution Components (control plane of an LSR: the OSPF-TE/IS-IS-TE routing component floods and receives link state information, populating the link state database used by IGP route selection and the TE database used by LSP route selection; the signaling component (RSVP-TE or CR-LDP) handles LSP setup; the data forwarding plane passes packets in and out)

Path Selection

Based on the information stored in the TED, every ingress LSR in the MPLS domain can calculate the paths of its own set of LSPs across the MPLS backbone. The path calculated for each LSP can be either a strict explicit route or a loose explicit route. A strict explicit route specifies all the LSRs in the LSP while a loose explicit route specifies only some of the LSRs in the LSP, see figure 6.13.

The concept of constraint-based routing is used to calculate the physical path for an LSP. Constraint-based routing can co-exist with IGP routing. Constraint-based routing takes in additional attributes or constraints, such as traffic flow or traffic trunk attributes (to be discussed later), network technology and link attributes, to calculate the path. The resultant path usually deviates from the shortest path calculated by the traditional IGP, e.g. OSPF. With reference to figure 6.14, the Constrained Shortest Path First (CSPF) algorithm takes into account specific restrictions stored in the TED to calculate the shortest path across the MPLS backbone. Input into the CSPF algorithm includes:

Figure 6.13 Ingress LSR Calculates Explicit Routes (from Hong Kong to Peking across an MPLS backbone of LSRs 1-9; Strict: [4, 5, 6], Loose: [4, 9])

1. topology link state information;
2. attributes associated with the state of network resources, e.g. maximum link bandwidth, current bandwidth reserved, etc. (i.e. link attributes);
3. attributes required to support traffic traversing the proposed LSP, e.g. bandwidth requirements (i.e. traffic flow or traffic trunk attributes); and
4. other administrative attributes, e.g. maximum hop count, and administrative policy requirements such as the inclusion or exclusion of certain classes of link.

Note: 1 and 2 above are distributed by OSPF-TE or IS-IS-TE. 3 and 4 are entered by the operator.


The output of the CSPF calculation is an explicit route consisting of a sequence of LSR addresses that provides the shortest path through the network meeting the constraints. This output is then passed to the signaling component to set up the LSP, see figure 6.14.
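The essence of CSPF, as described above, can be sketched as a two-step calculation: prune from the TED every link that fails the constraints, then run an ordinary shortest-path calculation on what remains. The TED contents, attribute names and topology below are illustrative assumptions, not taken from any real implementation.

```python
# A simplified sketch of the CSPF idea: constrain first, then run
# shortest-path-first on the pruned topology.
import heapq

# Hypothetical TED: (u, v) -> {metric, avail_bw (Mbps), color}
ted = {
    ("I", "A"): {"metric": 10, "avail_bw": 600, "color": "gold"},
    ("A", "E"): {"metric": 10, "avail_bw": 100, "color": "gold"},
    ("I", "B"): {"metric": 10, "avail_bw": 600, "color": "gold"},
    ("B", "E"): {"metric": 20, "avail_bw": 600, "color": "gold"},
}

def cspf(ted, src, dst, need_bw, allowed_colors):
    # Constraint step: drop links that cannot carry the proposed LSP
    usable = {e: a for e, a in ted.items()
              if a["avail_bw"] >= need_bw and a["color"] in allowed_colors}
    graph = {}
    for (u, v), a in usable.items():
        graph.setdefault(u, []).append((v, a["metric"]))
        graph.setdefault(v, []).append((u, a["metric"]))
    # Shortest-path step on the pruned topology
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path                      # the explicit route
        if node in seen:
            continue
        seen.add(node)
        for nbr, metric in graph.get(node, []):
            heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))
    return None                              # no path meets the constraints
```

A 200 Mbps request is pushed onto the longer I-B-E path because the A-E link has only 100 Mbps available, whereas a 50 Mbps request takes the cheaper I-A-E path; this is how the resulting explicit route deviates from the plain IGP shortest path.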

The online path calculation discussed above is not deterministic; that is, the physical path selected for an LSP depends on the order in which the LSPs are calculated. Therefore, an offline planning and analysis tool is usually available for global optimization. This offline tool performs global path calculation for all the required LSPs simultaneously and selects the best solution for the network as a whole. The output of the offline calculation is a set of explicit routes for the LSPs. These LSPs can be installed in any order and the utilization of network resources is optimal.

Traffic Trunks

As indicated in IETF RFC 2430, the definition of a traffic trunk is: a traffic trunk is an aggregation of traffic flows of the same class which are placed inside an LSP. Traffic flows that share specific attributes, e.g. ingress LSR, egress LSR, average rate, peak rate, priority, FEC, etc., belong to the same class. Traffic trunks can be mapped to a set of LSPs and can also be moved from one LSP to another LSP either automatically or through administrative intervention. This enables the network to adapt to changing load conditions. As described in IETF RFC 2702, the following attributes of traffic trunks are significant for traffic engineering:

Figure 6.14 Path Selection Component (in the control plane of the ingress LSR, CSPF path selection draws on the TE database and passes the resulting explicit route to the signaling component (RSVP-TE or CR-LDP) for LSP setup)


1. Traffic parameter attributes. These are the resource requirements of a traffic trunk, e.g. peak rate, average rate, burst size, etc.

    2. Generic path selection and maintenance attributes. These attributes define how paths are selected, i.e. via constraint-based routing signaling (e.g. RSVP-TE), IGPs or other manual means.

    3. Priority attribute. This attribute defines the relative importance of traffic trunks.

    4. Preemption attribute. This attribute determines whether a traffic trunk can preempt another traffic trunk from a given path.

    5. Resilience attribute. This attribute indicates whether to reroute or leave the traffic trunk as is under a failure condition.

    6. Policing attribute. This attribute determines the actions that should be taken by the network when a traffic trunk exceeds the traffic parameters specified in its contract.

    7. Resource attributes. This attribute constrains the placement of traffic trunks. For example, use the resource color/class attribute to include or exclude a specific set of resources in the placement of traffic trunks.

Signaling Component

After the explicit route for an LSP has been calculated, the LSP can be installed by: 1. manual configuration; or 2. using a signaling protocol to establish the LSP and distribute the labels, see figure 6.15. Manual configuration requires going into each and every LSR along the path and specifying the incoming label/interface and outgoing label/interface. This is much like provisioning ATM PVCs.

Figure 6.15 Signaling Component (in the control plane of the LSR, the signaling component (RSVP-TE, LDP or CR-LDP) receives the explicit route and performs LSP setup with its neighbors)

  • Using a signaling protocol is the preferred way to setup LSPs and distribute labels. There are 3 label distribution protocols used for label distribution and/or setting up LSPs in MPLS. They are: 1. LDP (Label Distribution Protocol).

    LDP does not support traffic engineering nor explicit routes. It executes hop-by-hop, i.e. each LSR along the path looks at the IP routing table to determine where the next hop for the LSP. It uses the same path as the IGP. It is usually used to distribute labels between LER peers in a targeted LDP session in MPLS.

    2. RSVP-TE (RSVP with traffic engineering). RSVP-TE supports traffic engineering and is widely deployed in MPLS.

    3. CR-LDP (Constraint-based routing LDP). CR-LDP extends LDP to support explicit routes (i.e. traffic engineering).

    o More on Label Binding and Label Distribution

    a. Label binding is the mapping between an FEC and a label. FEC-to-label bindings are stored in the LIB table in an LER or LSR. FEC-to-label binding can be triggered by some control events, for example when an LSR receives a label binding request from an upstream LSR (downstream-on-demand); or when an LSR discovers a next hop for a particular FEC.

    b. Downstream-on-demand Label Distribution (figure 6.16)
    1. Upstream LSR1 recognizes downstream LSR2 as its next hop for an FEC.
    2. A request is made to LSR2 for a binding between the FEC and a label.
    3. If LSR2 recognizes the FEC and has a next hop for it, it creates a binding and replies to LSR1.
    4. Both LSRs then have a common understanding of this label.


    An example of a protocol that distributes labels in this fashion is RSVP-TE.
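    The four steps above can be sketched as a toy exchange between two LSR objects. This is an illustration only; the class and method names (Lsr, request_binding, answer_binding_request) and the label values are invented for this sketch, not part of any real MPLS implementation.

```python
# Toy sketch of downstream-on-demand label distribution (all names invented).

class Lsr:
    def __init__(self, name, egress_for=()):
        self.name = name
        self.egress_for = set(egress_for)  # FECs this LSR is egress for
        self.next_hop = {}                 # FEC -> downstream Lsr
        self.lib = {}                      # label information base: FEC -> label
        self._next_label = 100

    def request_binding(self, fec):
        """Steps 1-2: ask the downstream next hop for a label for this FEC."""
        downstream = self.next_hop[fec]
        label = downstream.answer_binding_request(fec)
        if label is not None:
            self.lib[fec] = label          # step 4: common understanding
        return label

    def answer_binding_request(self, fec):
        """Step 3: create a binding only if we have a next hop or are egress."""
        if fec not in self.next_hop and fec not in self.egress_for:
            return None
        label = self._next_label
        self._next_label += 1
        self.lib[fec] = label
        return label

lsr2 = Lsr("LSR2", egress_for=["10.1.0.0/16"])   # LSR2 is egress for the FEC
lsr1 = Lsr("LSR1")
lsr1.next_hop["10.1.0.0/16"] = lsr2              # step 1: LSR2 is the next hop

label = lsr1.request_binding("10.1.0.0/16")
print(label, lsr1.lib == lsr2.lib)               # both peers share the binding
```

    The point of the on-demand mode is visible in the control flow: no label exists until the upstream LSR asks for one.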

    c. Downstream Unsolicited Label Distribution

    (figure 6.17)

    [Figure 6.16 Downstream-on-demand Label Distribution: upstream LSR1 sends a request for a binding for an FEC to downstream LSR2, which replies with the label-FEC binding.]

    1. LSR2 discovers a next hop further downstream for a particular FEC

    2. LSR2 generates a label for the FEC and communicates the binding to LSR1

    3. LSR1 inserts the binding into its LIB/switching table

    4. If LSR2 is the next hop to the FEC, LSR1 can use that label knowing that its meaning is understood.

    An example of a protocol that distributes labels in this fashion is CR-LDP. Note: LDP also supports downstream-on-demand.

    d. Label Control
    MPLS defines modes for the distribution of labels to neighboring LSRs:
    1. ordered. In this mode, an LSR binds a label to a particular FEC and distributes this binding to its upstream peers if and only if it is the egress LSR or it has received the label binding for the FEC from its next-hop (downstream) LSR. RSVP-TE works in this mode.
    2. independent. In this mode, once an LSR recognizes the next hop for a particular FEC, it makes the decision to bind a label to the FEC independently and distributes the binding to its peers.

    [Figure 6.17 Downstream Unsolicited Label Distribution: downstream LSR2 sends a label-FEC binding to upstream LSR1 without a prior request.]

    e. Label Retention
    MPLS defines the treatment of FEC-to-label bindings received from LSRs that are not the next hop for a given FEC. Two modes are defined:
    1. conservative. In this mode, these FEC-to-label bindings are discarded. This mode requires an LSR to maintain fewer labels, and is used by ATM-LSRs.
    2. liberal. In this mode, these FEC-to-label bindings are retained. This mode allows quicker switching of traffic to other LSPs in case of topology changes.
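    The two retention modes reduce to a single filtering decision, which can be sketched as follows (a toy illustration; the function name and peer names are invented):

```python
# Conservative vs liberal label retention (toy sketch; names invented).

def retain(bindings, next_hop_peer, mode):
    """bindings: list of (advertising_peer, fec, label) tuples received.
    conservative: keep only bindings from the actual next hop (fewer labels).
    liberal: keep everything (faster failover when the topology changes)."""
    if mode == "conservative":
        return [b for b in bindings if b[0] == next_hop_peer]
    return list(bindings)

received = [("LSR2", "10.1/16", 18), ("LSR3", "10.1/16", 42)]
print(retain(received, "LSR2", "conservative"))  # only the next hop's binding
print(len(retain(received, "LSR2", "liberal")))  # 2: both bindings kept
```

    In liberal mode, if LSR3 later becomes the next hop for 10.1/16, the label 42 is already on hand and traffic can be switched immediately.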

    o Label Distribution Protocol (LDP)

    The LDP is a protocol defined by the IETF for the distribution of FEC-to-label binding information to LSRs in an MPLS network. It is used to map FECs to labels, which in turn create LSPs. The physical path taken by these LSPs is the same as the routes calculated by the traditional IGP. LDP sessions are established between LDP peers in the MPLS network (not necessarily adjacent). The peers exchange the following types of LDP messages:
    1. discovery messages announce and maintain the presence of an LSR in a network;
    2. session messages establish, maintain and terminate sessions between LDP peers;
    3. advertisement messages create, change and delete label mappings for FECs;
    4. notification messages provide advisory information and signal error information.

    o RSVP Traffic Engineering/Tunnel Extension (RSVP-TE), IETF RFC 3209

    Brief Review of RSVP
    Resource Reservation Protocol (RSVP), an IETF standard (RFC 2205), specifies resource reservation techniques for IP networks.

    RSVP is a protocol that enables resources (e.g. link bandwidth, queuing space, switching bandwidth) to be reserved for a given session (or sessions) prior to any attempt to exchange media between the participants. Note: RSVP does not carry user data. User data is transported separately (e.g. by RTP) after the reservation procedures are performed.


  • RSVP provides strong QoS guarantees, significant granularity of resource allocation, and significant feedback to applications and users.

    RSVP also has the ability to support protection, i.e. traffic restoration in case of failure, in a timely fashion (less than 50 msec) in MPLS.

    RSVP Message Syntax
    The message format is Type-Length-Value: a type field identifies the message type, followed by a length field, followed by the data itself.

    RSVP messages. These messages (i.e. types) are: 1) Path, 2) Resv, 3) PathErr, 4) ResvErr, 5) PathTear, 6) ResvTear and 7) ResvConf. Each RSVP message carries a number of objects; these are discussed later.

    Figure 6.18 shows how RSVP establishes a session between host A and host B.
    1. Host A first issues a PATH message to the far end via a number of routers. This message carries the traffic specifications, e.g. the bandwidth and packet size, of the data the sender expects to send.
    2. Each RSVP-enabled router along the way establishes a path state that includes the source address of the previous hop of the PATH message (i.e. the next hop back to the sender).
    3. The receiver of the PATH message responds with a Reservation Request (RESV) message. The receiver indicates the type of reservation service requested, e.g. the Controlled-load service or Guaranteed service defined in Integrated Services.
    4. The RESV message travels back to the sender along the same route that the PATH message took (but in reverse). At each router, the requested resources indicated in the FlowSpec object (see table 6.1 below) are allocated if they are available.
    5. Finally, the RESV message reaches the sender with a confirmation that resources have been reserved.

    Note: RSVP supports QoS reservation in routers along the IGP path between a pair of hosts. This creates scalability problem because each router along the path has to maintain the per-flow state between a pair of hosts and there can be millions of hosts requesting RSVP service from the network simultaneously.
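    The five steps above can be sketched as a toy PATH/RESV walk over a chain of routers. This is a minimal model with invented names; real RSVP messages carry the objects listed in table 6.1, and a reservation can fail at any hop.

```python
# Toy sketch of the RSVP PATH/RESV exchange along a chain of routers.

def rsvp_session(chain, tspec_bw):
    """chain: [sender, router..., receiver]; tspec_bw: bandwidth from the TSpec."""
    path_state = {}            # router -> previous hop (next hop back to sender)
    prev = chain[0]
    for node in chain[1:]:     # PATH travels sender -> receiver
        path_state[node] = prev
        prev = node

    reserved = {}
    node = chain[-1]           # the receiver issues the RESV
    while node != chain[0]:    # RESV follows the stored path state in reverse
        reserved[node] = tspec_bw   # allocate if available (assumed here)
        node = path_state[node]
    return reserved

reservations = rsvp_session(["hostA", "R1", "R2", "hostB"], 2.0)
print(reservations)            # bandwidth reserved at each hop back toward A
```

    The per-node `path_state` dictionary is exactly the per-flow state that the note above identifies as the scalability problem: every router on the path holds an entry per host pair.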

    [Figure 6.18 RSVP Used for Resource Reservation: sender Host A forwards a PATH message hop by hop through the routers to receiver Host B; the RESV message travels back along the reverse path.]

    Objects carried in RSVP messages. The following table shows some of the objects carried in RSVP messages.

    Table 6.1 RSVP Objects

    Session
      Function: identifies a session.
      Contents: 1) destination IP address; 2) destination IP port number (optional); 3) class type, e.g. IPv4/UDP or IPv6/UDP.
      Messages: all.

    RSVP_HOP
      Function: identifies the previous node through which this RSVP message came.
      Contents: the IP address and interface of a node.
      Messages: all.

    Time_Value
      Function: indicates the timeout period of this RSVP message.
      Contents: timeout period in msec.
      Messages: all.

    TSpec
      Function: indicates the traffic specifications of what the sender expects to send.
      Contents: 1) specifications for metering, e.g. a token bucket, including (i) a token bucket size in bytes (this limits the maximum input burst size) and (ii) a token bucket rate (this limits the input average rate); 2) peak rate in bytes/sec; 3) maximum packet size in bytes; 4) minimum policing unit in bytes, e.g. m bytes (packets shorter than m bytes are counted as m bytes long).
      Messages: PATH, RESV.

    Sender_Template
      Function: describes the format of data packets that a specific sender (i.e. host) will originate. This template is in the form of a FilterSpec that is typically used to select this sender's packets from others in the same session on the same link.
      Contents: see FilterSpec.
      Messages: PATH.

    FlowSpec
      Function: 1) specifies the desired QoS; 2) indicates the parameters of the desired QoS control service; 3) indicates the accepted traffic specifications.
      Contents: 1) a service number specifying either the Guaranteed or Controlled-load service defined in IntServ; 2) an RSpec (reserve spec) containing the parameter(s) for the specified service, e.g. bandwidth and maximum delay, which define the desired QoS; 3) a TSpec object.
      Messages: RESV.

    FilterSpec
      Function: used together with the Session object to define the set of data packets (the flow) that receives the service defined by the FlowSpec object. It carries information about the sender and is sent as a Sender_Template in the PATH message.
      Contents: 1) IP address of the sender (i.e. host); 2) IP port number of the sender (optional; included, for example, in video conferencing where the video stream and the audio stream request different QoS treatments defined by different FlowSpecs).
      Messages: PATH, RESV.

    ADSpec
      Function: allows the sender and the routers along the path to advertise their QoS capabilities to the receiver(s). When the ADSpec reaches a receiver in the PATH message, it gives a good indication of what the receiver can reasonably request in terms of QoS from the routers and sender.
      Contents: parameters of QoS capability, e.g. link bandwidth.
      Messages: PATH.
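    The token-bucket parameters in the TSpec row above (bucket size b limiting the burst, token rate r limiting the average rate, and minimum policing unit m) can be sketched as a simple conformance meter. This is an illustration of the metering idea only, not actual RSVP code; class and parameter names are invented.

```python
# Token-bucket meter sketched from the TSpec parameters of table 6.1.
# b = bucket size in bytes (limits the max burst), r = token rate in bytes/sec
# (limits the average rate), m = minimum policing unit in bytes.

class TokenBucket:
    def __init__(self, r, b, m):
        self.r, self.b, self.m = r, b, m
        self.tokens = b          # bucket starts full
        self.t = 0.0             # time of last update

    def conforms(self, size, now):
        # Refill tokens at rate r, capped at bucket size b.
        self.tokens = min(self.b, self.tokens + self.r * (now - self.t))
        self.t = now
        size = max(size, self.m)   # packets shorter than m count as m bytes
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False               # out-of-profile packet

tb = TokenBucket(r=1000.0, b=1500.0, m=64)
print(tb.conforms(1500, 0.0))   # True: the burst fits the full bucket
print(tb.conforms(1500, 0.5))   # False: only 500 tokens have accumulated
print(tb.conforms(40, 1.0))     # True, but the packet is counted as 64 bytes
```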

    The reservations that RSVP makes are soft,

    which means that they need to be refreshed/updated on a regular basis by the receiver(s).

    RSVP is used in IntServ and DiffServ for resources reservation, and RSVP-TE (traffic engineering) is used for setting up MPLS LSPs.


  • Label Binding and LSP Tunnel Establishment using RSVP-TE
    In the late 1990s, the IETF extended RSVP to support traffic engineering and solve the scalability problem. In RSVP-TE, RSVP sessions do not extend from host to host; they only take place between ingress and egress LSRs. Traffic from hosts connected to an ingress LSR is aggregated at the ingress LSR. The aggregated traffic, known as a traffic trunk, is then mapped to an LSP, also known as an LSP tunnel, see figure 6.19. Every LSR or LER only has to maintain state for the LSPs, not per-flow state.

    [Figure 6.19 Traffic Aggregation: traffic from hosts A, B and C is aggregated at the ingress LSR and carried over a single LSP to the egress LSR.]

    RSVP-TE performs downstream label allocation, distribution, and binding on demand among the LSRs along the LSP path, thus establishing path state in the network, see figure 6.20.

    New objects have been defined in the extensions to support traffic engineering in MPLS. Some of these new objects are: 1. LABEL_REQUEST object; 2. EXPLICIT_ROUTE object (ERO); 3. RECORD_ROUTE object (RRO); 4. SESSION_ATTRIBUTE object; 5. LABEL object; and 6. STYLE object. These objects are included in the PATH and/or RESV messages to set up an LSP tunnel (see figure 6.20). Other standard RSVP objects listed in table 6.1 can also be included in the PATH and RESV messages.


    [Figure 6.20 Establishing an LSP Tunnel: the ingress LSR sends a PATH message (SESSION, LABEL_REQUEST, ERO*, RRO*, SESSION_ATTRIBUTE*) to the egress LSR, which replies with a RESV message (SESSION, LABEL, STYLE, RRO*); the LSP is then established. (* optional object)]

    Details of objects in a PATH message to support traffic engineering & class of service
    1. LABEL_REQUEST object. This object requests a label binding for a specific LSP. It also contains the Layer 3 protocol ID (e.g. IP) of the traffic that will traverse this LSP, and can indicate the label range for ATM-based and Frame Relay-based labels. There is no label range for regular 32-bit MPLS labels.

    2. EXPLICIT_ROUTE object. This object is encoded as a series of sub-objects contained in the ERO. Each sub-object can identify a group of nodes in the explicit route or can specify an operation to be performed along the path. Each group of nodes is called an abstract node. The format of sub-object is shown in figure 6.21.

    In figure 6.21:
    L bit = 0 implies a strict hop in the explicit route; L bit = 1 implies a loose hop in the explicit route.
    Type can be an IPv4 prefix, an IPv6 prefix, or an Autonomous System Number. An AS number identifies an abstract node consisting of the set of nodes belonging to that autonomous system, and allows an LSP to traverse different autonomous systems.

    [Figure 6.21 ERO Sub-object Format: L | Type | Length | Sub-object Contents]

    Figure 6.22 shows a loose explicit route with a strict hop and a loose hop.

    [Figure 6.22 Loose Explicit Route: an LSP from the ingress LSR at 10.10.10.1 to the egress LSR at 10.10.40.1 via intermediate LSRs 10.10.20.1 and 10.10.30.1, with Explicit Route = {[L=0, IPv4, 10.10.10.1] [L=1, IPv4, 10.10.40.1]}.]
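    An IPv4-prefix ERO sub-object with the fields of figure 6.21 can be sketched in a few lines. The byte layout follows the RFC 3209 IPv4 sub-object (L bit plus 7-bit type in the first byte, a length byte, the 4-byte address, a prefix length, and a padding byte); the helper function name is ours.

```python
import struct

# Encode an IPv4-prefix ERO sub-object (layout per RFC 3209):
# | L(1 bit)+Type(7 bits) | Length | IPv4 address (4 bytes) | Prefix | Pad |
def ero_ipv4_subobject(addr, prefix_len, loose):
    first = (0x80 if loose else 0x00) | 0x01     # type 1 = IPv4 prefix
    packed = bytes(int(octet) for octet in addr.split("."))
    return struct.pack("!BB4sBB", first, 8, packed, prefix_len, 0)

# The explicit route of figure 6.22: a strict hop at the ingress, then a
# loose hop to the egress.
ero = ero_ipv4_subobject("10.10.10.1", 32, loose=False) \
    + ero_ipv4_subobject("10.10.40.1", 32, loose=True)
print(ero.hex())
```

    Note how the loose hop differs from the strict hop only in the top bit of the first byte (0x81 versus 0x01).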

    3. RECORD_ROUTE Object. This object is sent to the egress LSR via the PATH message and is returned to the ingress LSR via the RESV message. The ingress LSR can learn the actual route that the LSP traverses from this object. The RRO can be used to:
    a. detect L3 routing loops or loops inherent in the explicit route;
    b. collect detailed hop-by-hop path information about an RSVP session;
    c. provide input to the ERO. After the ingress LSR receives the RRO in the RESV message, it can use it to alter the ERO in the next PATH message. This can be used to pin down a session path (i.e. not allow it to change), preventing the path from being altered even if a better path becomes available.

    When an RRO traverses the path in a PATH message, each node (including the ingress and egress LSRs) along the path inserts its IP address prefix sub-object into the RRO. When the RRO returns to the ingress LSR along the same path in the RESV message, each node along the path retrieves the RRO from the RESV message. Hence every node along the path has the complete route of the LSP from ingress to egress.

    4. SESSION_ATTRIBUTE Object. This object is used to control LSP priority, preemption, and fast-reroute features. The Setup Priority field defines the priority of this LSP; it is used when deciding whether this LSP tunnel can preempt another. The Holding Priority field defines the priority of the LSP tunnel with respect to holding resources that other LSPs want to consume; it is used when deciding whether this LSP tunnel can be preempted. The Local Repair flag indicates whether local repair at transit LSRs may violate the ERO in case of failure. The "Ingress node may reroute" bit indicates whether the ingress LSR can reroute the LSP without tearing it down.

    5. SESSION, SENDER_TEMPLATE, FLOW_SPEC and FILTER_SPEC. New C-type (Class-type) extensions have been defined for these regular RSVP objects. SESSION: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address of the egress node and a unique 16-bit LSP_ID (i.e. LSP tunnel/traffic trunk ID) that remains constant over the life of the LSP tunnel even if the LSP is rerouted. This object uniquely identifies an LSP tunnel. SENDER_TEMPLATE: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address of the sender node and a unique 16-bit LSP_ID that can be changed to allow a sender to share resources with itself. This LSP_ID is used when an LSP tunnel that was


    established with a Shared-Explicit style is rerouted. FLOW_SPEC: the new C-type defined is CLASS_OF_SERVICE (CoS). FLOW_SPEC is used to define the desired QoS, see table 6.1. When a traffic flow in a session satisfies the specifications (i.e. sender IP address and optionally sender IP port number) listed in the FILTER_SPEC object carried in the same PATH message as the FLOW_SPEC, this flow gets the desired QoS treatment defined in the FLOW_SPEC at each LSR along the path. Usually the ingress LSR constructs a TSPEC and inserts it into the FLOW_SPEC. Based on this information, the egress LSR constructs a receiver TSPEC and RSPEC (see table 6.1), inserts them into the FLOW_SPEC and sends it back to the ingress LSR via the RESV message. LSRs along the path reserve resources based on the information in the FLOW_SPEC. FILTER_SPEC: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address and optionally the IP port number of the sender node and a unique 16-bit LSP_ID that can be changed to allow a sender to share resources with itself. This LSP_ID is used when an LSP tunnel that was established with a Shared-Explicit style is rerouted. FILTER_SPEC together with FLOW_SPEC form the FLOW_DESCRIPTOR.

    Figure 6.23 shows some of the objects that can be included in a PATH message.

    PATH Message (SESSION, LABEL_REQUEST, ERO, RRO, SESSION_ATTRIBUTE, SENDER_TEMPLATE, FLOW_SPEC)

    Figure 6.23 PATH Message


  • Details of objects in a RESV message to support traffic engineering & class of service
    1. LABEL object: a label is provided for each sender (LSR/LER) on the LSP. Figure 6.24 shows how an LSR processes the label in a RESV message. When the LSR receives a RESV message corresponding to a previous PATH message, it binds the received label for the specified FEC/traffic trunk to the receiving interface (2 in this example) and updates the forwarding table. It then binds a locally allocated label to the LSP's incoming interface (1 in this example) and updates the forwarding table. The LSR then constructs a new LABEL object, replaces the old LABEL object in the received RESV message, and forwards this RESV message to the previous (upstream) hop in the LSP.

    [Figure 6.24 LSR Processing the LABEL Object: the LSP (i.e. packet flow) enters the LSR on interface 1 from the upstream LSR/LER and exits on interface 2 to the downstream LSR/LER. The LSR receives RESV (LABEL = 22) from downstream and sends RESV (LABEL = 11) upstream. MPLS forwarding/switching table row: input interface 1, input label 11, output interface 2, output label 22, action swap.]
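    The LABEL-object processing just described can be sketched as the switching-table update an LSR makes when a RESV message arrives. This is a toy sketch; the function and field names are invented, and a real RESV carries many more objects.

```python
# Toy sketch of an LSR handling the LABEL object in a received RESV message.
# It binds the downstream label to the outgoing interface, allocates its own
# label for the incoming interface, and forwards the rewritten RESV upstream.

def process_resv(table, resv, in_iface, out_iface, alloc_label):
    downstream_label = resv["LABEL"]
    # One row of the MPLS forwarding/switching table (as in figure 6.24):
    table.append({"in_if": in_iface, "in_label": alloc_label,
                  "out_if": out_iface, "out_label": downstream_label,
                  "action": "swap"})
    resv = dict(resv)                 # rewrite a copy of the message
    resv["LABEL"] = alloc_label       # replace the LABEL object
    return resv                       # ...and send it to the upstream hop

table = []
resv_from_downstream = {"SESSION": "lsp-tunnel-1", "LABEL": 22}
resv_to_upstream = process_resv(table, resv_from_downstream,
                                in_iface=1, out_iface=2, alloc_label=11)
print(table[0])                    # the new swap entry: 11 in, 22 out
print(resv_to_upstream["LABEL"])   # 11: the label advertised upstream
```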

    2. STYLE object: this object specifies the resource reservation style applied to traffic trunks (i.e. aggregated flows). In MPLS, the reservation style can be either Fixed Filter (FF) or Shared Explicit (SE). It is the receiver, i.e. the egress LSR, that chooses the reservation style for an LSP, NOT the ingress LSR. However, an ingress LSR can set the "Ingress node may reroute" bit in the SESSION_ATTRIBUTE object to


    request that the egress LSR use the SE reservation style.
    Fixed Filter: the FF reservation style specifies an explicit list of senders and a distinct reservation for each of them. Each sender is identified by the IP address of an LSR/LER and a local LSP_ID. Each sender has a distinct reservation that is not shared with other senders. A separate LSP is constructed for each sender-receiver pair. A traditional application for this style of reservation is video distribution, which requires a separate pipe for each individual video stream. In figure 6.25, ingress LSRs A and B create two separate point-to-point LSPs, LSP 1 and LSP 2, towards common egress LSR D, both with the FF reservation style. The total amount of bandwidth reserved on shared link C-D is equal to the sum of the reservations required by ingress LSR A and ingress LSR B. Egress LSR D also assigns different labels to LSP 1 and LSP 2.

    [Figure 6.25 Fixed Filter Reservation Style: ingress LSRs A and B each establish a separate LSP (LSP 1 and LSP 2) through LSR C to egress LSR D.]

    Shared Explicit: the SE reservation style creates a single reservation over a link that is shared by an explicit list of senders. Again, a separate LSP is created for each sender-receiver pair. In figure 6.26, LSP 1 and LSP 2 are created with the SE reservation style. Link C-D is the shared link, with a bandwidth reservation equal to the larger request. For the multipoint-to-point LSP shown in figure 6.26, egress LSR D assigns the same label to LSP 1 and LSP 2. This is known as label merging or stream merging.

    [Figure 6.26 Shared Explicit Reservation: LSP 1 from ingress LSR A and LSP 2 from ingress LSR B merge at LSR C toward egress LSR D over the shared link C-D.]

    Figure 6.27 shows some of the objects that can be included in an RESV message.

    RESV Message (SESSION, LABEL, RRO, STYLE, FLOW_DESCRIPTOR list)

    Figure 6.27 RESV Message
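    On the shared link C-D of figures 6.25 and 6.26, the difference between the two reservation styles reduces to a sum versus a maximum. A small sketch (the function name and the bandwidth figures are invented for illustration):

```python
# Bandwidth reserved on the shared link C-D under the two reservation styles.
# FF sums the distinct per-sender reservations; SE makes one shared
# reservation sized for the largest request.

def shared_link_reservation(style, requests_mbps):
    if style == "FF":      # Fixed Filter: distinct reservation per sender
        return sum(requests_mbps)
    if style == "SE":      # Shared Explicit: single shared reservation
        return max(requests_mbps)
    raise ValueError(style)

requests = [30.0, 50.0]    # from ingress LSR A and ingress LSR B
print(shared_link_reservation("FF", requests))  # 80.0 Mbps on link C-D
print(shared_link_reservation("SE", requests))  # 50.0 Mbps on link C-D
```

    The same max-not-sum behavior is what prevents double counting when a backup LSP shares links with the primary, as discussed under fast re-route below.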

    Establishing an LSP Tunnel using RSVP-TE

    We use the example shown in figure 6.28 to explain how RSVP-TE establishes an LSP tunnel from LSR 1 to LSR 4.
    1. Assume an explicit route has already been constructed and downloaded to ingress LSR 1, and that the L-bit in each sub-object in the ERO of this explicit route has been cleared to specify strict hops.

    2. LSR 1 creates a PATH message with objects as shown in figure 6.28. It:
    - sets up the SESSION object to uniquely identify this LSP tunnel;
    - indicates in the LABEL_REQUEST object that an FEC-to-label binding is requested for the LSP, and identifies the L3 protocol (L3PID, e.g. IP) to be carried by this LSP;
    - inserts its own IP address prefix into the RRO object;
    - sets up the priority, preemption and fast reroute in the SESSION_ATTRIBUTE object;
    - enters the sender's IP address (in RSVP-TE this can be the IP address of the ingress LSR, LSR 1) and, if necessary, the IP port number into the SENDER_TEMPLATE object;
    - enters the traffic characteristics (see table 6.1) of the flow that will be sent along the LSP into the TSPEC in the FLOW_SPEC object. Note: if the LSP is intended to carry best-effort traffic and does not require that resources be allocated, the burst size and the rate are set to 0.
    3. Ingress LSR 1 sends the PATH message to LSR 2 as specified by the ERO.

    4. When LSR 2 receives the PATH message, it records the ERO, the SESSION object, the LABEL_REQUEST object, the SESSION_ATTRIBUTE object, the IP address of the previous hop, and the TSPEC. It then inserts its own IP address into the RRO and forwards the PATH message to LSR 3.
    5. LSR 3 processes the PATH message exactly the same way as LSR 2 and forwards it to egress LSR 4. Note: LSR 3 is the so-called penultimate LSR.

    6. When LSR 4 receives the PATH message, it notices from the SESSION object that it is the egress LSR for the LSP. It generates a RESV message for the session to distribute labels and establish forwarding state for the LSP tunnel. It:
    - allocates a label with a value of 0 and places it in the LABEL object. Both 0 and 3 have special meaning to LSR 4; they are used to speed up the operation of the egress LSR by avoiding two table lookups;
    - constructs the STYLE object for the RESV message, selecting the appropriate reservation style. If the "Ingress node may reroute" bit in the SESSION_ATTRIBUTE object was set by LSR 1, it may set the style to SE.


    - based on the TSPEC in the PATH message, LSR 4 constructs an appropriate receiver TSPEC and RSPEC for the FLOW_SPEC. Together with the FILTER_SPEC, this forms the FLOW_DESCRIPTOR in the RESV message;
    - sends the RESV message back to LSR 3 based on the previous-hop information. Note: the ERO is not carried in the RESV message.

    7. When LSR 3 receives the RESV message containing the label assigned by LSR 4, it:
    - stores the label 0 as part of the reservation state for the LSP;
    - allocates a new label, 22, and replaces the old label 0 with this new label in the LABEL object. This is the label that LSR 3 uses to identify incoming traffic on the LSP from LSR 2;
    - updates its forwarding/switching table;
    - allocates the resources and installs filters based on the information in the STYLE object and the FLOW_DESCRIPTOR list; and
    - forwards the RESV message upstream to LSR 2 based on the previous-hop information it received in the earlier PATH message.

    8. LSR 2 processes the RESV message exactly the same way as LSR 3 except the new label allocated is 11.

    9. When LSR 1 receives the RESV message that contains the label 11 assigned by LSR 2, it allocates the resources, installs the filter(s), updates its forwarding table and uses label 11 for all outgoing traffic that maps to this LSP.


    [Figure 6.28 Setting up an LSP Tunnel between Ingress LSR 1 and Egress LSR 4: the PATH message (SESSION, LABEL_REQUEST, ERO, RRO, SESSION_ATTRIBUTE, SENDER_TEMPLATE, FLOW_SPEC) travels LSR 1 -> LSR 2 -> LSR 3 -> LSR 4; the RESV message (SESSION, LABEL, RRO, STYLE, FLOW_DESCRIPTOR list) returns along the reverse path with LABEL = 0 from LSR 4, LABEL = 22 from LSR 3 and LABEL = 11 from LSR 2.]
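    The label flow in the nine steps above (LABEL = 0 from LSR 4, 22 from LSR 3, 11 from LSR 2) can be sketched as a walk of the RESV message upstream, with each hop substituting its own freshly allocated label. A toy illustration; the function name is invented, while the node names and label values follow the example.

```python
# Sketch of label distribution in figure 6.28: the RESV message carries a
# LABEL object upstream, and each LSR rewrites it with its own label.

def resv_walk(path_upstream, egress_label, allocations):
    """path_upstream: LSRs from egress back toward ingress;
    allocations: the label each of those LSRs allocates locally."""
    advertised = egress_label        # label carried in the arriving RESV
    bindings = {}
    for lsr, own_label in zip(path_upstream, allocations):
        bindings[lsr] = {"out_label": advertised, "in_label": own_label}
        advertised = own_label       # rewritten LABEL object sent upstream
    # The ingress uses the final advertised label for all outgoing traffic.
    return bindings, advertised

bindings, ingress_label = resv_walk(["LSR3", "LSR2"], egress_label=0,
                                    allocations=[22, 11])
print(bindings["LSR3"])   # LSR3 swaps its incoming 22 to the egress's 0
print(ingress_label)      # 11: the label LSR 1 pushes onto packets
```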

    Packet Forwarding
    o The components for packet forwarding in the LER and LSR are shown in figure 6.6 and figure 6.4 respectively.
    o We use the LSP established in figure 6.28 to explain packet forwarding in MPLS. With reference to figure 6.29:
    1. When ingress LSR 1 receives a standard IP packet, it analyses the IP header. Based on this analysis, the packet is classified, mapped to an FEC/traffic trunk and hence assigned a label (in this case 11). LSR 1 encapsulates the IP packet in an MPLS frame, pushes label 11 onto the label stack and forwards the MPLS packet out interface 2.

    2. LSR 2 receives the MPLS packet on interface 1 with a label equal to 11. It looks up its forwarding/switching table and learns that the packet should be forwarded out interface 2 with label 22. It swaps the label to 22 and forwards the MPLS packet out interface 2.

    3. LSR 3 (the penultimate LSR) processes the received packet exactly the same way as LSR 2.

    4. LSR 4 receives the MPLS packet on interface 1 with a label equal to 0. Because the MPLS frame has a label value of 0, LSR 4 knows that it is the egress LSR for the LSP tunnel and that it must make a forwarding decision based on the destination IP address in the packet's IP header, not on the MPLS label. Therefore LSR 4 performs standard IP routing by doing a longest-match lookup in its IP routing table for the next hop.

    [Figure 6.29 Data Packet Forwarding: an IP packet enters the MPLS domain at ingress LSR 1, which PUSHes label 11 after the Layer-2 header; LSR 2 SWAPs to label 22, LSR 3 SWAPs to label 0, and egress LSR 4 POPs the label. Each LSR receives on interface 1 and forwards out interface 2.]
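    The push/swap/pop walk of figure 6.29 can be sketched as a table-driven loop. A toy model: the table layout and names are invented, and the ingress classification step is collapsed into a single table entry.

```python
# Toy forwarding walk for the LSP of figures 6.28/6.29 (labels 11 -> 22 -> 0).
# Each LSR's table maps an incoming label to (action, new label, next node).

TABLES = {
    "LSR1": {None: ("push", 11, "LSR2")},   # ingress: classify IP -> FEC -> 11
    "LSR2": {11:   ("swap", 22, "LSR3")},
    "LSR3": {22:   ("swap", 0,  "LSR4")},   # penultimate LSR
    "LSR4": {0:    ("pop",  None, None)},   # label 0: route on the IP header
}

def forward(packet_label=None, node="LSR1"):
    trace = []
    while node is not None:
        action, new_label, next_node = TABLES[node][packet_label]
        trace.append((node, action, new_label))
        packet_label, node = new_label, next_node
    return trace

for hop in forward():
    print(hop)   # ('LSR1', 'push', 11), ..., ('LSR4', 'pop', None)
```

    Note that only LSR 1 ever inspects the IP header on the way in; every other hop is a constant-time label lookup, which is the point of label switching.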

    MPLS Fast Re-Route

    MPLS fast re-route allows an LSP tunnel to be rerouted in less than 50 msec.

    Conditions that may require re-routing an established LSP include:
    1. a resource (e.g. a link or router) along the LSP tunnel fails;
    2. the LSP does not meet its QoS requirements; and
    3. the failed resources along the original path are restored and become available, so a previously re-routed LSP can be routed back to its original path.

    The make-before-break approach is adopted in MPLS for re-routing. Backup tunnels are usually pre-established, and traffic is transferred to the backup tunnel before the primary tunnel is torn down.

    The SE reservation style in RSVP prevents double counting of resources when the backup path and the primary path share common link(s)/hop(s).

    There are two methods of using RSVP-TE to establish backup LSP tunnels:
    1. End-to-end protection switching. In this method, two paths, one primary and one backup, are pre-established from ingress to egress for an LSP, for redundancy. If a link along the primary path fails, the ingress node is notified and switches all traffic to the pre-signaled backup path, see figure 6.30. While the backup path is idle, i.e. not carrying any traffic, its resources can be used by other LSPs. When a re-route to the backup path is triggered, the backup path preempts the lower-priority LSP(s) and reclaims the resources.

    [Figure 6.30 End-to-end Protection Switching: the primary LSP runs from ingress LSR 1 through LSR 2 and LSR 3 to egress LSR 6; after a link failure, traffic is switched to the backup path through LSR 4 and LSR 5.]

    2. Local repair. Local repair allows the LSP to be repaired at the place of failure, i.e. the existing LSP is rerouted around a local point of failure. This method allows the network to converge faster than method 1 above. In the one-to-one local-repair backup scheme, at each LSR/LER along an LSP a detour LSP is pre-signaled to protect that node against a failure of its downstream link or node. A detour LSP is a partial LSP that starts upstream of that node and intersects the original LSP somewhere downstream of the point of link or node failure. Figure 6.31 shows the detour LSP at LSR 2 that protects it from a failure of LSR 3 or the LSR 2-LSR 3 link. The route of this detour LSP is LSR 2-LSR 4-LSR 5-LSR 6.

    MPLS VPNs
    There are a number of diverse VPN models. They are:

    [Figure 6.31 Local-repair Protection Switching: LSR 2 is the point of local repair; on failure of the LSR 2-LSR 3 link, traffic on the active LSP from ingress LSR 1 to egress LSR 6 is switched onto the detour LSP through LSR 4 and LSR 5.]


    1. Traditional VPNs: a) Frame Relay (Layer 2); b) ATM (Layer 2).
    2. Customer Premises Equipment (CPE) based VPNs: a) L2 Tunneling Protocol (L2TP), PPTP (Layer 2); b) IPSec (Layer 3).
    3. Service Provider Provisioned VPNs: a) BGP/MPLS VPNs - RFC 2547bis (Layer 3); b) MPLS-based Layer 2 VPNs.

    We will discuss 3 a) and 3 b) here.

    BGP/MPLS VPN - RFC 2547bis (Layer 3)
    RFC 2547bis:
    1. provides a mechanism that simplifies WAN operations for a diverse set of customers that have little IP routing experience/expertise; and
    2. is a way to efficiently scale the network while delivering revenue-generating value-added services.

    Network Components

    1. RFC 2547bis defines a collection of policies to control the connectivity among a set of customer sites. A customer site is connected to the service provider MPLS network by one or more ports at the Provider Edge (PE) router where the service provider associates each port with a VPN routing and forwarding table (VRF) (see figure 6.32).

    2. Customer Edge (CE) Device. A CE device provides customer access to the service provider network over a data link (e.g. ATM PVC, Frame Relay PVC, VLAN) to one or more PE routers. Usually the CE device is a router (it can also be an L2 switch or a host) that establishes an adjacency with its directly connected PE routers. After the adjacency is established, the CE router advertises the site's local VPN routes to the PE router and learns remote VPN routes from the PE router.

    3. PE Routers. PE routers exchange routing information with CE routers using static routing, RIPv2, OSPF, IS-IS or EBGP. A PE router only maintains VPN routes (i.e. VRF tables) for those VPNs to which it is directly attached (see figure 6.32), not ALL of the service provider's VPN routes. This enhances scalability. Again, a port on a PE router, not a customer site, is associated with a VRF; the customer connection to a port is indirectly mapped to the specific VRF associated with this port. A PE router can maintain multiple VRFs, which supports the per-VPN segregation of routing information. After learning local VPN routes from CE routers, a PE router exchanges VPN routing information with other PE routers using IBGP. PE routers can maintain IBGP sessions to route reflectors as an alternative to a full mesh of IBGP sessions. This also enhances scalability. When an LSP is used to forward VPN data traffic, the ingress PE router functions as the ingress LSR and the egress PE router functions as the egress LSR.
    4. Provider (P) Routers. A P router is any router in the provider's network that does not attach to CE devices. P routers are the MPLS core LSRs. VPN data traffic is forwarded across the MPLS backbone using a TWO-level label stack. P routers are not required to maintain specific VPN routing information for each customer site. This greatly enhances scalability.
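    The two-level label stack can be sketched as follows. This toy code (all names invented) shows why P routers stay VPN-unaware: they only ever touch the outer LSP label, while the inner label selects the VRF at the egress PE.

```python
# Toy sketch of the two-level label stack in BGP/MPLS VPN forwarding.
# Outer label: the LSP to the egress PE (swapped by P routers).
# Inner label: identifies the VPN route/VRF at the egress PE.

def ingress_pe(ip_packet, vpn_label, lsp_label):
    return {"stack": [lsp_label, vpn_label], "payload": ip_packet}

def p_router(frame, swap_table):
    frame["stack"][0] = swap_table[frame["stack"][0]]  # outer label only
    return frame

def egress_pe(frame, vrf_by_label):
    frame["stack"].pop(0)             # remove the outer label
    vpn_label = frame["stack"].pop(0) # inner label picks the VRF
    return vrf_by_label[vpn_label], frame["payload"]

frame = ingress_pe("ip:10.11.0.5", vpn_label=210, lsp_label=17)
frame = p_router(frame, {17: 29})     # a P router never sees label 210
vrf, payload = egress_pe(frame, {210: "VRF-A"})
print(vrf, payload)                   # the packet lands in customer A's VRF
```

    (In practice penultimate-hop popping may remove the outer label before the egress PE; the sketch keeps both pops at the egress for clarity.)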


[Figure 6.32 RFC 2547bis Network Components: CE routers at Customer A Sites 1-3 and Customer B Sites 1-2 attach to PE routers through ports bound to VRF-A or VRF-B; P routers form the service provider MPLS backbone. The two customers use overlapping private prefixes (e.g. both use 10.11/16 and 10.21/16).]

Operation Issues and Solutions
Besides scaling, RFC 2547bis also provides solutions for the following operational issues:
1. support overlapping customer address spaces;
2. constrain network connectivity;
3. maintain updated VPN routing information; and
4. conserve backbone bandwidth and PE router packet processing resources.

o Overlapping Customer Address Spaces

Figure 6.32 shows the overlap of customer A's private IP address space with customer B's private IP address space. To solve this problem and to provide globally unique addresses, each customer IP address prefix is prepended with an 8-byte Route Distinguisher (RD); see figure 6.33. The addresses so formed are called VPN-IPv4 addresses. Note: the route distinguisher by itself should be globally unique.

[Figure 6.33 VPN-IPv4 address format:
| TYPE (2 bytes) | ADMINISTRATOR + ASSIGNED NUMBER (6 bytes) | IPv4 ADDRESS PREFIX (4 bytes) |
The first 8 bytes form the Route Distinguisher.]

In figure 6.33, the TYPE field can be either 0 or 1. For type 0, the ADMINISTRATOR subfield is 2 bytes and should hold a globally unique autonomous system number (ASN), i.e. the service provider shall use the ASN assigned to it in this field. The ASSIGNED NUMBER subfield holds a value from the numbering space administered by the service provider. For type 1, the ADMINISTRATOR subfield is 4 bytes and should hold a globally unique IPv4 address, e.g. the global loopback address of the PE that originates the route, i.e. the egress PE for an LSP. Again, the ASSIGNED NUMBER subfield holds a value from the numbering space administered by the service provider. When configuring RDs on PE routers, each VRF in each PE within a VPN can have its own unique RD. The VPN-IPv4 addresses in each PE are distributed to the other PE routers within the VPN by Multi-Protocol BGP (MP-BGP). Hence, the use of unique RDs can:
1. create distinct routes to a common IPv4 prefix; and
2. use policy to decide which packets use which route.
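As an illustration, the RD and VPN-IPv4 layouts described above can be packed byte-for-byte as follows. This is a minimal sketch; the ASN, assigned numbers and addresses are made-up example values, not taken from the text.

```python
import socket
import struct

def make_rd_type0(asn: int, assigned: int) -> bytes:
    """Type 0 RD: 2-byte TYPE (=0), 2-byte ASN, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, assigned)

def make_rd_type1(admin_ip: str, assigned: int) -> bytes:
    """Type 1 RD: 2-byte TYPE (=1), 4-byte IPv4 administrator, 2-byte assigned number."""
    return struct.pack("!H4sH", 1, socket.inet_aton(admin_ip), assigned)

def make_vpn_ipv4(rd: bytes, ipv4_addr: str) -> bytes:
    """VPN-IPv4 address: 8-byte RD followed by the 4-byte IPv4 address."""
    assert len(rd) == 8
    return rd + socket.inet_aton(ipv4_addr)

# Two VRFs with different RDs turn the SAME customer prefix into
# two distinct, globally unique VPN-IPv4 routes:
rd_a = make_rd_type0(asn=64512, assigned=17)
rd_b = make_rd_type0(asn=64512, assigned=18)
route_a = make_vpn_ipv4(rd_a, "10.11.0.0")
route_b = make_vpn_ipv4(rd_b, "10.11.0.0")
assert route_a != route_b and len(route_a) == 12
```

The distinctness of `route_a` and `route_b` is exactly what lets MP-BGP carry overlapping customer prefixes side by side.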

Note:
1. VPN-IPv4 addresses are used only within the service provider network;
2. VPN customers are not aware of the use of VPN-IPv4 addresses;
3. VPN-IPv4 addresses (i.e. routes) are carried only in the routing protocol MP-BGP that runs across the service provider network; and
4. VPN-IPv4 addresses are not carried in the packet headers of VPN data traffic as it crosses the service provider network.

    MP-BGP (RFC 2858)

Conventional BGP4 carries routing information only for the IPv4 address family. The IETF standardized multi-protocol extensions for BGP4 that allow it to carry routing information for multiple network layer protocols, e.g. IPv6, IPX, VPN-IPv4, etc. Therefore, every PE router in the service provider network has to support MP-BGP in order to support RFC 2547bis VPNs.

o Constrain Network Connectivity
If the route to a specific network is not installed in a PE router's VRF, the network is considered unreachable from that PE router. Hence, service providers can constrain the flow of customer VPN data traffic by constraining the flow of VPN-IPv4 routing information. A BGP/MPLS VPN constrains the flow of VPN-IPv4 routing information by:
1. using multiple VRF tables; and
2. using BGP extended community attributes.

Using Multiple VRF Tables
Each PE router maintains one or more per-site VRFs, with each VRF configured to associate with one or more ports which connect directly to customer sites; see figure 6.32. When receiving an outbound customer data packet from a directly attached CE router, the PE router performs a route lookup in the VRF that is associated with that site. The specific VRF used is determined by the port over which the data packet is received. Support of multiple VRF tables makes it easy for the PE router to provide the per-VPN segregation of routing information. Figure 6.34 shows how PE 1 populates VRF-A:
1. PE 1 learns customer A site 1's VPN A routes from CE 1 and imports them into VRF-A;
2. remote routes are learned via MP-IBGP from PE 2 and PE 3, which are directly connected to sites with hosts that are members of VPN A (see figure 6.34). Based on the BGP extended community route target attribute (to be discussed later), PE 1 may import remote routes learned for VPN A into VRF-A;
3. PE 1 does not import local routes from CE 5 or remote routes from CE 3 into VRF-A, because they are routes for VPN B.
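The per-port VRF selection described above can be sketched as a simple lookup. The port names and next-hop strings below are hypothetical; the point is that the same prefix resolves differently depending on which port (and hence which VRF) receives the packet.

```python
# Hypothetical PE router: each customer-facing port is bound to one VRF.
port_to_vrf = {"ge-0/0/1": "VRF-A", "ge-0/0/2": "VRF-B"}

# Per-VRF routing tables; note the overlapping prefix across VPNs.
vrfs = {
    "VRF-A": {"10.11.0.0/16": "via CE of Customer A"},
    "VRF-B": {"10.11.0.0/16": "via CE of Customer B"},
}

def lookup(in_port: str, prefix: str) -> str:
    """Route lookup is confined to the VRF bound to the receiving port."""
    vrf = port_to_vrf[in_port]
    return vrfs[vrf].get(prefix, "unreachable")

# The same customer prefix resolves per VPN, never across VPNs:
assert lookup("ge-0/0/1", "10.11.0.0/16") != lookup("ge-0/0/2", "10.11.0.0/16")
```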

[Figure 6.34 A PE Router Populates a VRF Table: PE 1 imports local routes learned from CE 1 (Customer A Site 1) and remote routes learned via MP-IBGP from PE 2 and PE 3 into VRF-A; routes from CE 5 and CE 3 belong to VPN B and are not imported.]

Using BGP extended community attributes
Extended community attributes, carried in BGP messages as attributes of the route, are used to control the distribution of routing information between PE routers. These attributes identify the route as belonging to a specific collection of routes, all of which are treated the same with respect to routing policy. Each BGP extended community attribute is 64 bits (8 bytes) long, is globally unique (e.g. it contains either the provider's global ASN or a global IP address) and can be used by only one VPN. However, a customer VPN can use multiple BGP extended communities. RFC 2547bis VPNs can use up to three different types of BGP extended community attributes:
1. The route target attribute identifies a collection of sites (VRFs) to which a PE router distributes routes. A PE router uses this attribute to constrain the import of remote routes into its VRFs.
2. The VPN-of-origin attribute identifies a collection of sites and establishes the associated route as coming from one of the sites in that set.
3. The site-of-origin attribute identifies the specific site from which a PE router learns a route. It is encoded as a route extended community attribute and can be used to prevent routing loops.

    Using the route target attribute

Before distributing local routes to other PE routers, the ingress PE router attaches a route target attribute to each route learned from directly connected sites. The route target attached to the route is based on the value of the VRF's configured export target policy. An ingress PE router can be configured to assign a single route target attribute to all routes, or to a set of routes, learned from a given site. In addition, the directly connected CE router can specify one or more route targets for each route. Each VRF on an egress PE router is configured with an import target policy before it imports remote routes that have been distributed by another PE router. A PE router can import a VPN-IPv4 route into a VRF only if the route target carried with the received route matches one of the import targets of that PE router's VRF. By careful configuration of export target and import target policies, service providers can construct different types of VPN topologies.

Example: Hub-and-spoke VPN Topology
Assume that Customer A wants its BGP/MPLS VPN service provider to create a VPN that supports hub-and-spoke site connectivity; see figure 6.35. The inter-site connectivity for Customer A can be described by the following policies.
1. Customer A site 1 can communicate directly with Customer A site 3 but not directly with Customer A site 2. If Customer A site 1 wants to communicate with Customer A site 2, it must send data traffic by way of Customer A site 3.
2. Customer A site 2 can communicate directly with Customer A site 3 but not directly with Customer A site 1. If Customer A site 2 wants to communicate with Customer A site 1, it must send data traffic by way of Customer A site 3.
3. Customer A site 3 can communicate directly with Customer A site 1 and site 2.
4. Customer A sites cannot send data traffic to or receive data traffic from sites belonging to other corporations.

With reference to figure 6.35, a hub-and-spoke topology is created using two globally unique route target values: Hub and Spoke. VRF-A on the PE 3 router (the hub site) is configured with an export target = Hub and an import target = Spoke. With this configuration, VRF-A on PE 3 distributes all the routes in its VRF with a Hub attribute, which causes the routes to be imported by the spoke sites (PE 1 and PE 2), and it imports all remote routes carrying a Spoke attribute. The VRF-As on the PE 1 and PE 2 routers are both configured with an export target = Spoke and an import target = Hub. These two VRF-As distribute their routes with a Spoke attribute and import only routes with a Hub attribute.
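The export/import target matching that produces this hub-and-spoke connectivity can be modelled in a few lines. This is an illustrative sketch only; the VRF names and the Hub/Spoke target strings mirror the example, while the dictionary structure is an assumption.

```python
# Export/import target policies for the hub-and-spoke example of figure 6.35.
vrf_policy = {
    "PE1/VRF-A": {"export": {"Spoke"}, "import": {"Hub"}},    # spoke site 1
    "PE2/VRF-A": {"export": {"Spoke"}, "import": {"Hub"}},    # spoke site 2
    "PE3/VRF-A": {"export": {"Hub"},   "import": {"Spoke"}},  # hub site 3
}

def imports_from(receiver: str, sender: str) -> bool:
    """A VRF imports a remote route only if the sender's export targets
    intersect the receiver's configured import targets."""
    exp = vrf_policy[sender]["export"]
    imp = vrf_policy[receiver]["import"]
    return bool(exp & imp)

# Spokes import only the hub's routes; the hub imports both spokes' routes.
assert imports_from("PE1/VRF-A", "PE3/VRF-A")        # spoke learns hub routes
assert not imports_from("PE1/VRF-A", "PE2/VRF-A")    # spokes do not see each other
assert imports_from("PE3/VRF-A", "PE1/VRF-A")        # hub learns spoke routes
```

Because spoke-to-spoke routes are never imported, spoke-to-spoke traffic is forced through the hub, exactly as the policies above require.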

[Figure 6.35 Hub-and-spoke VPN Connectivity: within the service provider network, VRF-A on PE 3 (the hub, Customer A Site 3) is configured with Exp Target = Hub and Imp Target = Spoke, while the VRF-As on PE 1 and PE 2 (the spokes, Customer A Sites 1 and 2) are configured with Exp Target = Spoke and Imp Target = Hub; LSPs connect the PEs.]

    o Maintain updated VPN routing information

When the configuration of a PE router is changed by creating a new VRF or by adding one or more new import target policies to an existing VRF, the PE router might need to obtain VPN-IPv4 routes that it previously discarded. However, conventional BGP4 is a stateful protocol and does not support the re-advertisement of routes: once BGP peers synchronize their tables, they do not exchange routing information again until there is a change in that information. The route refresh capability supported by MP-BGP provides a solution to this problem. Whenever the configuration of a PE router is changed, the PE router sends a route refresh message to its peers or the route reflector to trigger the re-transmission of routing information from its MP-BGP peers, so that it can obtain the routing information it previously discarded.

o Conserve backbone bandwidth and PE router packet processing resources
The generation, transmission and processing of routing updates consume backbone bandwidth and router packet processing resources. These resources can be conserved by eliminating the transmission of unnecessary routing updates. The number of BGP routing updates can be reduced by enabling the BGP cooperative route filtering capability. During the establishment of the MP-IBGP session, a BGP speaker that wants to send or receive outbound route filters (ORFs) to or from its peer or route reflector advertises the cooperative route filtering capability using a BGP capabilities advertisement. The BGP speaker sends its peer a set of ORFs that are expressed in terms of BGP communities; the ORF entries are carried in BGP route refresh messages. The peer applies the received ORFs, in addition to its locally configured export target policy, to constrain and filter outbound routing updates to the BGP speaker.

    Operation Model

There are two fundamental traffic flows in a BGP/MPLS VPN:
1. a control flow that is used for VPN route distribution and LSP establishment. VPN route distribution includes the exchange of routing information between the CE and PE routers, and between the PE routers across the provider's network. LSP establishment includes the RSVP-TE signaling message exchanges between the PE and P routers across the provider's network; and
2. a data flow that is used to forward customer data traffic.

o We will use the example shown in figure 6.36 to explain the BGP/MPLS VPN (RFC 2547bis) service provided by the service provider to customer B.

Exchange of Routing Information
PE 1 is configured with VRF-B, which has a globally unique route distinguisher and a globally unique export and import target, Cust-B, and is associated with the port over which PE 1 learns routes from CE 5. When CE 5 advertises the route with prefix 10.11/16 to PE 1, PE 1 installs a local route to 10.11/16 in VRF-B. PE 1 then advertises to PE 2:
1. the VPN-IPv4 address for 10.11/16, together with the BGP extended community route target attribute, Cust-B; and
2. the selected MPLS label, e.g. 789, which identifies VRF-B's association with the port connecting to CE 5, with the loopback address of PE 1 as the BGP next hop for the route 10.11/16.
When PE 2 receives PE 1's route advertisement, it determines whether it should install the route to prefix 10.11/16 into VRF-B by performing route filtering, matching the BGP extended community attribute Cust-B against its import target Cust-B. In this case, PE 2 installs the route to prefix 10.11/16 into its VRF-B and then advertises the route to prefix 10.11/16 to CE 3.

LSP Establishment
One or more LSPs have to be set up from PE 2 to PE 1 for Customer B site 2 to send data to Customer B site 1. RSVP-TE is usually used to set up these LSPs; however, if best-effort LSPs are desired, LDP is used instead.

Data Flow
Assume host 10.21.1.2 at Customer B site 2 wants to communicate with server 10.11.2.3 at Customer B site 1. Host 10.21.1.2 forwards all data packets for server 10.11.2.3 to its default gateway. When a packet arrives at CE 3, it performs a longest-match route lookup and forwards the packet to PE 2. When PE 2 receives this packet, it performs a route lookup in VRF-B and obtains the following information:
1. the MPLS label, 789, that was advertised by PE 1 with the route;
2. the BGP next hop for the route (the loopback address of PE 1);
3. the outgoing port for the LSP from PE 2 to PE 1; and
4. the initial MPLS label for the LSP from PE 2 to PE 1.

User traffic is forwarded from PE 2 to PE 1 using MPLS with a two-level label stack. PE 2 first pushes the label 789 onto the label stack, making it the bottom label, and then pushes the label associated with the LSP from PE 2 to PE 1 onto the label stack, making it the top label. PE 2 then forwards the MPLS frame out of the output port. Assume the label advertised to the penultimate LSR of this LSP is 3 (implicit null); the penultimate LSR therefore pops the top label and forwards the MPLS frame to PE 1. When PE 1 receives the packet, it pops the bottom label, 789, and uses it to identify the directly attached CE that is the next hop to 10.11/16. Finally, PE 1 forwards the packet to CE 5, which forwards the packet to server 10.11.2.3.
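The life of the two-level label stack in this example can be traced step by step. The VPN label 789 comes from the text; the tunnel labels 41 and 51 are assumed values for the hops of the LSP from PE 2 to PE 1.

```python
# Toy walk-through of the two-level label stack (labels 41/51 are assumptions).
packet = {"dst": "10.11.2.3", "labels": []}

# Ingress PE 2: push the VPN label first, then the LSP tunnel label.
packet["labels"].append(789)   # bottom: selects VRF-B's port toward CE 5 at PE 1
packet["labels"].append(41)    # top: LSP tunnel label toward PE 1

# A core P router swaps only the top label; the VPN label is never touched.
packet["labels"][-1] = 51

# Penultimate LSR: label 3 (implicit null) was advertised, so it pops the
# top label and forwards the frame to PE 1 with only the VPN label left.
packet["labels"].pop()

# Egress PE 1: pop the VPN label and use it to pick the outgoing CE.
vpn_label = packet["labels"].pop()
assert vpn_label == 789 and packet["labels"] == []
```

Note how the P routers need no VPN state at all: they only ever look at the top label, which is the scalability point made earlier for P routers.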

Some Benefits of BGP/MPLS VPNs
1. BGP/MPLS VPNs allow service providers to offer scalable, revenue-generating, value-added services.
2. There are no constraints on the address plan used by each VPN customer.
3. Customers do not have to deal with inter-site routing issues, because these are the responsibility of the service provider.
4. Providers do not have a separate backbone or virtual backbone to administer for each customer VPN.
5. The policies that determine whether a specific site is a member of a particular VPN are the policies of the customer. Customer policies are implemented by the service provider alone.
6. A VPN can span multiple service providers.
7. Service providers can use a common infrastructure to deliver both VPN and Internet connectivity services.
8. Flexible and scalable QoS for customer VPN service can be supported through the use of the experimental bits in the MPLS shim header or by the use of traffic-engineered LSPs.
9. RFC 2547bis is link layer independent.

[Figure 6.36 Customer B VPN: CE 5 (Customer B Site 1, 10.11/16) attaches to PE 1 and CE 3 (Customer B Site 2, 10.21/16) attaches to PE 2; both VRF-Bs are configured with Exp Target = Cust-B and Imp Target = Cust-B, and an LSP across the service provider network connects the PEs.]

MPLS-based Layer 2 VPNs: Draft-Martini and Virtual Private LAN Service (VPLS)
BGP/MPLS VPN mitigates most of the scalability issues and also eliminates the BGP stress on CEs, because they do not exchange routing information directly with each other. However, it is considered an overkill solution for providing strict layer 2 (L2) transport services, e.g. providing Ethernet L2 services to forward Ethernet frames across an IP/MPLS network between customer sites. In the following sections, we will focus on the mechanisms in IP/MPLS for providing L2 Ethernet services, namely:
1. Point-to-Point (P2P) Ethernet Service delivered via draft-martini over an MPLS network (also known as Ethernet over MPLS (EoMPLS)) or via L2TPv3 over an IP network; and
2. Multipoint-to-Multipoint (MP2MP) Ethernet Service delivered via VPLS.

    The Pseudo-wire (PW) Concept

A PW is the packet leased line concept standardized by the IETF. An Ethernet PW allows Ethernet frames, not including the preamble and FCS, to be carried over a packet switched network (PSN), e.g. an IP/MPLS network. An Ethernet PW emulates a single Ethernet link between exactly two endpoints. The PW terminates at a logical port within the PE. This port provides an Ethernet MAC service that delivers each Ethernet frame received at the logical port to the logical port in the corresponding PE at the other end of the PW. An Ethernet PW can be configured manually or set up using a signaling protocol such as BGP or LDP. In figure 6.37, a big PSN tunnel is used to aggregate multiple PWs across the PSN. The PSN tunnel can be created using Generic Routing Encapsulation (GRE), L2TP or MPLS. This tunnel is used to shield the internals of the network, i.e. P1 and P2, from information relating to the service provided by PE1 and PE2. While PE1 and PE2 are involved in creating the PWs and mapping the L2 service to the PWs, P1 and P2 are agnostic to the L2 service and simply pass either IP or MPLS packets from one edge to the other.

[Figure 6.37 Reference Model Adopted by the IETF to Support Ethernet Pseudo-wire Emulated Services: CE 1 and CE 2 receive a native Ethernet or VLAN service; PW1 and PW2 run between PE 1 and PE 2 inside a Packet Switched Network (PSN) tunnel that crosses P 1 and P 2.]

Draft-Martini (P2P) - EoMPLS
Draft-martini is an IETF L2 encapsulation method for carrying Ethernet, Frame Relay and ATM traffic across an MPLS network. With draft-martini encapsulation, a PW is constructed by building a pair of unidirectional MPLS virtual connection (VC) LSPs between two PE endpoints: one VC-LSP for outgoing traffic and the other for incoming traffic. EoMPLS uses targeted LDP, which allows the LDP session to be established between the ingress and egress PEs irrespective of whether the PEs are adjacent (directly connected) or non-adjacent (not directly connected).
o Ethernet Encapsulation
For a PW to carry an Ethernet frame (without the preamble and FCS), it can be configured in one of the following modes:
1. Raw mode. In raw mode, the assumption is that the PW represents a virtual connection between two Ethernet ports. What goes in on the ingress side goes out on the egress side.
2. Tagged mode. In tagged mode, the assumption is that the PW represents a connection between two VLANs. Each VLAN is represented by a different PW.

Figure 6.38 shows the establishment of both raw mode and tagged mode PWs between the PE 1 router and the PE 2 router.

[Figure 6.38 Martini Tunnel Modes: between PE 1 and PE 2 (across P 1 and P 2), a raw mode PW connects two Ethernet ports directly, while tagged mode PWs connect VLANs (VLAN10, VLAN20) between the CE sites.]

o Maximum Transmission Unit (MTU). Both ends of a PW must agree on the MTU of the frames to be transported over the MPLS network, and the P routers must be able to support the largest resulting packet size.

    o PWs must be able to support frame re-ordering in order to deliver the frames in sequence.
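In-sequence delivery is typically achieved with per-PW sequence numbers (carried in the PW control word). The sketch below is a deliberately simplified reorder buffer: it assumes sequencing starts at 1 and ignores sequence-number wraparound and frame loss.

```python
def deliver_in_order(frames):
    """Release PW frames to the attachment circuit in sequence order.

    frames: iterable of (seq, payload) pairs, possibly arriving out of order.
    Returns the payloads in sequence-number order.
    """
    expected, buffer, out = 1, {}, []
    for seq, payload in frames:
        buffer[seq] = payload
        # Release every contiguous run starting at the expected number.
        while expected in buffer:
            out.append(buffer.pop(expected))
            expected += 1
    return out

# Frames 2 and 3 arrive before frame 1; delivery is still in order.
assert deliver_in_order([(2, "b"), (3, "c"), (1, "a")]) == ["a", "b", "c"]
```

A real implementation would also time out and discard frames whose predecessors never arrive, rather than buffering indefinitely.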

o Using LDP and an MPLS LSP to Set Up a PW
We will use the example shown in figure 6.40 to explain the setup of a PW. Steps:
1. A targeted LDP session is formed between the PE 1 LSR and the PE 2 LSR.
2. PE 1 and PE 2 exchange VC information, i.e. service information. This is achieved by carrying the VC information in a label mapping message, sent in downstream unsolicited mode, with a new type of forwarding equivalency class element as shown in figure 6.39.
a) PW or VC Type - a value that indicates whether the VC is of type Frame Relay DLCI, ATM cell, PPP, Ethernet tagged or untagged frames, Circuit Emulation, and so on. This field indicates the service provided.


b) PW or VC ID - a connection ID that, together with the PW (VC) type, identifies a particular PW (VC).
c) Group ID - represents a group of PWs. For example, all the PWs carried by the same Ethernet port can belong to the same group. The Group ID is intended to be used as a port index or a virtual index.
d) Interface Parameters - a field that is used to provide interface-specific parameters, such as the interface MTU.
Assume the VC label PE 2 gives to PE 1 is 201 and the VC label PE 1 gives to PE 2 is 102.
3. MPLS RSVP-TE is used to set up the two opposite-direction LSPs connecting the PE 1 LSR and the PE 2 LSR. Assume the LSP label used at the penultimate LSR for both LSPs is 3.
Up to step 3, the two VC-LSPs are established and the PW is considered operational. The following steps show how an Ethernet frame is forwarded across the MPLS network from CE 1 to CE 2.
4. When PE 1 receives an Ethernet frame, it strips off the preamble and FCS fields, then pushes the VC label (201 in this case) and then the LSP tunnel label (41 in this case) onto its label stack (i.e. the shim header).
5. The P 1 LSR and P 2 LSR use the top LSP tunnel label to switch the packet towards PE 2. P 1 and P 2 have no visibility of the VC label.
6. P 2 is the penultimate LSR for PE 2. It pops the LSP tunnel label (3 in this case) and forwards the packet to PE 2.

[Figure 6.39 LDP Forwarding Equivalency Class element fields: PW TYPE (VC TYPE), PW ID (VC ID), Group ID, Interface Parameters.]

7. PE 2, the egress LSR, receives the packet with the inner VC label 201, which indicates to PE 2 how to process the packet. In general, for the raw mode Ethernet service, all PE 2 has to do is forward the packet to CE 2. For the tagged mode Ethernet service or other services, PE 2 may have to carry out more complicated processing.
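The fields of the PW FEC element in figure 6.39 can be illustrated by packing them into bytes. This sketch mirrors the four fields of the figure, not the exact wire layout of the LDP FEC TLV; the field widths chosen here (and the MTU as the only interface parameter) are assumptions.

```python
import struct

# IANA-assigned PW type codepoints for the two Ethernet modes.
PW_TYPE_ETH_TAGGED = 0x0004   # Ethernet tagged mode
PW_TYPE_ETHERNET = 0x0005     # Ethernet (raw mode)

def pack_pw_fec(pw_type: int, pw_id: int, group_id: int, mtu: int) -> bytes:
    """Pack PW TYPE (2 bytes), PW/VC ID (4), Group ID (4), and an
    interface parameter, the MTU (2), in network byte order."""
    return struct.pack("!HIIH", pw_type, pw_id, group_id, mtu)

# The VC label 201 from the example identifies the PW; here we reuse it
# as a convenient PW/VC ID value for illustration.
fec = pack_pw_fec(PW_TYPE_ETHERNET, pw_id=201, group_id=1, mtu=1500)
assert len(fec) == 12 and fec[:2] == b"\x00\x05"
```

Both PEs must advertise matching PW type, PW ID and MTU for the PW to come up, which is why the interface parameters field exists.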

[Figure 6.40 LDP Session between PEs: CE 1 and CE 2 receive native Ethernet or VLAN service; a VC-LSP is carried inside the PSN tunnel LSP between PE 1 and PE 2 across P 1 and P 2. The Ethernet frame carries VC label 201 beneath LSP tunnel label 41, then 51, with the tunnel label popped before the frame reaches PE 2.]

VPLS (MP2MP)
VPLS emulates a LAN that provides full learning and switching capabilities. Learning and switching are done by allowing PE routers to forward Ethernet frames (without the preamble and FCS) based on learning the MAC addresses of the end stations that belong to the VPLS. VPLS allows an enterprise customer to be in full control of its WAN routing policies by running the routing service transparently over a public IP/MPLS network. VPLS services are transparent to higher layer protocols and use L2 emulated LANs to transport any type of traffic, such as IPv4, IPv6 and IPX. With VPLS, the CEs are connected to VPLS-enabled PEs. The PEs can participate in one or more VPLSs/VPLS domains; for example, PE 1 in figure 6.41 participates in VPLS 1 and VPLS 2. To the CEs, a VPLS domain looks like an Ethernet switch, and the CEs can exchange information with each other as if they were connected via a LAN. Separate L2 broadcast domains are maintained on a per-VPLS basis by the PEs. These domains are then mapped into tunnels in the service provider network.
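The learning-and-flooding behaviour that makes a VPLS domain look like an Ethernet switch can be sketched as follows. The class name and port identifiers are hypothetical; a "port" here stands for either a local attachment circuit to a CE or a PW toward a remote PE.

```python
class VplsInstance:
    """Minimal sketch of per-VPLS MAC learning and forwarding at a PE."""

    def __init__(self, ports):
        self.ports = set(ports)    # attachment circuits and PWs in this VPLS
        self.mac_table = {}        # learned MAC address -> port

    def forward(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port          # learn the source MAC
        if dst_mac in self.mac_table:
            return {self.mac_table[dst_mac]}       # known: forward to one port
        return self.ports - {in_port}              # unknown: flood, not back out

vpls1 = VplsInstance(["ce1", "pw-to-pe2", "pw-to-pe3"])
# Destination B is unknown at first, so the frame is flooded to the PWs:
assert vpls1.forward("A", "B", "ce1") == {"pw-to-pe2", "pw-to-pe3"}
# A reply from B teaches the PE where B lives; later frames go to one port:
vpls1.forward("B", "A", "pw-to-pe2")
assert vpls1.forward("A", "B", "ce1") == {"pw-to-pe2"}
```

Each VPLS instance keeps its own MAC table, which is how the per-VPLS broadcast-domain separation mentioned above is enforced.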


Figure 6.41 shows a typical VPLS reference model. LSPs are created between PEs. These LSP tunnels can be shared