ECE6609
10. IP QoS Service
The current Internet: IP Protocol
Best-Effort Service – no Quality of Service (QoS) guarantees are provided.
Connectionless Service – no connection is established prior to sending the packets. Each packet carries the full destination address. Routing is performed independently for each packet, using a shortest-path algorithm.
Why do we need a New Protocol?
Emerging multimedia applications require QoS guarantees.
Real-time applications require connection-oriented services.
Other routing algorithms may be more appropriate than the shortest-path algorithm for increasing network efficiency and providing QoS.
IETF* Proposed Solutions
Integrated Services (IntServ) / Resource Reservation Protocol (RSVP)
– Disadvantage: Scalability (per-flow reservations)
Differentiated Services (DiffServ)
– Disadvantage: No per-flow QoS guarantee
Multiprotocol Label Switching (MPLS)
*IETF – Internet Engineering Task Force, http://www.ietf.org
REQUIREMENTS for IP QoS
A network is characterized as having EDGE and CORE ROUTERS.
Edge routers accept customer traffic, i.e., packets from any source outside the network, into the network.
Core routers provide transit packet forwarding service between other core routers and/or edge routers.
REQUIREMENTS for IP QoS (cont.)
Edge routers characterize, police, and mark customer traffic being admitted to the network.
Edge routers may decline requests signaled by outside sources (Admission Control).
Core routers differentiate traffic insofar as necessary to cope with transient congestion within the network itself.
Statistical multiplexing must be used wherever appropriate to maximize utilization of core resources.
[Figure: Network Architecture]
Integrated Services (IntServ)
GOAL: Augment the existing best-effort Internet with a range of end-to-end services for real-time streaming in interactive applications.
IntServ developed an architecture requiring per-flow traffic handling at every hop along an application’s end-to-end path, and explicit a priori signaling of each flow’s requirements using RSVP (Resource Reservation Protocol).
Integrated Services (IntServ) (cont.)
The IntServ model requires resources such as bandwidth and buffers to be explicitly reserved for a given data flow to ensure that the application receives its requested QoS.
A flow is a stream of packets with the same source and destination addresses and port numbers.
A flow descriptor is used to describe the traffic and QoS requirements of a flow.
Per-flow QoS guarantees are provided at the expense of installing and maintaining flow-specific state in each router along the flow’s path.
Basic components of the IntServ architecture: Setup Protocol, Traffic Control (filterspec), flowspec, and Traffic Classes.
Architecture Basic Components
Setup Protocol – enables a host or an application to request a specific amount of resources from the network; realized by the Resource Reservation Protocol (RSVP).
Traffic Control (filterspec) – includes packet classifier, packet scheduler, and admission control.
flowspec – objects such as token bucket parameters.
Traffic Classes – best-effort, controlled load, and guaranteed services.
Setup Protocol: RSVP
Every application is presumed to use some form of signaling to negotiate service with an IntServ-capable network.
IntServ signaling has 2 functions:
Negotiation: the network decides whether it can support the application’s requested service (Admission Control).
Configuration: the network configures the routers along the path to support the negotiated flow characteristics.
The applications use RSVP: Resource Reservation Protocol.
Goals for the Design of RSVP
Must support both unicast and multicast traffic flows (i.e., RSVP sessions).
Must allow parties of a multicast session to request different levels of QoS.
Must be deployable on top of existing IP infrastructure.
Basics of RSVP
Performs resource reservations for unicast and multicast applications.
Requests resources in one direction, from a sender to a receiver (simplex resource reservation).
Requires the receiver to initiate and maintain the resource reservation.
Maintains soft state at each intermediate router: a resource reservation at a router is maintained for a limited time only, so it must be refreshed periodically.
Does not require every router to be RSVP capable. Non-RSVP-capable routers use best-effort delivery.
Provides different reservation styles so that requests may be merged in several ways according to the applications.
Supports both IPv4 and IPv6.
RSVP: Receiver Initiated Reservation
Similar to the “Leaf Join” case in ATM multicasting.
Motivation: RSVP is primarily designed to support multiparty conferencing with heterogeneous receivers. In this environment the receiver actually knows how much bandwidth it needs. If the sender were to make the reservation request, it would have to obtain the bandwidth requirement from each receiver. This may cause an implosion problem for large multicast groups.
Problem: Receiver does not directly know the path taken by data packets.
Solution: Use Path messages.
RSVP
The application source transmits a “Path” message along the routed path to the unicast or multicast destination.
– The Path message has two purposes: * to mark the routed path in each router (store the “path state”) between sender/receiver and * to collect information about the QoS viability of each router along that path.
– Upon receiving the Path message, the destination host(s) can determine what services the network can support (e.g., guaranteed service or controlled load) and then generate an RSVP reservation (Resv) message.
RSVP (cont.)
Resv messages are sent back towards the sender along the reverse path.
The Resv message carries reservation requests to the routers along the path.
The Resv message contains traffic and QoS objects that are processed by the traffic control component of each router as it follows the reverse path upstream toward the sender.
If a router has sufficient capacity, it reserves resources for the flow on its link toward the receiver. If resources are not available, RSVP error messages are generated and returned to the receiver.
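The Path/Resv exchange described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not RSVP's message format: the `Router` class, its single `capacity` number, and the functions `send_path` and `send_resv` are hypothetical. Path leaves a previous-hop pointer (path state) in each router; Resv follows those pointers upstream and reserves at each hop, stopping where capacity is insufficient.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Router:
    name: str
    capacity: float               # hypothetical: bandwidth available toward the receiver
    phop: Optional[str] = None    # path state: previous hop recorded from the Path message
    reserved: float = 0.0


def send_path(sender: str, path: List[Router]) -> None:
    """Path travels downstream and stores the previous hop in every router."""
    prev = sender
    for router in path:
        router.phop = prev
        prev = router.name


def send_resv(routers: Dict[str, Router], last_hop: str, sender: str,
              rate: float) -> Optional[str]:
    """Resv travels upstream via the stored previous hops, reserving `rate`.

    Returns None on success, or the name of the router that lacked capacity
    (where an RSVP error message would originate).
    """
    done: List[Router] = []
    hop = last_hop
    while hop != sender:
        router = routers[hop]
        if router.capacity - router.reserved < rate:
            for r in done:        # toy simplification: undo the partial reservation
                r.reserved -= rate
            return router.name
        router.reserved += rate
        done.append(router)
        hop = router.phop         # follow the path state back toward the sender
    return None


routers = {n: Router(n, c) for n, c in [("R1", 10.0), ("R2", 5.0), ("R3", 10.0)]}
send_path("S", [routers["R1"], routers["R2"], routers["R3"]])
print(send_resv(routers, "R3", "S", 4.0))  # None: every hop reserves 4.0
print(send_resv(routers, "R3", "S", 4.0))  # R2: only 1.0 is left at R2
```

The rollback on failure is a simplification for the sketch; it only shows that the failing hop is found by walking the reverse path.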
SOFT STATE in RSVP
RSVP Path and Resv messages are periodically sent by senders and receivers, respectively, to refresh the reservations.
When a state is not refreshed within a certain timeout, the state is deleted.
The type of state that is maintained by a timer is called “Soft State”, as opposed to hard state, where the establishment and teardown of a state are explicitly controlled by signaling messages.
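Soft state can be sketched as a table of expiry timers. This is a minimal sketch: the class name `SoftStateTable`, the fixed `lifetime`, and the use of explicit timestamps instead of real timers are assumptions made for illustration.

```python
from typing import Dict, List


class SoftStateTable:
    """Per-router state that disappears unless it is refreshed in time."""

    def __init__(self, lifetime: float) -> None:
        self.lifetime = lifetime              # hypothetical timeout in seconds
        self.expiry: Dict[str, float] = {}

    def refresh(self, key: str, now: float) -> None:
        # A Path or Resv refresh installs the state or restarts its timer.
        self.expiry[key] = now + self.lifetime

    def expire(self, now: float) -> List[str]:
        # State whose timer ran out is deleted; no teardown message is needed.
        stale = [k for k, t in self.expiry.items() if t <= now]
        for k in stale:
            del self.expiry[k]
        return stale


table = SoftStateTable(lifetime=3.0)
table.refresh("resv-A", now=0.0)
table.refresh("resv-B", now=0.0)
table.refresh("resv-A", now=2.0)   # only A is refreshed
print(table.expire(now=3.5))       # ['resv-B']
```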
RESERVATION STYLES in RSVP
Wildcard Filter (WF) Reservation – A single reservation shared by all senders: a kind of shared pipe whose resource is the largest of the resource requests from all receivers, independent of the number of senders (e.g., audio conferencing).
Fixed Filter (FF) Reservation – A distinct reservation is created for each sender. S_i is the selected sender and Q_i is the resource request for sender i. The total reservation on a link for a given session is the sum of all Q_i’s.
Shared Explicit (SE) Reservation – A single reservation shared by a set of explicitly selected senders, where S_i is the selected sender and Q is the flowspec.
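The three styles differ in how requests from several receivers are combined on one link. The sketch below assumes each request is a single number (e.g., bandwidth) rather than a full flowspec, and that requests sharing one reservation are merged by taking the largest; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def wf_reservation(requests: List[float]) -> float:
    # One shared pipe sized by the largest request, however many senders exist.
    return max(requests, default=0.0)


def ff_reservation(requests: List[Dict[str, float]]) -> float:
    # One reservation per sender: merge receivers per sender (largest Q_i), then sum.
    per_sender: Dict[str, float] = {}
    for req in requests:
        for sender, q in req.items():
            per_sender[sender] = max(per_sender.get(sender, 0.0), q)
    return sum(per_sender.values())


def se_reservation(requests: List[Tuple[Set[str], float]]) -> Tuple[Set[str], float]:
    # One reservation shared by the explicitly listed senders, sized by the largest Q.
    senders: Set[str] = set()
    size = 0.0
    for chosen, q in requests:
        senders |= chosen
        size = max(size, q)
    return senders, size


print(wf_reservation([64.0, 128.0]))                                # 128.0
print(ff_reservation([{"S1": 64.0, "S2": 32.0}, {"S1": 128.0}]))    # 160.0
print(se_reservation([({"S1"}, 64.0), ({"S1", "S2"}, 96.0)]))
```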
flowspec and filterspec
flowspec is used to set parameters in the router’s packet scheduler.
flowspec (Flow Specification) consists of traffic specification (Tspec) (T for traffic) and a service request specification (Rspec) (R for reserve).
Tspec describes the sender’s traffic characteristics, i.e., it specifies the traffic behavior of the flow in terms of a token bucket.
Flow Specification (flowspec)
Rspec reserves a service class that defines the requested QoS, i.e., it specifies the requested QoS in terms of bandwidth, packet delay, or packet loss.
flowspec is carried by RSVP messages into the network and defines the application’s QoS requirements as a series of objects, such as token bucket parameters.
Traffic Control Components (filterspec)
Classifier - examines the source and destination addresses, and port number fields in each packet to determine what class the packet belongs to.
Scheduler - determines which packet will be served next.
Admission Control - determines whether a new flow can be granted the requested QoS without affecting other flows existing in the network.
filterspec (Filter Specification) provides the information required by the packet classifier to identify the packets that belong to the flow.
Traffic Classes Components
Best-Effort - same as in the traditional IP networks.
Controlled Load - approximates best-effort service over an uncongested network.
Guaranteed Service - supports real-time traffic flows that require a delay bound.
Controlled Load Service
Under CL service, the packets of a given flow will experience loss and delays comparable to a network with a light traffic load, assuming the flow complies with the traffic contract.
No guarantees are provided but both loss probability and delay are expected to be very low.
The application provides the network with an estimate of the traffic it will generate.
This estimate is done by specifying the data flow’s desired traffic parameters (Tspec) to the network element.
Controlled Load Service
Tspec (Traffic Specification) Model:
It is a refinement of the Token Bucket model. A source characterizes itself with the following SENDER_Tspec (traffic characteristics) parameters:
* Token bucket rate r (bytes/sec) and size b (bytes)
* Peak data rate p (bytes/sec)
* Minimum policed unit m (bytes)
* Maximum packet size M (bytes)
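A policer that checks packets against these five parameters can be sketched as follows. Assumptions of this sketch (not stated in the slides): the peak-rate limit is enforced as a second bucket of rate p and depth M, packets smaller than m are charged as m bytes, and the class name `TspecPolicer` is hypothetical.

```python
class TspecPolicer:
    """Checks packets against a Tspec (r, b, p, m, M) in bytes and bytes/sec."""

    def __init__(self, r: float, b: float, p: float, m: int, M: int) -> None:
        self.r, self.b, self.p, self.m, self.M = r, b, p, m, M
        self.tokens = float(b)        # token bucket (rate r, size b) starts full
        self.peak_tokens = float(M)   # assumed peak bucket (rate p, depth M)
        self.last = 0.0

    def conforms(self, size: int, now: float) -> bool:
        if size > self.M:
            return False              # larger than the maximum packet size
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.b, self.tokens + self.r * elapsed)
        self.peak_tokens = min(self.M, self.peak_tokens + self.p * elapsed)
        charged = max(size, self.m)   # minimum policed unit
        if charged > self.tokens or charged > self.peak_tokens:
            return False              # non-conformant: drop or send as best effort
        self.tokens -= charged
        self.peak_tokens -= charged
        return True


pol = TspecPolicer(r=1000, b=3000, p=2000, m=64, M=1500)
print([pol.conforms(1500, 0.0), pol.conforms(1500, 0.0),
       pol.conforms(1500, 0.75), pol.conforms(2000, 1.0)])
# [True, False, True, False]
```

The second packet fails only the peak-rate check: the (r, b) bucket still holds 1500 bytes, but two back-to-back 1500-byte packets would exceed rate p.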
Controlled Load Service
Admission Control is performed in order to deliver the expected QoS.
Traffic flows are policed. Non-conformant packets are either dropped or delivered, when possible, using the best-effort service. Packets larger than the agreed maximum packet size are also considered non-conformant.
Adaptive real-time applications are intended to use the controlled load service. These applications perform well when the network is not heavily loaded, but suffer rapid degradation in performance as the network load increases.
Guaranteed Service
GS guarantees that packets will arrive within a certain delivery time and will not be discarded due to queue overflow, provided that the flow’s traffic complies with the traffic contract.
GS also uses the Tspec model. The service is requested by a sender specifying a Tspec and the receiver subsequently requesting a desired service level (Rspec).
Guaranteed Service
Rspec (Reservation Specification) Model: Works together with the Tspec model to guarantee a desired service level. The desired service level is described using the following parameters, in addition to r, b, p, m, and M used for CL service:
– Data rate R is measured in the same units as r and must be greater than or equal to r (the token rate). R reflects the theoretical service rate that, at each router, will result in a desirable delay bound.
– Slack term S is measured in microseconds and reflects how far each router is allowed to deviate from the ideal delay bound, i.e., the difference between the desired delay and the delay obtained by using a reservation level R.
REMARK: Larger values for R and smaller values for S represent stricter delay bounds.
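As a worked example of how R sets the delay bound, the Guaranteed Service specification (RFC 2212) gives an end-to-end queueing delay bound in terms of the Tspec and R. The sketch assumes that formula, including RFC 2212's error terms C_tot (bytes) and D_tot (seconds here), which the slides do not cover; they default to zero. The slack term S is not used.

```python
def gs_delay_bound(b: float, r: float, p: float, M: float, R: float,
                   C_tot: float = 0.0, D_tot: float = 0.0) -> float:
    """End-to-end queueing delay bound in seconds (b, M, C_tot in bytes; r, p, R in bytes/sec)."""
    if R < r:
        raise ValueError("R must be >= r")
    if p > R:
        # The burst drains at R while the source may still send at its peak rate p.
        return (b - M) / R * (p - R) / (p - r) + (M + C_tot) / R + D_tot
    # R >= p: only one maximum-size packet has to wait.
    return (M + C_tot) / R + D_tot


print(gs_delay_bound(b=10000, r=1000, p=10000, M=1000, R=2000))    # 4.5
print(gs_delay_bound(b=10000, r=1000, p=10000, M=1000, R=10000))   # 0.1
```

Raising R from 2000 to 10000 bytes/sec shrinks the bound from 4.5 s to 0.1 s, which is the sense in which a larger R gives a stricter delay bound.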
Guaranteed Service
Making use of TSpec and RSpec, a certain amount of bandwidth and buffer space is allocated at each node for each flow.
Resources are allocated using worst-case analysis.
Upper bounds for the end-to-end delay and the packet loss probability can be evaluated mathematically.
SIGNALING and ADMISSION CONTROL
Sources emit regular PATH messages downstream toward the receiver(s) for reservation.
Two message objects relevant to IntServ are carried in PATH messages: SENDER_Tspec (describing the traffic) and ADspec (modified at each hop to reflect the network characteristics between source and receiver).
ADspec informs the receiver which service classes (CL, GS, or both) are appropriate for the traffic.
Along the way, IntServ-capable routers may modify the ADspec to reflect restrictions or modifications required by the network.
SIGNALING and ADMISSION CONTROL (cont.)
Receiver(s) respond with Resv messages upstream toward the sender.
Receiver uses the SENDER_Tspec and (possibly modified) ADspec to determine which parameters to send back upstream in a flowspec element.
flowspec selects either CL or GS and carries parameters required by the routers along the upstream path to determine whether the request can be honored or not.
One message object relevant to IntServ is carried in Resv messages: flowspec (describing the receiver’s desired QoS service to be applied to the sources’ traffic).
IntServ Drawbacks
Scalability – per-flow resource reservation.
Flexibility – IntServ provides only a small number of pre-specified traffic classes: Guaranteed and Controlled Load Services.
Efficiency – The Guaranteed Service of the IntServ model is based on worst-case analysis and is thus very conservative. Moreover, bandwidth and delay requirements are coupled, causing network inefficiency.
Resource Reservation Protocol Drawbacks
Complicated RSVP signaling (unidirectional, frequent refresh messages).
The current version of RSVP lacks both adequate security mechanisms to prevent unauthorized parties from instigating theft-of-service attacks, and policy control.
Looking for a New Solution…
Because of the difficulty in implementing and deploying IntServ and RSVP, the IETF proposed the Differentiated Services (DiffServ) architecture.
Differentiated Services (DiffServ)
Solves scalability and flexibility problems.
Forces as much complexity as possible to the edge nodes, which process lower volumes of traffic and fewer flows.
Offers service per aggregate traffic, rather than per flow.
Reservations are made for a set of related flows.
Does not require new applications or extensive router upgrades.
Does not define specific services or service classes, as IntServ does.
Differentiated Services
The objective of DiffServ is to propose a small, well-defined set of building blocks from which a variety of services may be constructed.
Complexity is moved from the core of the network to the edge of the network.
Packet forwarding in the core network is simple and per-aggregate rather than per-flow.
Differentiated Services
A DiffServ (DS) domain is a set of contiguous DS nodes that define the same per-hop behaviors (PHBs) and operate under the same policy strategy.
A DS domain consists of DS interior, edge, and boundary nodes.
A boundary node interconnects the DS domain to other DS or non-DS-compliant nodes.
Edge and interior nodes only connect to other interior, edge, or boundary nodes within the same DS domain.
Differentiated Services
The DS byte carries the DSCP (DiffServ Code Point), which specifies the forwarding treatment (or per-hop behavior) to be used for the packet.
The DS byte coincides with the TOS octet in IPv4 and the Traffic Class octet in IPv6.
Edge and Core Nodes
Edge nodes handle a relatively small number of traffic flows. Therefore, they can execute per-flow traffic management.
Edge nodes are responsible for policing and shaping. They are also responsible for admission control, if any.
Core nodes handle a large number of traffic flows. They perform per-aggregate rather than per-flow traffic management.
Basic Approach
• Traffic is divided into a small number of groups called forwarding classes
• The forwarding class that a packet belongs to is encoded into a field in the IP packet header.
• Each forwarding class represents a predefined forwarding treatment in terms of drop priority and bandwidth allocation.
Basic Approach (cont.)
Achieves scalability by implementing traffic classification and conditioning functions at network boundary nodes
Classification involves mapping packets to different forwarding classes.
Conditioning: checking whether traffic flows meet the service agreement and dropping/remarking non-conformant packets.
Interior nodes forward packets based solely on the forwarding class.
Basic Approach (cont.)
Resource allocation for aggregated traffic rather than individual flows
Performance assurance to individual flows in a forwarding class provided through prioritization and provisioning rather than per-flow reservation
Traffic policing on the edge and class-based forwarding in the core
Defines forwarding behaviors, not services
Basic Approach (cont.)
Guarantee by provisioning rather than reservation
Allocate resources to forwarding class and control the amount of traffic for these classes
Provides only service assurance; no BW or delay guarantee
Based on SLAs, not dynamic signaling
Focus on a single domain, not end-to-end
Forwarding classes can be defined for a single domain and between domains service providers can extend or map their definitions through bilateral agreement
Services and Forwarding Treatment
Two important concepts in the DiffServ architecture:
Forwarding treatment refers to the externally observable behavior of a specific algorithm or mechanism implemented in a node, e.g., Expedited Forwarding (using a priority queue).
Service is defined by the overall performance that a customer’s traffic receives, e.g., a no-loss service provided by Expedited Forwarding.
Per Hop Behavior (PHB)
Forwarding treatments at a node
Each PHB is represented by a 6-bit value called DSCP
All packets with the same code points are referred to as a behavior aggregate (BA) and they receive the same forwarding treatment.
PHB (cont.)
Describe forwarding behavior in either relative or absolute terms:
* Minimal BW for a BA: absolute terms
* Allocate BW proportionally: relative terms
Typically implemented by means of buffer management and packet scheduling.
Per-Hop Behavior
The PHB defines the service a packet receives at each hop as it is forwarded through the network.
It is realized through internal queue management and scheduling techniques.
The 6-bit DSCP in the DS byte specifies the PHB, so up to 2^6 = 64 codepoints can be defined. The IETF intends to standardize only a few PHBs. Packets marked with different DS byte values should receive different PHBs and, accordingly, should experience different services in the core network.
Services can be differentiated using appropriate:
– Scheduling
– Queue Management
Services (cont.)
SLAs may be static or dynamic.
Services can be defined in either quantitative or qualitative terms.
Services may have different scopes:
* All traffic from ingress node A to any egress node
* All traffic between ingress node A and egress node B
IETF Per-Hop Behaviors
The IETF DiffServ Working Group is finishing work on two PHBs:
– Expedited Forwarding (EF)
– Assured Forwarding (AF)
Expedited Forwarding PHB
The EF PHB was designed to support low loss, low delay, and low jitter connections.
It appears as a point-to-point virtual leased line (VLL) service between endpoints with a peak bandwidth.
To minimize jitter and delay, packets must spend little or no time in router queues.
Therefore, the EF PHB requires that the traffic be conditioned to conform to the peak rate at the boundary, and the network of routers be provisioned such that this peak rate is less than the minimum packet departure rate at each router in the network.
The EF PHB uses a single DSCP codepoint to indicate that the packet should be placed in a high-priority queue on the outbound link of each router hop.
Assured Forwarding PHB
The AF PHB defines four relative classes of service with each service supporting three levels of drop precedence.
Twelve distinct DSCP bit combinations define the AF classes and the drop precedence within each class.
When congestion is encountered at a router, packets with a higher drop precedence will be discarded ahead of those with a lower drop precedence.
The four AF classes define no specific bandwidth or delay constraints other than that AF class 1 is distinct from AF class 2, and so on.
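The twelve AF codepoints follow a regular bit pattern. The slides do not list the values; this sketch assumes the standard encoding from RFC 2597, in which AFxy is the 6-bit value xxxyy0 (class x, drop precedence y). The function and table names are illustrative.

```python
def af_dscp(af_class: int, drop_prec: int) -> int:
    """DSCP for AFxy: class x (1-4) in the top 3 bits, drop precedence y (1-3) in the next 2."""
    if not (1 <= af_class <= 4 and 1 <= drop_prec <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return (af_class << 3) | (drop_prec << 1)


# The twelve AF codepoints (4 classes x 3 drop precedences).
AF_TABLE = {f"AF{x}{y}": af_dscp(x, y) for x in range(1, 5) for y in range(1, 4)}
print(format(AF_TABLE["AF11"], "06b"), format(AF_TABLE["AF43"], "06b"))   # 001010 100110
```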
Services
Describes the overall treatment of a customer’s traffic within a DS domain or end-to-end.
This is what is visible to the customers; PHBs are hidden inside the network node.
Realizing a service involves many components working together:
* Mapping of traffic to specific PHBs
* Traffic conditioning at the boundary
* Network provisioning
* PHB-based forwarding in the core
Services (cont.)
In Diffserv, services are defined in the form of a Service Level Agreement (SLA) between a customer and its service provider
One important element of SLA in Diffserv is the Traffic Conditioning Agreement (TCA).
TCA details the service parameters for traffic profiles and policing actions.
Services (cont.)
This may include:
* Traffic profiles, such as token bucket parameters for each of the classes
* Performance metrics: throughput, delay
* Actions for non-conformant packets
In addition to the TCA, an SLA may also contain other characteristics and business-related agreements such as availability, security, monitoring, auditing, and billing.
[Figure: Packet Classifier and Traffic Conditioner, with blocks labeled PACKETS, CLASSIFIER, METER, MARKER, and SHAPER/DROPPER]
Traffic Conditioning Components
– Meter: A meter measures the temporal properties of the stream of packets selected by the classifier against a traffic profile.
– Marker: A packet is marked by setting its DS field to a particular codepoint. The packet now belongs to a certain behavior aggregate.
– Shaper: A shaper holds (delays) some or all of the packets in a traffic stream to bring the stream into compliance with the traffic profile.
– Dropper: A dropper discards some or all the packets in a traffic stream to bring the stream into compliance with the traffic profile.
Classifier
Divides an incoming packet stream into multiple groups based on predefined rules
Two basic types of classifiers: * Behavior Aggregate (BA) * Multifield (MF)
BA classifier selects packets based solely on the DSCP (DiffServ Code Point) value in the packet header
BA classifier is used when DSCP has been set (marked) before the packet reaches the classifier
Classifier (Cont.)
MF classifier uses a combination of one or more fields of the five-tuple (src addr, src port, dest addr, dest port, proto ID) in the packet header for classification
Classification policies may specify a set of rules and corresponding DSCP values for marking the matched packets
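The two classifier types can be sketched side by side. The rule format (a five-tuple pattern with `None` as a wildcard, paired with the DSCP to mark), the example addresses, and the "default" fallback name are hypothetical; the sketch only shows that a BA classifier looks at the DSCP alone while an MF classifier matches header fields.

```python
from typing import Dict, List, Optional, Sequence, Tuple

FiveTuple = Tuple[str, int, str, int, int]   # (src addr, src port, dest addr, dest port, proto ID)
Rule = Tuple[Sequence[object], int]          # (pattern with None as wildcard, DSCP to mark)


def ba_classify(dscp: int, phb_by_dscp: Dict[int, str]) -> str:
    # BA classifier: the DSCP alone selects the behavior aggregate.
    return phb_by_dscp.get(dscp, "default")


def mf_classify(pkt: FiveTuple, rules: List[Rule]) -> Optional[int]:
    # MF classifier: the first rule whose non-wildcard fields all match wins.
    for pattern, dscp in rules:
        if all(want is None or want == got for want, got in zip(pattern, pkt)):
            return dscp
    return None


rules: List[Rule] = [
    ((None, None, "10.0.0.5", 5060, 17), 46),        # UDP to one host/port -> EF
    (("192.168.1.10", None, None, None, None), 10),  # everything from one host -> AF11
]
print(mf_classify(("192.168.1.10", 4000, "10.0.0.9", 80, 6), rules))   # 10
print(ba_classify(46, {46: "EF"}))                                     # EF
```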
Traffic Conditioner
Performs a traffic policing function to enforce the TCA (Traffic Conditioning Agreement) between customers and service providers
Four basic elements: Meter, Marker, Shaper, and Dropper
Meter
For each forwarding class, the meter measures the traffic flow from a customer against its traffic profile.
In-profile packets are allowed to enter the network.
Out-of-profile packets are further conditioned based on the TCA.
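A meter and a marker working together can be sketched with a single token bucket. Assumptions: the traffic profile is a token bucket with rate r (bytes/sec) and depth b (bytes), and the hypothetical TCA action remarks out-of-profile packets from AF11 to AF13 (same class, higher drop precedence) instead of dropping them.

```python
class TokenBucketMeter:
    """Single token bucket (rate r bytes/s, depth b bytes) labeling packets in/out of profile."""

    def __init__(self, r: float, b: float) -> None:
        self.r, self.b = r, b
        self.tokens = float(b)
        self.last = 0.0

    def measure(self, size: int, now: float) -> bool:
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True       # in-profile
        return False          # out-of-profile: conditioned further per the TCA


AF11, AF13 = 10, 14  # AF class 1, lowest and highest drop precedence


def mark(in_profile: bool) -> int:
    # Hypothetical TCA action: demote out-of-profile packets to a higher drop precedence.
    return AF11 if in_profile else AF13


meter = TokenBucketMeter(r=1000, b=1500)
print([mark(meter.measure(1000, t)) for t in (0.0, 0.1, 1.0)])   # [10, 14, 10]
```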
Marker
Sets the DS field of a packet to a particular DSCP, thereby adding the marked packet to a forwarding class.
May act on unmarked packets or remark previously marked packets.
Marking can occur at different locations:
* By the application
* By the first-hop routers
Marker (cont.)
Marking is done on non-conforming packets:
* Packets may be marked with a special DSCP to indicate non-conformance
* These packets would be dropped first in the event of network congestion
Since packets travel through different domains, packets that have been marked may be remarked (to a different DSCP).
Marker (cont.)
When a remarked packet receives worse forwarding treatment under its new DSCP than under its previous DSCP: PHB demotion.
When it receives better forwarding treatment: PHB promotion.
Shaper
Shapers delay non-conformant packets in order to bring the stream into compliance.
A stronger form of policing than marking
Shaping may also be needed at a boundary node to a different domain (to make sure that the traffic is conformant before entering the next domain)
Usually has finite buffer, so may also drop packets when buffer is full
Dropper
Discards packets in a traffic stream in order to bring the stream into compliance with a traffic profile.
Strongest policing entity
Can be implemented as a special case of a shaper by setting the shaper buffer size to zero.
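A shaper with a finite buffer, and the dropper as its zero-buffer special case, can be sketched as follows. Assumptions: the shaper releases packets back-to-back at a fixed rate (a leaky-bucket style shaper), the buffer limit counts packets waiting to start transmission, and the function name `shape` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def shape(arrivals: List[Tuple[float, int]], rate: float, buffer: int) -> List[Optional[float]]:
    """Return each packet's departure (start) time, or None if it was dropped.

    arrivals: (arrival time in s, size in bytes); rate in bytes/s.
    A packet that must wait while `buffer` packets are already waiting is dropped,
    so buffer=0 turns the shaper into a pure dropper.
    """
    departures: List[Optional[float]] = []
    link_free = 0.0                    # when the previous packet finishes transmitting
    waiting: Deque[float] = deque()    # start times of packets still held in the buffer
    for t, size in arrivals:
        while waiting and waiting[0] <= t:
            waiting.popleft()          # these packets have left the buffer by time t
        start = max(t, link_free)
        if start > t:                  # packet would have to be delayed
            if len(waiting) >= buffer:
                departures.append(None)
                continue
            waiting.append(start)
        link_free = start + size / rate
        departures.append(start)
    return departures


burst = [(0.0, 1000), (0.0, 1000), (0.0, 1000)]
print(shape(burst, rate=1000, buffer=1))   # [0.0, 1.0, None]
print(shape(burst, rate=1000, buffer=0))   # [0.0, None, None]
```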
Differentiated Services Field
Uses 6 bits in the IP header to encode the forwarding treatment
These 6 bits are taken from the IP TOS field (8 bits long)
DiffServ redefines existing IP TOS field to indicate forwarding behavior
Replacement field, called DS field supersedes existing definition of TOS
First 6 bits used as DSCP to encode the PHB, remaining 2 bits are currently unused (CU).
Differentiated Services Field (cont.)
Pool 1: xxxxx0 – standard action
Pool 2: xxxx11 – experimental and local use
Pool 3: xxxx01 – experimental and local use, but may be subject to standard action (in case pool 1 is exhausted)
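Extracting the DSCP from the DS byte and finding its pool takes only a few bit operations. A minimal sketch; the function names are illustrative, and the note about the two CU bits later being reassigned to ECN (RFC 3168) goes beyond the slides.

```python
def dscp_of(ds_byte: int) -> int:
    """DSCP = upper 6 bits of the 8-bit DS field (former IPv4 TOS octet).

    The low 2 bits are the CU bits of the slides (later reassigned to ECN by RFC 3168).
    """
    return (ds_byte >> 2) & 0x3F


def pool_of(dscp: int) -> int:
    """Codepoint pool, chosen by the low-order bits of the DSCP."""
    if (dscp & 0b1) == 0:
        return 1        # xxxxx0: standard action
    if (dscp & 0b11) == 0b11:
        return 2        # xxxx11: experimental and local use
    return 3            # xxxx01: experimental/local, may later be used for standard action


print(dscp_of(0xB8), pool_of(dscp_of(0xB8)))   # 46 1  (EF codepoint 101110)
```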
Assured Forwarding (AF)
The basic idea came from the RIO scheme.
In the RIO scheme, packets are marked as In or Out.
During congestion, Out packets are dropped first: the in/out bit indicates drop priority.
The AF standard extended the basic in/out marking of RIO into four forwarding classes, with three drop precedences within each forwarding class.
Assured Forwarding (AF) (cont.)
Customers can subscribe to the service built with AF forwarding class and their packets will be marked with appropriate AF DSCPs.
Drop priorities within each forwarding class are used to select which packets to drop during congestion
When backlogged packets from an AF forwarding class exceed a specified threshold, packets with the highest drop priority are dropped first, then packets with lower drop priority
AF Implementation
Can be implemented as BW partition between classes and drop priorities within a class
BW partition is specified in terms of minimum BW
Can be achieved by WFQ scheduling and assigning weights according to min BW requirement
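The rule "assign WFQ weights according to the minimum BW requirement" can be checked with a little arithmetic. The sketch assumes an idealized (fluid) WFQ in which every class is backlogged; the class names and rates are illustrative.

```python
from typing import Dict


def wfq_class_rates(min_bw: Dict[str, float], link_rate: float) -> Dict[str, float]:
    """Rate each AF class gets from WFQ when all classes are backlogged.

    Weights are proportional to each class's minimum bandwidth, so every class
    receives at least its minimum as long as the minimums fit within the link.
    """
    total = sum(min_bw.values())
    if total > link_rate:
        raise ValueError("minimum bandwidths exceed the link rate")
    return {cls: link_rate * bw / total for cls, bw in min_bw.items()}


print(wfq_class_rates({"AF1": 10.0, "AF2": 20.0, "AF3": 30.0, "AF4": 40.0}, 200.0))
# {'AF1': 20.0, 'AF2': 40.0, 'AF3': 60.0, 'AF4': 80.0}
```

When some classes are idle, WFQ redistributes their share to the busy classes in the same proportions, so the minimums still hold.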
AF Implementation (cont.)
AF standard specifies certain properties
Attempt to minimize short-term fluctuation in congestion: Some smoothing function should be applied.
Dropping mechanism should be insensitive to the short term traffic characteristics and discard packets from flows of the same long term characteristics with equal probability: Use random function for dropping
Discard rate of a flow within a drop priority should be proportional to the flow’s percentage of the total amount of traffic passing through that drop priority level
Can use RED or RIO for dropping
Buffer Management
When a router runs out of buffer space, packets must be dropped.
In DiffServ, dropping decisions take the DS byte value into account. For example, if Weighted Random Early Detection (WRED) is used:
[Figure: Random Early Detection (RED), P(drop) versus qlen-avg, with thresholds MIN-thr and MAX-thr and probability levels maxp and 1.0]
RED Algorithm (Cont.)
for each packet arrival:
    calculate the average queue size “avg”
    if min-thr <= avg < max-thr:
        calculate probability pa
        with probability pa, mark the arriving packet
    else if max-thr <= avg:
        mark the arriving packet
RED Algorithm (Cont.)
pb = maxp (avg – min_thr) / (max_thr – min_thr)
pa = pb / (1 – count * pb)
count is the number of packets left unmarked since the last marked packet.
Using pa instead of pb spreads the marks more evenly, so the router does not wait too long before marking a packet.
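The RED pseudocode and the pb/pa formulas can be combined into a small runnable sketch. Assumption: avg is an exponentially weighted moving average of the instantaneous queue length with weight w_q, which the slides do not specify; the class name and the injectable `rng` are illustrative.

```python
import random
from typing import Optional


class RED:
    """RED marking decision following the pseudocode and the pb/pa formulas."""

    def __init__(self, min_thr: float, max_thr: float, max_p: float,
                 w_q: float = 0.002, rng: Optional[random.Random] = None) -> None:
        self.min_thr, self.max_thr, self.max_p, self.w_q = min_thr, max_thr, max_p, w_q
        self.avg = 0.0
        self.count = 0                         # packets left unmarked since the last mark
        self.rng = rng if rng is not None else random.Random()

    def on_arrival(self, qlen: float) -> bool:
        """Return True if the arriving packet is marked."""
        self.avg = (1 - self.w_q) * self.avg + self.w_q * qlen   # assumed EWMA
        if self.avg < self.min_thr:
            self.count = 0
            return False
        if self.avg >= self.max_thr:
            self.count = 0
            return True
        pb = self.max_p * (self.avg - self.min_thr) / (self.max_thr - self.min_thr)
        # pa = pb / (1 - count * pb) grows with count; cap it at 1.
        pa = 1.0 if self.count * pb >= 1 else pb / (1 - self.count * pb)
        if self.rng.random() < pa:
            self.count = 0
            return True
        self.count += 1
        return False


red = RED(min_thr=5, max_thr=15, max_p=0.1, w_q=1.0)   # w_q=1: avg tracks qlen exactly
print(red.on_arrival(2), red.on_arrival(20))            # False True
```

Between the thresholds the outcome is random; because pa rises with count, a long run of unmarked packets is eventually cut off.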
RED Algorithm (Cont.)
Avoids global synchronization problem by virtue of its randomness
No bias against bursty traffic
RED-In/Out (RIO)
Uses the same mechanism as RED, but is configured with two sets of parameters, one for in-profile packets and one for out-of-profile packets
Out-packets are dropped more aggressively than in-packets
RED-In/Out (RIO)
P_out = Pmax_out (avg_tot – min_out) / (max_out – min_out), where avg_tot is the average total queue size (In + Out packets)
P_in = Pmax_in (avg_in – min_in) / (max_in – min_in), where avg_in is the average queue size of In packets
If avg_tot < min_out, no Out packet is dropped.
If avg_tot > max_out, all “Out” packets are dropped.
If avg_in < min_in, no In packet is dropped.
If avg_in > max_in, all “In” packets are dropped.
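RIO (Cont.)
[Figure: drop probability curves P_in(drop) versus Avg_in (thresholds MIN-in, MAX-in; maximum P_max_in) and P_out(drop) versus Avg_tot (thresholds MIN-out, MAX-out; maximum P_max_out), each reaching 1.0]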
RIO (cont.)
Discrimination against Out packets is created by carefully choosing the parameters (min_in, max_in, Pmax_in) and (min_out, max_out, Pmax_out).
Drops “out packets” earlier than “in packets”: done by choosing min_out < min_in.
Drops “out packets” with a higher probability: Pmax_out > Pmax_in (Congestion Avoidance Phase).
RIO (cont.)
Goes into the congestion control phase for “out packets” much earlier than for “in packets” by choosing max_out << max_in.
So, RIO drops “out packets” first when it detects some congestion and drops all “out packets” if congestion persists
Only as a last resort, it may drop “in packets” to control congestion
If a router is consistently dropping in packets then the router may be under-provisioned
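The RIO drop rules can be sketched as two RED curves evaluated against different averages. The parameter values are hypothetical and were chosen only to satisfy min_out < min_in, Pmax_out > Pmax_in, and max_out well below max_in; the averages are passed in rather than computed.

```python
from typing import Dict


def linear_drop(avg: float, lo: float, hi: float, p_max: float) -> float:
    """RED curve: 0 below lo, rising linearly to p_max at hi, 1 above hi."""
    if avg < lo:
        return 0.0
    if avg > hi:
        return 1.0
    return p_max * (avg - lo) / (hi - lo)


def rio_drop_prob(is_in: bool, avg_in: float, avg_tot: float, cfg: Dict[str, float]) -> float:
    # In packets are judged against the average queue of In packets only;
    # Out packets against the average total queue (In + Out).
    if is_in:
        return linear_drop(avg_in, cfg["min_in"], cfg["max_in"], cfg["pmax_in"])
    return linear_drop(avg_tot, cfg["min_out"], cfg["max_out"], cfg["pmax_out"])


# Hypothetical parameters obeying min_out < min_in, Pmax_out > Pmax_in, max_out << max_in.
cfg = {"min_in": 40, "max_in": 70, "pmax_in": 0.02,
       "min_out": 10, "max_out": 30, "pmax_out": 0.5}
print(rio_drop_prob(False, avg_in=5, avg_tot=20, cfg=cfg))  # 0.25
print(rio_drop_prob(True, avg_in=5, avg_tot=20, cfg=cfg))   # 0.0
```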
Expedited Forwarding (EF)
Proposed to characterize a forwarding treatment similar to that of simple priority queueing.
The departure rate of the EF traffic aggregate must equal or exceed a configurable rate
Should receive this rate independent of load of other traffic passing through the node
Provides low delay and low loss service
Code point <101110> used for EF PHB
EF Implementation
Several queueing mechanisms can be used to implement EF PHB
Priority queueing with token bucket
1. Priority of EF traffic should be highest in the system
2. Token bucket is used to limit the total amount of EF traffic so that other traffic will not starve
WFQ can also be used, with the weight assigned to EF traffic chosen so that EF traffic has priority over other traffic
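The priority-queue-with-token-bucket option can be sketched as below. The class name, the byte-based token bucket, and the choice to leave excess EF packets queued (rather than dropping them) are assumptions of this sketch; the EF codepoint 101110 (46) is from the slides.

```python
from collections import deque
from typing import Deque, Optional, Tuple

EF_DSCP = 0b101110  # 46, the EF codepoint

Packet = Tuple[int, int]  # (dscp, size in bytes)


class EFPriorityScheduler:
    """Strict priority for EF, capped by a token bucket so other traffic is not starved."""

    def __init__(self, ef_rate: float, ef_burst: float) -> None:
        self.rate, self.burst = ef_rate, ef_burst   # assumed units: bytes/s and bytes
        self.tokens = float(ef_burst)
        self.last = 0.0
        self.ef: Deque[Packet] = deque()
        self.other: Deque[Packet] = deque()

    def enqueue(self, dscp: int, size: int) -> None:
        (self.ef if dscp == EF_DSCP else self.other).append((dscp, size))

    def dequeue(self, now: float) -> Optional[Packet]:
        self.tokens = min(self.burst, self.tokens + self.rate * (now - self.last))
        self.last = now
        # 1. EF has the highest priority, as long as its token bucket allows it.
        if self.ef and self.ef[0][1] <= self.tokens:
            self.tokens -= self.ef[0][1]
            return self.ef.popleft()
        # 2. Otherwise serve other traffic; excess EF waits for tokens.
        if self.other:
            return self.other.popleft()
        return None


s = EFPriorityScheduler(ef_rate=1000, ef_burst=1500)
s.enqueue(0, 500)
s.enqueue(EF_DSCP, 1000)
s.enqueue(EF_DSCP, 1000)
print([s.dequeue(0.0), s.dequeue(0.0), s.dequeue(0.5)])
# [(46, 1000), (0, 500), (46, 1000)]
```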
Interoperability with Non-DS-Compliant Node
Non-DS-compliant node is a node that does not implement some or all of the standardized PHBs.
A special case of a non-DS-compliant node is a legacy node which implements IPv4 Precedence classification as defined in RFC1812 and RFC791
Nodes that are non-DS-compliant and not legacy nodes may exhibit unpredictable forwarding behavior for packets with non-zero DSCP.
Non-DS-Compliant Node within a Domain
When links connected to a non-DS-compliant node are lightly loaded, the performance degradation may be negligible.
However, in general, the lack of PHB forwarding in a node will make it impossible to offer low-delay, low-loss service.
Use of a legacy node may be acceptable if the DS domain restricts itself to using only Class Selector codepoints (which are compatible with IP Precedence) and if the precedence implementation in the legacy node is compatible with the services offered along the path.
Transit Non-DS-Compliant Domain
DS domain and non-DS domain may negotiate how egress traffic from the DS domain should be marked before entry into the non-DS domain
When there is no traffic management service available or no agreement in place, DS domain egress node may remark the DSCP to zero, under the assumption that non-DS domain will treat the traffic uniformly as best-effort traffic
Differentiated Services (DiffServ)
[Figure: End hosts connected through edge routers to core routers]
Scalable: Only simple functions in the core, and relatively complex functions at edge routers (or hosts).
Flexible: Does not define service classes; instead provides functional components with which service classes can be built.
Simple: Users only specify a qualitative notion of service.
DiffServ Drawbacks
The QoS enjoyed by a flow depends on the behavior of the other flows belonging to the same aggregate.
There are no per-flow guarantees.
IntServ over DiffServ
Since IntServ has scaling issues in the core of the network, DiffServ was proposed.
IntServ provides guaranteed service per flow, whereas DiffServ only provides assurance for aggregated traffic.
Thus, applications may still want to use IntServ from the end host to the edge of the DiffServ core on the ingress side, and from the edge of the DiffServ core to the end host/router on the egress side.
Hence the need for IntServ over DiffServ
ECE6609
IntServ over DiffServ (cont.)
Requests for IntServ services need to be mapped onto the underlying capabilities of the DiffServ network:
* Selecting the appropriate PHB for the requested service
* Performing appropriate policing at the edge of the DiffServ network
* Performing admission control on the IntServ requests, taking into account resource availability in the DiffServ network
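The three mapping steps can be sketched as one admission routine at the DiffServ edge. Everything here is an illustrative assumption: the `SERVICE_TO_DSCP` mapping (Guaranteed Service to EF, Controlled Load to AF11) and the `DiffServEdge` class are hypothetical, and capacities are simple per-DSCP bit rates.

```python
from typing import Dict, Optional

# Assumed "well-known" mapping from IntServ service to DSCP, for illustration
# only: Guaranteed Service -> EF (46), Controlled Load -> AF11 (10).
SERVICE_TO_DSCP: Dict[str, int] = {"guaranteed": 46, "controlled-load": 10}


class DiffServEdge:
    """Hypothetical edge router tracking how much of each DiffServ class is reserved."""

    def __init__(self, capacity_bps: Dict[int, int]) -> None:
        self.capacity = dict(capacity_bps)    # DSCP -> capacity available to IntServ flows
        self.reserved: Dict[int, int] = {}    # DSCP -> bandwidth already admitted

    def admit(self, service: str, rate_bps: int) -> Optional[int]:
        """Select a PHB for the request and run admission control against it.

        Returns the DSCP the flow's packets should be marked (and policed) with,
        or None if the request is rejected.
        """
        dscp = SERVICE_TO_DSCP.get(service)
        if dscp is None:
            return None                       # no PHB for this service
        used = self.reserved.get(dscp, 0)
        if used + rate_bps > self.capacity.get(dscp, 0):
            return None                       # class is full: reject the reservation
        self.reserved[dscp] = used + rate_bps
        return dscp
```

With 10 Mbit/s of EF capacity, a first 6 Mbit/s Guaranteed Service request is admitted and marked EF, and a second one is refused.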
IntServ over DiffServ (cont.)
When a PHB has been selected for a particular IntServ flow, it may be necessary to communicate the choice to other network elements, e.g., when marking is not done at the edge.
Two schemes may be used to achieve this:
* Network Driven Mapping (Default)
* Microflow Separation
IntServ over DiffServ (cont.)
1. Network Driven Mapping
• RSVP-capable routers in the DiffServ network (perhaps at its edge) may perform the well-known mapping.
IntServ over DiffServ (cont.)
2. Microflow Separation
• Boundary nodes at the edge of the DiffServ network police traffic arriving from outside the DiffServ network.
• But this policing is applied to aggregate traffic.
MicroFlow Separation
So it is possible for a misbehaving microflow to claim more than its fair share of resources within the aggregate and degrade service provided to other microflows.
This problem can be addressed in three ways:
* Provide per-microflow policing at border routers; however, this approach puts a management burden on the DiffServ region
* Rely on upstream elements to do shaping and policing
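Per-microflow policing at a border router amounts to keeping one token bucket per flow instead of one per aggregate. The sketch below assumes a 5-tuple flow key and a single-rate bucket; `MicroflowPolicer` is a hypothetical name, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from typing import Dict, Tuple

# Hypothetical microflow key: (src IP, dst IP, src port, dst port, protocol).
FlowKey = Tuple[str, str, int, int, int]


class TokenBucket:
    """Single-rate token bucket: refills at `rate` bytes/s up to `depth` bytes."""

    def __init__(self, rate: float, depth: float) -> None:
        self.rate = rate
        self.depth = depth
        self.tokens = depth      # start with a full bucket
        self.last = 0.0          # time of the previous update, in seconds

    def conforms(self, size: int, now: float) -> bool:
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False             # out of profile: drop (or re-mark) the packet


class MicroflowPolicer:
    """Border-router policer that keeps one bucket per microflow."""

    def __init__(self, rate: float, depth: float) -> None:
        self.rate = rate
        self.depth = depth
        self.buckets: Dict[FlowKey, TokenBucket] = {}

    def admit(self, flow: FlowKey, size: int, now: float) -> bool:
        bucket = self.buckets.get(flow)
        if bucket is None:
            bucket = self.buckets[flow] = TokenBucket(self.rate, self.depth)
        return bucket.conforms(size, now)
```

With a 1000 B/s profile, a flow sending a 500-byte packet every second passes every packet, while a flow sending 1500-byte packets ten times a second passes only its first one. An aggregate policer would instead let the second flow eat into the first one's share.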
IntServ over DiffServ (cont.)
Two scenarios in this framework:
* DiffServ Network is RSVP-unaware
* DiffServ Network is RSVP-aware
DiffServ Network is RSVP-Unaware
1. The DiffServ network and its customer have negotiated SLAs, e.g., the amount of bandwidth the DiffServ network will provide for each SLA.
2. RSVP messages simply pass through the DiffServ network as if tunneled, without any action being taken.
3. The edge router in the IntServ network identifies the service level (DSCP) of the flow and runs admission control to make sure that resources are available in the DiffServ network at the corresponding service level.
DiffServ Network is RSVP-Aware
1. Border routers and possibly some or all core routers in the DiffServ network are RSVP-aware.
2. These routers participate in RSVP signaling but schedule traffic in aggregate (i.e., their control plane is RSVP while their data plane is DiffServ).
3. The admission control agent is part of the DiffServ network.
Multiprotocol Label Switching (MPLS)
MPLS is a forwarding paradigm. Choosing the next hop can be thought of as the composition of two functions:
– Partitioning the entire set of possible packets into a set of Forwarding Equivalence Classes (FECs).
– Mapping each FEC to a next hop.
In Multiprotocol Label Switching (MPLS), the assignment of a packet to a particular FEC is done just once: when the packet enters the network.
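The two functions can be sketched directly at an ingress router. The prefixes, FEC names, interfaces and labels below are hypothetical; the point is that FEC assignment is a longest-prefix match done once, and that several prefixes may share one FEC (and therefore one label).

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical ingress-LER tables. Function 1: partition packets into FECs by
# longest-prefix match; several prefixes may share one FEC.
FEC_OF_PREFIX: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.IPv4Network("47.1.0.0/16"): "FEC-A",
    ipaddress.IPv4Network("47.2.0.0/16"): "FEC-A",   # different prefix, same FEC
    ipaddress.IPv4Network("47.1.1.0/24"): "FEC-B",
}
# Function 2: map each FEC to a next hop (outgoing interface, label).
NEXT_HOP_OF_FEC: Dict[str, Tuple[str, int]] = {"FEC-A": ("if1", 50), "FEC-B": ("if2", 133)}


def classify(dst: str) -> Optional[str]:
    """Longest-prefix match from destination address to FEC (done once, at ingress)."""
    addr = ipaddress.IPv4Address(dst)
    matches = [net for net in FEC_OF_PREFIX if addr in net]
    if not matches:
        return None
    return FEC_OF_PREFIX[max(matches, key=lambda net: net.prefixlen)]


def ingress(dst: str) -> Optional[Tuple[str, int]]:
    """Compose the two functions: packet -> FEC -> next hop and label."""
    fec = classify(dst)
    return None if fec is None else NEXT_HOP_OF_FEC[fec]
```

Core LSRs never repeat `classify`; they forward on the label alone.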
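Operation of MPLS
[Figure: a packet (layer 2 header, layer 3 header, data) passing from an IP network into an MPLS network, shown against the protocol layers Network (3), Link (2) and Physical (1); at each hop the layer 2 header is removed and a new layer 2 header is added, and inside the MPLS network forwarding needs only a small tag lookup]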
“Label Substitution”: What Is It?
One of the many ways of getting from A to B:
• BROADCAST:
Go everywhere, stop when you get to B, never ask for directions.
• HOP BY HOP ROUTING:
Continually ask who’s closer to B, go there, repeat … stop when you get to B. “Going to B? You’d better go to X, it’s on the way.”
• SOURCE ROUTING:
Ask for a list (that you carry with you) of places to go that eventually lead you to B. “Going to B? Go straight 5 blocks, take the next left, 6 more blocks and take a right at the lights.”
Label Substitution
Have a friend go to B ahead of you using one of the previous two techniques.
At every road they reserve a lane just for you.
At every intersection they post a big sign that says, for a given lane, which way to turn and what new lane to take.
[Figure: road sign reading “LANE#1: TURN RIGHT, USE LANE#2”]
A Label by Any Other Name ...
There are many examples of label substitution protocols already in existence.
• ATM - the label is called a VPI/VCI and travels with the cell.
• Frame Relay - the label is called a DLCI and travels with the frame.
• TDM - the label is a timeslot; it is implied, like a lane.
• X.25 - the label is an LCN.
• Proprietary tags, etc.
• One day perhaps Frequency Substitution, where the label is a light frequency?
SO WHAT IS MPLS?
• Hop-by-hop or source routing to establish labels
• Uses label native to the media
• Multi-level label substitution transport
ROUTE AT EDGE, SWITCH IN CORE
[Figure: IP forwarding at the ingress edge, label switching in the core (the packet travels as IP #L1, IP #L2, IP #L3), and IP forwarding again at the egress edge]
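MPLS: HOW DOES IT WORK
[Figure: timeline of the label distribution exchange between two LSRs]
1. UDP Hello messages are exchanged in both directions.
2. A TCP connection is opened.
3. Initialization message(s) are exchanged.
4. A Label Request for an IP prefix is answered with a Label Mapping carrying label #L2.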
WHY MPLS?
Leverage existing ATM hardware
Ultra fast forwarding
IP Traffic Engineering
– Constraint-based Routing
Virtual Private Networks
– Controllable tunneling mechanism
Voice/Video on IP
– Delay variation + QoS constraints
Need for MPLS
IP Routing
• Slow
• No path choice towards destination
• No QoS guarantees
• The IP/ATM/SONET/DWDM architecture does not scale to very large traffic volumes and is very cost-ineffective
BEST OF BOTH WORLDS
[Figure: a spectrum running from packet routing (IP) to circuit switching (ATM), with MPLS+IP as the hybrid in the middle]
• MPLS + IP form a middle ground that combines the best of IP and the best of circuit switching technologies.
• ATM and Frame Relay cannot easily come to the middle, so IP has.
MPLS Terminology
LDP: Label Distribution Protocol
LSP: Label Switched Path
FEC: Forwarding Equivalence Class
LSR: Label Switching Router
LER: Label Edge Router (useful term not in standards)
Forwarding Equivalence Classes
• FEC = “A subset of packets that are all treated the same way by a router”
• The concept of FECs provides for a great deal of flexibility and scalability
• In conventional routing, a packet is assigned to a FEC at each hop (i.e., an L3 look-up); in MPLS it is done only once, at the network ingress
[Figure: packets IP1 and IP2 enter at an LER, follow a common LSP through two LSRs to the egress LER, and carry the same label on each hop (IP1 #L1 and IP2 #L1, then #L2, then #L3)]
Packets are destined for different address prefixes, but can be mapped to a common path.
LABEL SWITCHED PATH (vanilla)
[Figure: LSPs from several sources converging on a single destination, with a label (e.g., #216, #14, #99, #963, #462, #311) assigned on each link and swapped at every hop]
- A vanilla LSP is actually part of a tree from every source to that destination (unidirectional).
- Vanilla LDP builds that tree using existing IP forwarding tables to route the control messages.
IP FORWARDING USED BY HOP-BY-HOP CONTROL
[Figure: three routers interconnecting networks 47.1, 47.2 and 47.3 through interfaces 1, 2 and 3; a packet for IP 47.1.1.1 is forwarded hop by hop toward 47.1, each router consulting the same conventional IP forwarding table:]
Dest   Out
47.1   1
47.2   2
47.3   3
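MPLS Label Distribution
[Figure: the same three routers acting as LSRs. The ingress LSR sends a Label Request for 47.1 downstream; the core LSR sends its own Label Request for 47.1 to the egress LSR; the egress LSR answers with a Label Mapping of 0.40, and the core LSR answers the ingress LSR with a Label Mapping of 0.50. The resulting tables are:]
LSR      IntfIn  LabelIn  Dest  IntfOut  LabelOut
Ingress  3       -        47.1  1        0.50
Core     3       0.50     47.1  1        0.40
Egress   3       0.40     47.1  1        -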
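The request/mapping exchange can be sketched as a walk along the path. This is a simplification under stated assumptions: `LSR` and `request_label` are hypothetical names, integer labels 40 and 50 stand in for the slide's 0.40 and 0.50, interfaces are omitted, and an `"out": None` entry at the egress means the label is removed there.

```python
from itertools import count
from typing import Dict, List, Optional


class LSR:
    """Toy LSR with a local label pool and a per-FEC label table."""

    def __init__(self, name: str, first_label: int) -> None:
        self.name = name
        self._pool = count(first_label)           # hypothetical local label space
        self.table: Dict[str, Dict[str, Optional[int]]] = {}

    def allocate(self) -> int:
        return next(self._pool)


def request_label(path: List[LSR], fec: str) -> None:
    """Downstream-on-demand distribution along `path` (ingress first, egress last).

    The Label Request for `fec` travels from path[0] to path[-1]; the Label
    Mappings then travel back upstream, which is what this loop models.
    """
    downstream_label: Optional[int] = None        # label received in the Mapping from downstream
    for lsr in reversed(path):
        is_ingress = lsr is path[0]
        in_label = None if is_ingress else lsr.allocate()
        lsr.table[fec] = {"in": in_label, "out": downstream_label}
        downstream_label = in_label               # advertised upstream in this LSR's Mapping
```

Running it over ingress, core and egress reproduces the three table rows above: the egress advertises 40, the core binds 50 to 40, and the ingress pushes 50.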
Label Switched Path (LSP)
[Figure: the label tables from the label distribution step, now used to forward packet IP 47.1.1.1: the ingress LSR sends it out on interface 1 with label 0.50, the core LSR swaps 0.50 for 0.40, and the egress LSR forwards it on interface 1 toward 47.1 without a label]
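Forwarding along the LSP is then a chain of exact-match lookups. The sketch below transcribes the slide's tables directly (labels kept as the strings "0.50" and "0.40") and assumes, as those tables show, that each link enters the next LSR on interface 3; the FEC lookup of 47.1.1.1 to 47.1 at the ingress is taken as already done.

```python
from typing import Dict, List, Optional, Tuple

# Label tables transcribed from the slide; VPI.VCI-style labels kept as strings.
INGRESS_FTN: Dict[str, Tuple[str, str]] = {"47.1": ("1", "0.50")}    # Dest -> (IntfOut, LabelOut)
CORE_LFIB: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {("3", "0.50"): ("1", "0.40")}
EGRESS_LFIB: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {("3", "0.40"): ("1", None)}  # None: no label out


def forward(dest: str) -> List[Tuple[str, Optional[str]]]:
    """Trace the label carried after each LSR along the LSP."""
    _, label = INGRESS_FTN[dest]                  # route at the edge
    trace = [("ingress", label)]
    for name, lfib in (("core", CORE_LFIB), ("egress", EGRESS_LFIB)):
        _, label = lfib[("3", label)]             # switch in the core: exact match, then swap
        trace.append((name, label))
    return trace
```

`forward("47.1")` yields the label sequence 0.50, 0.40 and finally no label, which is the LSP drawn on the slide.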
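EXPLICITLY ROUTED OR ER-LSP
[Figure: an explicitly routed LSP through nodes A, B and C, set up with Route={A,B,C}, with labels (e.g., #216, #14, #462, #972) assigned on each link along the chosen path]
ER-LSP follows the route that the source chooses.
In other words, the control message to establish the LSP (label request) is source routed.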
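A source-routed label request can be sketched as follows, assuming a simple adjacency-set topology. `setup_er_lsp` is a hypothetical helper: each node strips itself from the explicit route carried in the request and hands the request to the next listed node, which must be a direct neighbour. A single label pool is shared by all nodes purely for brevity.

```python
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

Topology = Dict[str, Set[str]]


def setup_er_lsp(topo: Topology, route: List[str],
                 first_label: int = 100) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Set up an ER-LSP along `route` (ingress first).

    Returns node -> (incoming label, outgoing label).
    """
    ero = list(route)                             # explicit route carried in the request
    path: List[str] = []
    while ero:
        node = ero.pop(0)                         # this node removes itself from the route
        if path and node not in topo.get(path[-1], set()):
            raise ValueError(f"no link from {path[-1]} to {node}")
        path.append(node)

    labels = count(first_label)                   # one shared pool, purely for brevity
    table: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    downstream: Optional[int] = None
    for node in reversed(path):                   # Label Mappings return upstream
        incoming = None if node == path[0] else next(labels)
        table[node] = (incoming, downstream)
        downstream = incoming
    return table
```

In a triangle where A and C are directly linked, Route={A,B,C} still sends the LSP through B, which is exactly the point of explicit routing.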
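EXPLICITLY ROUTED LSP ER-LSP
[Figure: the same three LSRs; besides the hop-by-hop LSP for 47.1, the ingress LSR holds an explicitly routed entry for 47.1.1 that leaves on interface 2 with label 1.33, so packet IP 47.1.1.1 (the longer match) follows the ER-LSP]
LSR      IntfIn  LabelIn  Dest    IntfOut  LabelOut
Ingress  3       -        47.1.1  2        1.33
Ingress  3       -        47.1    1        0.50
Core     3       0.50     47.1    1        0.40
Egress   3       0.40     47.1    1        -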
ER LSP - Advantages
• Operator has routing flexibility (policy-based, QoS-based)
• Can use routes other than the shortest path
• Can compute routes based on constraints, in exactly the same manner as ATM, using a distributed topology database (traffic engineering)
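Constraint-based route computation is commonly done as a constrained shortest path first (CSPF): prune the links that cannot meet the constraint, then run Dijkstra on what remains. The sketch below is a minimal version of that technique with a single available-bandwidth constraint; the graph format and `cspf` name are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Directed graph: node -> list of (neighbor, cost, available bandwidth in Mbit/s).
Graph = Dict[str, List[Tuple[str, int, float]]]


def cspf(graph: Graph, src: str, dst: str, min_bw: float) -> Optional[List[str]]:
    """Least-cost path from src to dst using only links with >= min_bw available."""
    dist: Dict[str, float] = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, cost, bw in graph.get(u, []):
            if bw < min_bw:
                continue                          # constraint: link cannot carry the request
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None                               # no path satisfies the constraint
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

With a cheap but thin path A-B-D (10 Mbit/s) and a costlier but wide path A-C-D (100 Mbit/s), a 5 Mbit/s request takes A-B-D while a 50 Mbit/s request is steered onto A-C-D, a route other than the shortest path.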
MPLS Link Layers
• MPLS is intended to run over multiple link layers
• Specifications for the following link layers currently exist:
– ATM: label contained in VCI/VPI field of ATM header
– Frame Relay: label contained in DLCI field in FR header
– PPP/LAN: uses ‘shim’ header inserted between L2 and L3 headers
Translation between link layer types must be supported
MPLS intended to be “multi-protocol” below as well as above
MPLS Encapsulation - ATM
ATM LSR constrained by the cell format imposed by existing ATM standards
[Figure: an AAL5 PDU frame (n × 48 bytes) carrying the generic label encapsulation (PPP/LAN format) shim headers 1..n, the network layer header and packet (e.g., IP), and the AAL5 trailer; ATM SAR segments the frame into 48-byte payloads, each carried behind a 5-octet ATM header (VPI, VCI, PT, CLP, HEC). Label placement options: Option 1, separate labels in the VPI and VCI fields; Option 2, a single combined label; Option 3, the ATM VPI used as a tunnel with the label in the VCI]
• Top 1 or 2 labels are contained in the VPI/VCI fields of the ATM header (one in each, or a single label in the combined field), as negotiated by LDP
• Further fields in the stack are encoded with a ‘shim’ header in PPP/LAN format
- there must be at least one, with the bottom label distinguished with ‘explicit NULL’
• TTL is carried in the top label in the stack, as a proxy for the ATM header (which lacks a TTL)
MPLS Encapsulation - PPP & LAN Data Links
MPLS on PPP links and LANs uses a ‘shim’ header inserted between the layer 2 and layer 3 headers.
[Figure: layer 2 header (e.g., PPP, 802.3), then MPLS ‘shim’ headers 1..n of 4 octets each, then the network layer header and packet (e.g., IP)]
Label stack entry format (32 bits): Label | Exp. | S | TTL
Label: label value, 20 bits (values 0-15 reserved)
Exp.: experimental, 3 bits (was Class of Service)
S: bottom of stack, 1 bit (1 = last entry in label stack)
TTL: time to live, 8 bits
• The network layer must be inferable from the value of the bottom label of the stack
• TTL must be set to the value of the IP TTL field when the packet is first labelled
• When the last label is popped off the stack, the MPLS TTL is copied to the IP TTL field
• Pushing multiple labels may cause the length of the frame to exceed the layer-2 MTU
- LSR must support a “Max. IP Datagram Size for Labelling” parameter
- any unlabelled datagram greater in size than this parameter is to be fragmented
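The 4-octet label stack entry can be packed and unpacked with plain bit shifts. The sketch below follows the field layout given above (20-bit label, 3-bit Exp., 1-bit S, 8-bit TTL, network byte order) and also applies the two TTL rules; the helper names are hypothetical.

```python
import struct
from typing import List, NamedTuple


class LabelEntry(NamedTuple):
    label: int   # 20 bits; values 0-15 are reserved
    exp: int     # 3 bits (Experimental)
    s: int       # 1 bit, 1 = bottom of stack
    ttl: int     # 8 bits


def encode(entry: LabelEntry) -> bytes:
    if not (0 <= entry.label < 1 << 20 and 0 <= entry.exp < 8
            and entry.s in (0, 1) and 0 <= entry.ttl < 256):
        raise ValueError("field out of range")
    word = (entry.label << 12) | (entry.exp << 9) | (entry.s << 8) | entry.ttl
    return struct.pack("!I", word)        # 4 octets, network byte order


def decode(data: bytes) -> LabelEntry:
    (word,) = struct.unpack("!I", data[:4])
    return LabelEntry(word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)


def decode_stack(data: bytes) -> List[LabelEntry]:
    """Read entries until the one with S = 1; the network layer header follows it."""
    stack: List[LabelEntry] = []
    for offset in range(0, len(data) - 3, 4):
        entry = decode(data[offset:offset + 4])
        stack.append(entry)
        if entry.s:
            break
    return stack


def push_first_label(label: int, ip_ttl: int, exp: int = 0) -> bytes:
    """First label on an unlabelled packet: bottom of stack, TTL copied from IP."""
    return encode(LabelEntry(label, exp, 1, ip_ttl))


def pop_last_label(entry_bytes: bytes) -> int:
    """Popping the bottom label: the MPLS TTL becomes the new IP TTL."""
    entry = decode(entry_bytes)
    if not entry.s:
        raise ValueError("not the bottom of the stack")
    return entry.ttl
```

For example, label 16 with S = 1 and TTL 64 encodes to the bytes 00 01 01 40.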
MPLS & ATM
Several models exist for running MPLS on ATM:
1. Label-Controlled ATM:
• Use ATM hardware for label switching
• Replace ATM Forum software by IP/MPLS
[Figure: IP routing and MPLS control running directly over ATM hardware]
Label-Controlled ATM
• Label switching is used to forward network-layer packets
• It combines the fast, simple forwarding technique of ATM with the network layer routing and control of the TCP/IP protocol suite
[Figure: a label switching router with ports A, B, C and D; network layer routing (e.g., OSPF, BGP4) builds a forwarding table whose entries pair a port with a label (e.g., B 17, C 05), and IP packets carrying labels 17 and 05 are switched between ports]
• Packets are forwarded by swapping short, fixed-length labels (i.e., the ATM technique)
• The switched path topology is formed using network layer routing (i.e., the TCP/IP technique)
ATM Label Switching is the combination of L3 routing and L2 ATM switching
MPLS Over ATM
[Figure: MPLS LSRs at the edges of an ATM network, connected across it by a VC or a VP]
Two Models
Internet Draft: VCID notification over ATM Link
Ships in the Night
ATM and MPLS control planes both run on the same hardware but are isolated from each other, i.e. they do not interact.
This allows a single device to simultaneously operate as both an MPLS LSR and an ATM switch.
Important for migrating MPLS into an ATM network.
[Figure: one switch running the ATM control plane (as an ATM switch) and the MPLS control plane (as an LSR) side by side]
Ships in the Night Requirements
Resource Management
– VPI.VCI Space Partitioning
– Traffic management
• Bandwidth Reservation
• Admission Control
• Queuing & Scheduling
• Shaping/Policing
– Processing Capacity
Bandwidth Management
[Figure: three ways of dividing port capacity between MPLS and ATM, trading flexibility against bandwidth guarantees; capacity not allocated is shown as available]
A. Full Sharing: Pool 1 is shared by MPLS and ATM
B. Protocol Partition: Pool 1 (50%): ATM; Pool 2 (50%): rt-VBR
C. Service Partition: Pool 1 (50%): rt-VBR, COS2; Pool 2 (50%): nrt-VBR, COS1
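The three models differ only in how a request is assigned to a pool before admission control. The sketch below makes loud assumptions: a 155 Mbit/s port, a 50/50 protocol partition between ATM and MPLS, and the service partition read from the figure (rt-VBR with COS2 in one pool, everything else in the other). `PortBandwidth` and the pool names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Request = Tuple[str, str, float]   # (control plane "ATM"/"MPLS", service class, Mbit/s)
PORT = 155.0                       # assumed port capacity in Mbit/s


class PortBandwidth:
    """Hypothetical per-port admission control over named bandwidth pools."""

    def __init__(self, pools: Dict[str, float], pool_of: Callable[[Request], str]) -> None:
        self.free = dict(pools)
        self.pool_of = pool_of

    def admit(self, req: Request) -> bool:
        pool = self.pool_of(req)
        if req[2] > self.free.get(pool, 0.0):
            return False           # pool exhausted, even if other pools have room
        self.free[pool] -= req[2]
        return True


def full_sharing() -> PortBandwidth:
    return PortBandwidth({"shared": PORT}, lambda r: "shared")


def protocol_partition() -> PortBandwidth:
    return PortBandwidth({"ATM": PORT / 2, "MPLS": PORT / 2}, lambda r: r[0])


def service_partition() -> PortBandwidth:
    pool1 = {"rt-VBR", "COS2"}
    return PortBandwidth({"pool1": PORT / 2, "pool2": PORT / 2},
                         lambda r: "pool1" if r[1] in pool1 else "pool2")
```

Full sharing admits a 100 Mbit/s MPLS request that a 50/50 partition would refuse; the partitions give up that flexibility in exchange for guaranteeing each side its half.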
ATM Merge
Multipoint-to-point capability
Motivation
– Stream merge to achieve scalability in MPLS:
• O(n) VCs with merge, as opposed to O(n²) for a full mesh
• fewer labels required
– Reduce number of receive VCs on terminals
Alternatives
– Frame-based VC Merge
– Cell-based VP Merge
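Stream Merge
[Figure: three input cell streams carrying frames 1, 2 and 3. Without VC merging (N in, N out), each stream leaves on its own output VC (e.g., 7, 6 and 9); with VC merging (N in, 1 out), all streams leave on the single VC 7]
AAL5 Cell Interleaving Problem: with VC merging, cells of different frames must not be interleaved on the merged VC, because AAL5 carries no per-frame identifier and interleaved cells cannot be reassembled.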
VC-Merge: Output Module
[Figure: merge output module: cells from each input VC are held in reassembly buffers, and only complete frames are passed to the output buffer]
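The interleaving problem and the reassembly-buffer fix can be shown with a toy cell model. Assumptions: each cell is (input VC, payload chunk, end-of-frame flag), mirroring the end-of-frame indication AAL5 carries in the ATM header's PT field, and all output goes onto one merged VC; the function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Cell = Tuple[int, str, bool]          # (input VC, payload chunk, end-of-frame flag)
OutCell = Tuple[str, bool]            # cell on the single merged output VC


def naive_merge(cells: Iterable[Cell]) -> List[OutCell]:
    """Relabel every cell onto the merged VC as soon as it arrives."""
    return [(payload, eof) for _, payload, eof in cells]


def vc_merge(cells: Iterable[Cell]) -> List[OutCell]:
    """Hold cells in per-input-VC reassembly buffers; release a whole frame at EOF."""
    buffers: DefaultDict[int, List[OutCell]] = defaultdict(list)
    out: List[OutCell] = []
    for vc, payload, eof in cells:
        buffers[vc].append((payload, eof))
        if eof:
            out.extend(buffers.pop(vc))   # complete frame moves to the output buffer
    return out


def reassemble(out_cells: Iterable[OutCell]) -> List[str]:
    """Receiver on the merged VC: with no frame identifier, concatenate until EOF."""
    frames: List[str] = []
    current: List[str] = []
    for payload, eof in out_cells:
        current.append(payload)
        if eof:
            frames.append("".join(current))
            current = []
    return frames
```

If frame x (3 cells) and frame y (2 cells) arrive interleaved, naive merging delivers the corrupted frames "x1y1x2y2" and "x3", while VC merge delivers "y1y2" and "x1x2x3" intact, at the cost of buffering.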
VP-Merge
[Figure: several VPs (VPI=1, VPI=2, VPI=3) merged into one VP, with frames from different sources distinguished by VCI=1, VCI=2 and VCI=3]
– merge multiple VPs into one VP
– use separate VCIs within VPs to distinguish frames
– less efficient use of VPI/VCI space, needs support of SVP
No cell interleaving problem, since the VCI is unique
Option 1: Dynamic VCI Mapping
Option 2: Root Assigned VCI
Summary of Motivations for MPLS
• Simplified forwarding based on exact match of fixed length label
- initial drive for MPLS was based on existence of cheap, fast ATM switches
• Separation of routing and forwarding in IP networks
- facilitates evolution of routing techniques by fixing the forwarding method
- new routing functionality can be deployed without changing the forwarding techniques of every router in the Internet
• Facilitates the integration of ATM and IP
- allows carriers to leverage their large investment of ATM equipment
- eliminates the adjacency problem of VC-mesh over ATM
• Enables the use of explicit routing/source routing in IP networks
- can be easily used for such things as traffic management, QoS routing
Summary of Motivations for MPLS (cont.)
• Promotes the partitioning of functionality within the network
- move granular processing of packets to edge; restrict core to packet forwarding
- assists in maintaining scalability of IP protocols in large networks
• Improved routing scalability through stacking of labels
- removes the need for full routing tables from interior routers in transit domain; only routes to border routers are required
• Applicability to both cell and packet link-layers
- can be deployed on both cell (e.g., ATM) and packet (e.g., FR, Ethernet) media
- common management and techniques simplify engineering
Many drivers exist for MPLS above and beyond high speed forwarding
IP and ATM Integration
IP over ATM VCs
• ATM cloud invisible to Layer 3 Routing
• Full mesh of VCs within ATM cloud
• Many adjacencies between edge routers
• Topology change generates many route updates
• Routing algorithm made more complex
IP over MPLS
• ATM network visible to Layer 3 Routing
• Single adjacency possible with edge router
• Hierarchical network design possible
• Reduces route update traffic and power needed to process them
MPLS eliminates the “n-squared” problem of IP over ATM VCs
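The “n-squared” problem is plain arithmetic. The sketch below counts bidirectional VCs for a full mesh of n edge routers (unidirectional VCs would double it), routing adjacencies per edge router, and, for comparison, the number of merged multipoint-to-point LSP trees (one per egress). The function names are hypothetical.

```python
def full_mesh_vcs(n: int) -> int:
    """Bidirectional VCs needed to fully mesh n edge routers over ATM."""
    return n * (n - 1) // 2


def adjacencies_per_router(n: int, over_mpls: bool) -> int:
    """Over a VC mesh each edge router peers with every other one;
    over MPLS it peers only with its attached LSR."""
    return 1 if over_mpls else n - 1


def merged_lsp_trees(n: int) -> int:
    """With stream merge, one multipoint-to-point tree per egress router."""
    return n


if __name__ == "__main__":
    for n in (10, 100):
        print(n, full_mesh_vcs(n), adjacencies_per_router(n, False), merged_lsp_trees(n))
```

At 100 edge routers the mesh needs 4950 VCs and 99 adjacencies per router, versus 100 merged trees and a single adjacency.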
Traffic Engineering
Traffic engineering is the process of mapping traffic demand onto a network
[Figure: traffic demand mapped onto a network topology with nodes A, B, C and D]
Purpose of traffic engineering:
• Maximize utilization of links and nodes throughout the network
• Engineer links to achieve required delay, grade-of-service
• Spread the network traffic across network links, minimize impact of single failure
• Ensure available spare link capacity for re-routing traffic on failure
• Meet policy requirements imposed by the network operator
Traffic engineering is key to optimizing cost/performance
Traffic Engineering Alternatives
Current methods of traffic engineering:
• Manipulating routing metrics: difficult to manage
• Using PVCs over an ATM backbone: not scalable
• Over-provisioning bandwidth: not economical
MPLS combines benefits of ATM and IP-layer traffic engineering
MPLS provides a new method to do traffic engineering (traffic steering)
[Figure: example network with a congested node; the path chosen by the routing protocol (least cost) runs through the congested node, while the path chosen by traffic engineering (least congestion) avoids it, because the ingress node explicitly routes traffic over the uncongested path]
Potential benefits of MPLS for traffic engineering:
- allows explicitly routed paths (operator control)
- no “n-squared” problem (scalable)
- per FEC traffic monitoring (granularity of feedback)
- backup paths may be configured (redundancy/restoration)
MPLS Traffic Engineering Methods
• MPLS can use the source routing capability to steer traffic on the desired path
• Operator may manually configure these in each LSR along the desired path
- analogous to setting up PVCs in ATM switches
• Ingress LSR may be configured with the path, RSVP used to set up LSP
- some vendors have extended RSVP for MPLS path set-up
• Ingress LSR may be configured with the path, LDP used to set up LSP
- many vendors believe RSVP is not suited
• Ingress LSR may be configured with one or more LSRs along the desired path, hop-by-hop routing may be used to set up the rest of the path
- a.k.a. loose source routing, less configuration required
• If desired for control, a route discovered by hop-by-hop routing can be frozen
- a.k.a. “route pinning”
• In the future, constraint-based routing will offload traffic engineering tasks from the operator to the network itself
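Loose source routing can be sketched as filling the gaps between the configured hops with hop-by-hop segments. Here a fewest-hop breadth-first search stands in for the IGP route (an assumption; a real network would use its routing tables), and the expanded path is what would be frozen under route pinning. Names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Topology = Dict[str, Set[str]]


def hop_by_hop(topo: Topology, src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path, standing in for the IGP's hop-by-hop route."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in sorted(topo.get(u, set())):   # sorted for deterministic tie-breaking
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def expand_loose_route(topo: Topology, ingress: str, loose_hops: List[str]) -> Optional[List[str]]:
    """Ingress configured with some hops; hop-by-hop routing fills in the rest."""
    path = [ingress]
    for hop in loose_hops:
        segment = hop_by_hop(topo, path[-1], hop)
        if segment is None:
            return None
        path.extend(segment[1:])
    return path
```

In a ring A-B-D-E-C-A, configuring only the egress D yields A, B, D, while adding the loose hop E forces A, C, E, D.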
MPLS: Scalability Through Routing Hierarchy
[Figure: a transit domain with border routers BR1-BR4 and interior transit routers TR1-TR4, connecting autonomous systems AS1, AS2 and AS3. The ingress router receives a packet; the packet is labelled based on the egress router; forwarding in the interior is based on the IGP route; the egress border router pops the label and forwards the packet]
• Border routers BR1-4 run an EGP, providing inter-domain routing
• Interior transit routers TR1-4 run an IGP, providing intra-domain routing
• Normal layer 3 forwarding requires interior routers to carry full routing tables
- transit router must be able to identify the correct destination ASBR (BR1-4)
• Carrying full routing tables in all routers limits scalability of interior routing
- slower convergence, larger routing tables, poorer fault isolation
• MPLS enables ingress node to identify egress router, label packet based on interior route
• Interior LSRs would only require enough information to forward packet to egress
MPLS increases scalability by partitioning exterior routing from interior routing
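The partition between exterior and interior routing can be sketched with two small tables. All prefixes, labels and names below are hypothetical, and for brevity one label value per border router is used domain-wide, whereas real LSRs swap locally assigned labels hop by hop.

```python
from typing import Dict, Tuple

# Hypothetical BGP view at the ingress: external prefix -> egress border router.
BGP_EGRESS: Dict[str, str] = {
    "192.0.2.0/24": "BR3",
    "198.51.100.0/24": "BR3",
    "203.0.113.0/24": "BR4",
}
# Hypothetical labels toward each border router (one value per BR, for brevity).
LABEL_TO_BR: Dict[str, int] = {"BR1": 101, "BR2": 102, "BR3": 103, "BR4": 104}
# All an interior LSR needs: labels for the border routers, no external routes.
INTERIOR_LFIB: Dict[int, str] = {label: br for br, label in LABEL_TO_BR.items()}


def ingress_label(prefix: str) -> Tuple[str, int]:
    """Ingress: one exterior (BGP) lookup, then label by the egress router."""
    egress = BGP_EGRESS[prefix]
    return egress, LABEL_TO_BR[egress]


def interior_forward(label: int) -> str:
    """Interior: exact match on the label only."""
    return INTERIOR_LFIB[label]


def interior_routes(n_external: int, n_border: int, mpls: bool) -> int:
    """Routes an interior router carries (counting only external prefixes
    and routes to border routers)."""
    return n_border if mpls else n_external + n_border
```

With 100,000 external prefixes and four border routers, an interior router needs 4 entries with MPLS instead of 100,004 without it.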
MPLS: Partitioning Routing and Forwarding
[Figure: routing (OSPF, IS-IS, BGP, RIP) populates the forwarding table. Conventional forwarding is based on a classful address prefix, classless address prefix, multicast address, port number or ToS field; MPLS forwarding is based on an exact match on a fixed length label]
• Current network has multiple forwarding paradigms
- classful longest prefix match (Class A, B, C boundaries)
- classless longest prefix match (variable boundaries)
- multicast (exact match on source and destination)
- type-of-service (longest prefix match on address + exact match on ToS)
• As routing methods change, new route look-up algorithms are required
- introduction of CIDR
• Next generation routers will be based on hardware for route look-up
- changes will require new hardware with new algorithm
• MPLS has a consistent algorithm for all types of forwarding; partitions routing/forwarding
- minimizes impact of the introduction of new forwarding methods
MPLS introduces flexibility through consistent forwarding paradigm
Upper Layer Consistency Across Link Layers
[Figure: MPLS running over Ethernet, PPP (SONET, DS-3, etc.), ATM and Frame Relay]
• MPLS is “multiprotocol” below (link layer) as well as above (network layer)
• Provides for consistent operations, engineering across multiple technologies
• Allows operators to leverage existing infrastructure
• Co-existence with other protocols is provided for
- e.g., “Ships in the Night” operation with ATM, muxing over PPP
MPLS positioned as end-to-end forwarding paradigm
Common Misconceptions
IP QoS is not ready for real, production networks.
QoS is not useful unless it is deployed end-to-end.
Only ATM networks can support true, end-to-end QoS.