1
GMPLS networks
Malathi Veeraraghavan, University of Virginia, [email protected]
Tutorial at IEEE Globecom 2006
2
Acknowledgment to co-authors
• All co-authors are affiliated with the University of Virginia (@virginia.edu)
– Helali Bhuiyan (helali): OSPF-TE
– Xiuduan Fang (xf4c): GFP, VCAT, LCAS
– Tao Li (tl8g): LMP and OTN
– Mark McGinley (mem5qf): MPLS, PWE
– Xiangfei Zhu (xz4p): RSVP, Cheetah
3
Outline
• Principles
– Different types of connection-oriented networks
• Technologies
– Single network
– Internetworking
• Usage
– Commercial networks
– Research & Education Networks (REN)
4
Principles
• Packet-switched vs. circuit-switched networks
• Connection-oriented vs. connectionless modes of bandwidth sharing
• Analytical models
5
A model for a single network
[Figure: four hosts attached to three switches; links connect hosts to switches and switches to each other]
• Hosts represent data sources and sinks
• A switch moves data units from one link to another
– this enables sharing of a link's bandwidth
• My definition of "switch": the multiplexing scheme is the same on all the links (e.g., a network of SONET TDM switches or a network of Ethernet packet switches)
6
Types of switched networks
Networking type | Circuit-switched (CS) | Packet-switched (PS)
Connectionless (CL) | Not an option | e.g., switched Ethernet networks
Connection-oriented (CO) | e.g., telephone network, SONET networks | e.g., MultiProtocol Label Switching (MPLS); these are virtual-circuit (VC) networks
7
Analogy for connectionless packet-switched networks: roads
• The intersection point of two roads is analogous to a connectionless packet switch
• Road signs at a traffic light are analogous to the routing table at a packet switch
8
Analogy for circuit switched networks
• Airline networks
• Airports are analogous to circuit switches
• Procedure
– Call setup: comparable to an airline passenger calling ahead to make reservations for a seat on each leg of a multi-flight trip
– Reserved time slots on each link: similar to seat assignments on each flight
– When the trip actually starts and the passenger arrives on one flight at an airport, he/she simply moves to the assigned seat on the next flight
9
Circuit switch vs. packet switch
• Depends upon multiplexing technique used on interfaces
– Position based multiplexing (circuit switch)
  • "Position" means time or frequency (wavelength)
– Packet based multiplexing (packet switch)
10
Generic switch design
• Links connected to the switch are typically shared
– this means there is a multiplexing/MAC function in the data-link layer implementation at the end of the link
[Figure: unfolded view of a switch. Input interfaces 1, 2, 3, ..., Q each feed a demultiplexer (D); a crossbar or multistage space switch connects the demultiplexers to multiplexers (M), which drive output interfaces 1, 2, 3, ..., Q]
11
Recall Time Division Multiplexing
[Figure: OC1 inputs 1, 2, ..., 12 multiplexed onto one OC12 output]
• Each OC1 input: 90 x 9 x 8 bits every 125 µs; rate: 51.84 Mbps
• OC12 output: 12 x 90 x 9 x 8 bits every 125 µs; rate: 622.08 Mbps
Example: 12 OC1 signals multiplexed onto an OC12 signal
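The two rates on this slide follow directly from the frame size and the 125 µs frame period (8000 frames per second). A minimal sketch of that arithmetic; the function name `oc_rate_bps` is only illustrative:

```python
# One frame is sent every 125 microseconds, i.e. 8000 frames per second.
FRAMES_PER_SECOND = 8000


def oc_rate_bps(n: int) -> int:
    """Line rate of an OC-n signal in bits per second.

    Each frame carries n x 90 columns x 9 rows of bytes (8 bits each).
    """
    bits_per_frame = n * 90 * 9 * 8
    return bits_per_frame * FRAMES_PER_SECOND


if __name__ == "__main__":
    for n in (1, 12):
        print(f"OC{n}: {oc_rate_bps(n) / 1e6} Mbps")
```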
12
SONET/SDH rates (number is the multiplier)
Source: Tanenbaum
13
Example of a circuit switch (TDM): SONET
[Figure: input ports 1, 2, ..., Q carry OC12 signals into demultiplexers that split each OC12 into OC1 streams; a P x P space switch (also called switch fabric or interconnection fabric) cross-connects the OC1 streams; multiplexers recombine them onto OC12 output ports 1, 2, ..., Q. A controller handles routing and signaling.]
• Crossconnect rate: OC1
• What is the relation between P and Q?
14
Example of a circuit switch: a SONET switch
[Figure: hosts A, B, C and D attached by OC3 links to ports (interfaces) a, b, c and d of a SONET switch]
• Create a bidirectional OC1 circuit between host A and host C. Use it for application 1 (e.g., web access)
• Create another bidirectional OC1 circuit between host A and host B. Use it for application 2 (e.g., email)
15
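Example of what happens inside a SONET switch (crossbar assumed)
[Figure: crossbar with input ports a, b, c, d and output ports a, b, c, d under a controller. Hosts A, B, C and D attach over OC3 links; each 125 µs frame carries OC1 timeslots. Timeslots A1 and A2 from host A, B1 from host B and C1 from host C are cross-connected to the output ports.]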
16
Timeslot mapping table
• We need to use timeslot 2 on port a to/from host A for the A-B OC1 circuit since timeslot 1 on this port was already used for the A-C OC1 circuit
Input port | Input timeslot | Output port | Output timeslot
a | 1 | c | 1
c | 1 | a | 1
a | 2 | b | 1
b | 1 | a | 2
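The mapping table above can be read as a lookup from (input port, input timeslot) to (output port, output timeslot). A minimal sketch, assuming a frame is represented as a dictionary of timeslot contents; the payload labels and the helper name `switch_frame` are hypothetical:

```python
# Timeslot mapping table from the slide:
# (input port, input timeslot) -> (output port, output timeslot)
TIMESLOT_MAP = {
    ("a", 1): ("c", 1),  # A -> C direction of the A-C circuit
    ("c", 1): ("a", 1),  # C -> A direction
    ("a", 2): ("b", 1),  # A -> B direction of the A-B circuit
    ("b", 1): ("a", 2),  # B -> A direction
}


def switch_frame(inputs: dict) -> dict:
    """Move the contents of each mapped input timeslot to its output timeslot.

    Timeslots without a mapping entry carry no circuit and are dropped.
    """
    outputs = {}
    for (port, slot), payload in inputs.items():
        if (port, slot) in TIMESLOT_MAP:
            outputs[TIMESLOT_MAP[(port, slot)]] = payload
    return outputs


if __name__ == "__main__":
    one_frame = {("a", 1): "A1", ("a", 2): "A2", ("b", 1): "B1", ("c", 1): "C1"}
    print(switch_frame(one_frame))
```

No header is inspected anywhere: the position of the data in the frame alone determines where it goes.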
17
Packet switch: header based switching
[Figure: "unfolded" view of a switch. Input ports 1, 2, 3, ..., N and output ports 1, 2, 3, ..., N each terminate on a line card; a space switch connects input line cards to output line cards; a controller sits above. Data path and control path are drawn separately.]
"Unfolded" view of switch
• Input line card functions
– Packet-based demultiplexing
– Header processing
• Output line card functions
– Packet-based multiplexing
• Space switch
– Transfer packets between line cards
– Close crosspoints on a packet-by-packet basis
• Controller
– Next week's lectures
Folded view: 1 line card has both input and output functions
18
Insides of a packet switch – unfolded view (assuming crossbar space switch)
[Figure: controller above a crossbar; line cards 1 to 4 on the input ports and line cards 1 to 4 on the output ports]
1. Physical layer
2. Data link layer
3. Once a frame is received correctly, consult routing table with destination MAC address
4. Make appropriate crosspoint connection in fabric
5. Send frame to appropriate outgoing line card
19
Example of a packet switch: an Ethernet switch
• Headers of incoming packets are processed to extract destination address
• Routing table is consulted for output port, O, corresponding to destination address
• Corresponding space switch crosspoint connection is closed
• Packet is sent from input port to the output port, O
[Figure: hosts 1 to 4 attached to ports a, b, c, d of an Ethernet switch; two frames arrive with destination MAC addresses 05:a1:08:10:a4:3e and 09:a5:08:10:a4:3d]
Routing table (called filtering database in Ethernet switches):
Destination | Output port
05:a1:08:10:a4:3e | c
09:a5:08:10:a4:3d | b
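The header-based lookup on this slide can be sketched as a dictionary keyed by destination MAC address. This is only an illustration under simplifying assumptions: the frame is a plain dictionary, and a miss simply returns `None` (what a real switch does on a miss is not covered on the slide).

```python
from typing import Optional

# Filtering database from the slide: destination MAC address -> output port
FILTERING_DB = {
    "05:a1:08:10:a4:3e": "c",
    "09:a5:08:10:a4:3d": "b",
}


def output_port(frame: dict) -> Optional[str]:
    """Extract the destination address from the header and look up the output port."""
    return FILTERING_DB.get(frame["dst_mac"].lower())


if __name__ == "__main__":
    frame = {"dst_mac": "05:A1:08:10:A4:3E", "payload": b"hello"}
    print(output_port(frame))
```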
20
Going back to circuit switch
• The space switch could be time-shared
• In which case, P = Q
• The switch is reprogrammed in each time slot
– in the OC3 example, if the space switch lines were operated at OC3, then every 125/3 µs the switch fabric crosspoints would need to be reprogrammed
21
Types of switched networks
Networking type | Circuit-switched (CS) | Packet-switched (PS)
Connectionless (CL) | Not an option | e.g., IP networks; Ethernet networks
Connection-oriented (CO) | e.g., telephone network, SONET (Synchronous Optical NETwork) | e.g., MultiProtocol Label Switching (MPLS)
Next focus
22
Data plane vs. control plane
• Data plane: Functions, layers and protocols needed for the actual transfer of user data
• Control plane: Support functions and protocols needed at each layer of the data plane
23
Focus on the Network layer (NL)
• Data plane functions of the NL:
– Implementation of the NL at a switch:
  • Moves data units from one link to another
– Implementation of the NL at end hosts (support role):
  • Create/parse packet headers for packet-based NL
  • Position user data units for circuit-based NL
• Control-plane (support) functions for the NL:
– Addressing for all three types of networks
– Routing for all three types of networks
– Signaling for connection-oriented networks
24
Mapping of control-plane (support) functions to types of networks
Network type | Addressing (in data or control plane?) | Routing | Signaling
Connectionless (CL) | Data plane | Needed | Not needed
Connection-oriented Circuit Switched (CS) | Control plane | Needed | Needed
Connection-oriented Packet Switched (CO PS) | Control plane | Needed | Needed
25
Sub-section outline
• Understand three support functions
– Addressing
– Routing
– Signaling
• Learn how these are used in the three different types of networks
– Connectionless packet-switched networks
– Circuit-switched networks
– Connection-oriented packet-switched or virtual-circuit networks
26
Addressing
• Where are host interface addresses used?
– In connectionless packet-switched networks, destination addresses are carried in packet headers
  • hence we place this function in the data plane
– In connection-oriented (circuit/VC) networks, these addresses are used in signaling messages (needed for call setup)
  • hence we place this function in the control plane
27
Summarized addresses
• What are summarized addresses?• Why summarize addresses?
28
Summarized addresses
• What are summarized addresses?
– An address that represents a group of endpoint addresses
– e.g., all 212 numbers, 128.238 IP addresses
• Why summarize addresses?
– Reduces routing table sizes: hold one entry for a summarized address instead of a large number of individual addresses
– Reduces the length of routing messages that convey reachability information
29
Functions related to addressing
• Techniques to assign addresses to interfaces
– Hardwire as with MAC addresses
– Assign to interfaces as with IP addresses
– Assign to nodes (hosts/switches)
30
Examples of addressing schemes
• Ethernet switched network
– 6-byte MAC address
• IP-based networks
– 4-byte IP addresses
• Telephone networks
– 8-byte E.164 address (telephone number)
31
Sub-section outline
• Understand three support functions
– Addressing
– Routing
– Signaling
• Learn how these are used in the three different types of networks
– CL
– CS
– CO PS
32
Goal of routing algorithms
• Goal is to allow calls or packets to be routed on the "shortest" path, where the shortest path is determined by some metric, e.g.:
– minimum weight path (add link weights)
– minimum end-to-end delay
– path with the most available bandwidth
• The algorithm should adapt to changes in
– topology (includes administrator-set link weights)
– reachability
– loading conditions
33
Types of routing algorithms
• Static or dynamic
– If changes are rare, static schemes incur low overhead, but they are not applicable in production networks
– Dynamic algorithms add complexity, but offer bandwidth utilization gains
• Centralized or distributed
– Centralized schemes allow for optimal solutions
– Distributed schemes are scalable
34
Other classifications
• Hop-by-hop vs. source routing
– Hop-by-hop: each switch determines the next-hop switch toward which to route the packet or the call
– Source: ingress switch determines the end-to-end route
  • Loose source route: not all hops specified
  • Strict source route: exact set of hops specified
35
Distributed routing: Routing protocols
• Two types:
– Distance vector protocols
  • Each switch maintains a distance table (in addition to the routing table). The distance table shows the distances to all nodes in the network through each of its neighbors. The shortest path is then computed and the outgoing port information is stored in the routing table.
– Link state protocols
  • The whole topology of the network is kept at each switch. Shortest path algorithms such as Dijkstra's are then run to determine the routing tables.
36
Dijkstra’s algorithm
• Find shortest paths from source node s to all other nodes.
– At each iteration, determine the "next closest node" to the source s.
  • For each node that is still not in set M (the set containing nodes to which shortest paths from the source have already been computed), determine whether its current distance from s is longer than the distance of the path through the most recently selected "next closest node"; if so, update it.
  • Find the "next closest node" after updating all the distances.
  • Move the "next closest node" to set M.
– When the set M has all the nodes, stop.
37
Routing table precomputation: calculate the shortest path from node s to all other nodes, given the link weights (w) for all links

Dijkstra's shortest-path algorithm:
s: source node
Goal: find D_n, the cost of the least-cost path from node s to node n, for all n

M = {s};
for each n ∉ M
    D_n = w_sn;
while (M ≠ all nodes) do
    Find i ∉ M for which D_i = min{D_j ; j ∉ M};
    Add i to M;
    for each n ∉ M
        D_n = min[D_n, D_i + w_in];
    Update route;
enddo
38
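Example
[Figure: six-node network (nodes 1 to 6) with a weight on each link; link weights shown are 5, 2, 1, 2, 3, 3, 1, 1, 2, 5, including w56 = 2]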
39
List path weight and route
Iteration | M | D2 | D3 | D4 | D5 | D6
0 | {1} | 2, {1-2} | 5, {1-3} | 1, {1-4} | - | -
1 | {1,4} | 2, {1-2} | 4, {1-4-3} | 1, {1-4} | 2, {1-4-5} | -
2 | {1,4,2,5} | 2, {1-2} | 3, {1-4-5-3} | 1, {1-4} | 2, {1-4-5} | 4, {1-4-5-6}
3 | {1,4,2,5,3} | 2, {1-2} | 3, {1-4-5-3} | 1, {1-4} | 2, {1-4-5} | 4, {1-4-5-6}
4 | {1,4,2,5,3,6} | 2, {1-2} | 3, {1-4-5-3} | 1, {1-4} | 2, {1-4-5} | 4, {1-4-5-6}
40
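Resulting Routing Table
[Figure: shortest-path tree rooted at node 1, with links 1-2 (weight 2), 1-4 (weight 1), 4-5 (weight 1), 5-3 (weight 1) and 5-6 (weight 2)]
The tree is translated into a routing table at node 1:
Destination | Next hop
2 | 2
3 | 4
4 | 4
5 | 4
6 | 4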
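The worked example can be checked with a short program. This is a sketch, not the tutorial's own code: it uses a priority queue instead of the explicit set M, and it includes only the seven links whose endpoints and weights can be read off the path table (1-2, 1-3, 1-4, 4-3, 4-5, 5-3, 5-6). The figure shows more links, but their endpoints are not recoverable from the transcript, so they are left out as an explicit assumption.

```python
import heapq


def dijkstra(weights: dict, source):
    """Return (dist, prev) for shortest paths from source over undirected links."""
    graph = {}
    for (u, v), w in weights.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    dist, prev, done = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)      # i is the "next closest node"
        if i in done:
            continue                    # stale heap entry
        done.add(i)                     # move i into set M
        for n, w in graph[i].items():
            if n not in done and d + w < dist.get(n, float("inf")):
                dist[n] = d + w         # D_n = min[D_n, D_i + w_in]
                prev[n] = i             # update route
                heapq.heappush(heap, (dist[n], n))
    return dist, prev


def next_hop(prev: dict, source, dest):
    """Walk the shortest-path tree back from dest to find the first hop."""
    node = dest
    while prev[node] != source:
        node = prev[node]
    return node


# Links inferred from the path table (assumption: other links omitted).
WEIGHTS = {(1, 2): 2, (1, 3): 5, (1, 4): 1, (4, 3): 3, (4, 5): 1, (5, 3): 1, (5, 6): 2}

if __name__ == "__main__":
    dist, prev = dijkstra(WEIGHTS, 1)
    for dest in sorted(d for d in dist if d != 1):
        print(dest, dist[dest], next_hop(prev, 1, dest))
```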
41
Examples of routing protocols
• In Ethernet networks
– Address learning and the spanning tree algorithm
• In the Internet:
– Link-state routing protocols, such as Open Shortest Path First (OSPF)
– Distance-vector based (path-vector) routing protocols, such as Border Gateway Protocol (BGP)
• In telephone networks:
– Real-Time Network Routing (RTNR)
42
Sub-section outline
• Understand three support functions
– Addressing
– Routing
– Signaling
• Learn how these are used in the three different types of networks
– CL
– CS
– CO PS
43
Purpose of signaling (needed only in CO networks)
• Functions:
– Call setup:
  • route selection
  • bandwidth reservation on each link of end-to-end connection
  • switch fabric configuration of each switch
– Call release
  • release bandwidth for use by others
44
Examples of signaling protocols
• ISDN User Part of the SS7 (Signaling System No. 7) protocol stack
– to set up and release DS0 (64 kbps) circuits in a telephone (circuit-switched) network
• Resource reSerVation Protocol with Traffic Engineering (RSVP-TE)
– used in CO PS networks such as MPLS/ATM
– used in CS networks such as SONET/SDH and WDM
45
Sub-section outline
• Understand three support functions– Addressing– Routing– Signaling
• Learn how these functions are used in the three different types of networks
– CL
– CS
– CO PS
46
Connectionless packet-switched networks
Phase 1: Routing protocol exchanges + routing table precomputation
[Figure: switches I, II, III, IV and V connected by links with weights 5, 1, 1, 1, 4, 1, 1; host I-A attaches to switch I, hosts III-B and III-C attach to switch III]
Routing table at switch I (will have other entries):
Dest. | Next hop
III-* | IV
Routing table at switch IV:
Dest. | Next hop
III-* | III
Routing table at switch III:
Dest. | Next hop
III-B | III-B
III-C | III-C
• I, II, III, IV, V: switches
• Link weights are shown next to links
• Host interface addresses are derived from switch addresses (e.g., I-A is connected to switch I)
• Example routing table entries shown at switches I, III, IV
• III-*: summarized address for all hosts connected to switch III
47
Connectionless (CL) packet-switched networks
Phase 2: User-plane packet forwarding
[Figure: same network and routing tables as in Phase 1. A packet (packet header + packet payload) from host I-A is forwarded hop by hop toward host III-B; its header carries destination address III-B on every link.]
• Packet header carries destination host interface address (unchanged as it passes hop by hop)
• Each CL packet switch does a route lookup to determine the outgoing next hop node or port
48
Sub-section outline
• Understand three support functions– Addressing– Routing– Signaling
• Learn how these are used in the three different types of networks
– CL
– CS
– CO PS
49
Circuit-switched networks
Phase 1: Routing protocol exchanges + routing table precomputation
[Figure: same network and routing tables as for connectionless packet-switched networks: switch I reaches III-* via IV, switch IV reaches III-* via III, and switch III has entries for hosts III-B and III-C]
• Same as the Phase 1 routing protocol exchanges described for connectionless (CL) packet-switched networks
• More emphasis on exchanging loading information
50
Circuit-switched networks
Phase 2: Signaling for call setup
[Figure: host I-A sends a connection setup message to switch I; the network of switches I to V leads to host III-B; switch ports are labeled a, b, c, d]
Connection setup (Dest: III-B; BW: OC1; Timeslot: a, 1)
Routing table at switch I:
Dest. | Next hop
III-* | IV
Connection setup actions at each switch on the path:
1. Parse message to extract parameter values
2. Look up routing table for next hop to reach destination
3. Read and update CAC (Connection Admission Control) table
4. Select timeslots on output port
5. Configure switch fabric: write entry into timeslot mapping table
6. Construct setup message to send to next hop
51
Circuit-switched networks
Phase 2: Signaling for call setup
Connection setup actions at each switch on the path:
1. Parse message to extract parameter values
2. Look up routing table for next hop to reach destination
3. Read and update CAC (Connection Admission Control) table
4. Select timeslots on output port
5. Configure switch fabric: write entry into timeslot mapping table
6. Construct setup message to send to next hop
[Figure: the connection setup message arrives at switch I from host I-A and is forwarded toward switch IV]
Connection setup (Dest: III-B; BW: OC1; Timeslot: a, 1)
Routing table at switch I:
Dest. | Next hop
III-* | IV
CAC table at switch I:
Next hop | Interface (port); capacity; available timeslots
IV | c; OC12; 1, 4, 5
Timeslot mapping table at switch I:
INPUT Port/Timeslot | OUTPUT Port/Timeslot
a/1 | c/4
Update the CAC table to remove the selected timeslot (4) from the available list
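The six setup actions can be sketched for switch I as follows. Everything about the data layout is an assumption made for illustration: the dictionaries, the `peer_port` field (the port on the next-hop switch where the link terminates) and the `pick` argument are hypothetical. The slides do not say how a timeslot is chosen, so the usage line forces timeslot 4 only to mirror the mapping-table entry a/1 → c/4.

```python
def handle_setup(switch: dict, msg: dict, pick=min):
    """Process one connection-setup message at a circuit switch.

    Returns the setup message for the next hop, or None if the call is blocked.
    """
    # 1. Parse message to extract parameter values
    dest, in_port, in_slot = msg["dest"], msg["port"], msg["timeslot"]
    # 2. Look up routing table (summarized address, e.g. "III-*") for next hop
    next_hop = switch["routing"][dest.split("-")[0] + "-*"]
    # 3. Read CAC table
    cac = switch["cac"][next_hop]
    if not cac["free"]:
        return None  # no timeslot available: call is blocked
    # 4. Select a timeslot on the output port, and update the CAC table
    out_slot = pick(cac["free"])
    cac["free"].remove(out_slot)
    # 5. Configure switch fabric: write entry into timeslot mapping table
    switch["timeslot_map"][(in_port, in_slot)] = (cac["port"], out_slot)
    # 6. Construct setup message to send to next hop
    return {"dest": dest, "bw": msg["bw"], "port": cac["peer_port"], "timeslot": out_slot}


SWITCH_I = {
    "routing": {"III-*": "IV"},
    "cac": {"IV": {"port": "c", "peer_port": "a", "capacity": "OC12", "free": [1, 4, 5]}},
    "timeslot_map": {},
}

if __name__ == "__main__":
    setup = {"dest": "III-B", "bw": "OC1", "port": "a", "timeslot": 1}
    print(handle_setup(SWITCH_I, setup, pick=lambda free: 4))
    print(SWITCH_I["timeslot_map"], SWITCH_I["cac"]["IV"]["free"])
```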
52
Circuit-switched networks
Phase 2: Signaling for call setup
[Figure: the connection setup message travels from switch I to switch IV and is then forwarded toward switch III]
Connection setup (Dest: III-B; BW: OC1; Timeslot: a, 4); the time slot could change from hop to hop
• Perform same set of 6 connection setup steps at switch IV: write timeslot mapping table entry, update CAC table and send connection setup message to the next hop
Timeslot mapping table at switch IV:
INPUT Port/Timeslot | OUTPUT Port/Timeslot
a/4 | c/2
53
Circuit-switched networks
Phase 2: Signaling for call setup
[Figure: the connection setup message travels through switches I, IV and III to host III-B; circuit setup complete]
• Perform same set of 6 connection setup steps at switch III
• Reverse setup-confirmation messages are typically sent from the destination through the switches to the source host
Timeslot mapping table at switch III:
INPUT Port/Timeslot | OUTPUT Port/Timeslot
d/2 | b/1
54
Circuit-switched networks
Phase 3: User-data flow
• Bits arriving at switch I on time slot 1 at port a are switched to time slot 4 of port c
[Figure: the established circuit from host I-A through switches I, IV and III to host III-B, with timeslots shown on each link]
Timeslot mapping tables (IN Port/Timeslot → OUT Port/Timeslot):
Switch I: a/1 → c/4
Switch IV: a/4 → c/2
Switch III: d/2 → b/1
55
Release procedure
• When a communication session ends, there is a hop-by-hop release procedure (similar to the setup procedure) to release timeslots/wavelengths for the next call
56
Sub-section outline
• Understand three support functions– Addressing– Routing– Signaling
• Learn how these are used in the three different types of networks
– CL
– CS
– CO PS - Virtual Circuit (VC) networks
57
CO packet-switched (VC) networks
Phase 1: Routing protocol exchanges + routing table precomputation
[Figure: same network and routing tables as for connectionless packet-switched networks: switch I reaches III-* via IV, switch IV reaches III-* via III, and switch III has entries for hosts III-B and III-C]
• Same as the Phase 1 routing protocol exchanges described for connectionless (CL) packet-switched networks
• More emphasis on exchanging loading information
58
CO packet-switched (VC) networks
Phase 2: Signaling
Connection setup actions at each switch on the path:
1. Message parsing to extract parameter values
2. Route lookup for next hop to reach destination
3. CAC (Connection Admission Control) for BW and buffer
4. Label selection
5. Switch fabric configuration
6. Message construction to send to next hop
[Figure: connection setup messages travel from host I-A through switches I, IV and III to host III-B, establishing a virtual circuit]
Connection setup (Dest: III-B; Traffic descriptor; QoS; Label: a, 1)
Routing table at switch I:
Dest. | Next hop
III-* | IV
CAC table at switch I:
Next hop | Interface (port); capacity; free BW/buffer; free labels
IV | c; OC12; x/y; 10, 46, 50
Switch configuration tables (IN Port/Label → OUT Port/Label):
Switch I: a/1 → c/46
Switch IV: a/46 → c/20
Switch III: d/20 → b/1
59
CO packet-switched (VC) networks
Phase 3: User-data flow
• Packets sent by host I-A with the label field in the packet header set to 1 are switched according to entries in the switch configuration tables at each switch following the path of the established virtual circuit.
[Figure: a packet (packet header with label + packet payload) follows the virtual circuit from host I-A through switches I, IV and III to host III-B; its label is 1 on the first link, then 46, then 20, then 1]
Switch configuration tables (IN Port/Label → OUT Port/Label):
Switch I: a/1 → c/46
Switch IV: a/46 → c/20
Switch III: d/20 → b/1
Entries shown for a second virtual circuit, from host I-A (label 2) to host II-B: a/2 → c/1 and a/1 → b/1
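The label swapping along the virtual circuit to host III-B can be traced with a small simulation. The switch configuration tables are the ones on the slide; the `LINKS` dictionary (which port of one switch connects to which port of the next) is inferred from the port letters in those tables and is an assumption, as are the function and field names.

```python
# Switch configuration tables: (in port, in label) -> (out port, out label)
TABLES = {
    "I": {("a", 1): ("c", 46)},
    "IV": {("a", 46): ("c", 20)},
    "III": {("d", 20): ("b", 1)},
}

# Assumed inter-switch wiring: (switch, out port) -> (next switch, in port)
LINKS = {("I", "c"): ("IV", "a"), ("IV", "c"): ("III", "d")}


def deliver(packet: dict, switch: str, port: str):
    """Forward a packet along its virtual circuit, swapping the label at each hop.

    Returns the packet as it leaves the last switch and the list of hops taken.
    """
    hops = []
    while True:
        out_port, out_label = TABLES[switch][(port, packet["label"])]
        packet = {**packet, "label": out_label}  # label swap; payload untouched
        hops.append((switch, out_port, out_label))
        if (switch, out_port) not in LINKS:      # port leads to a host
            return packet, hops
        switch, port = LINKS[(switch, out_port)]


if __name__ == "__main__":
    final, hops = deliver({"label": 1, "payload": "data"}, "I", "a")
    print(hops, final)
```

The destination address never appears in the data packet; only the link-local label does.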
60
Let us not confuse addresses with labels
• Addresses:
– numbers assigned to end hosts or end host interfaces
– globally unique
• Labels:
– assigned to identify a virtual circuit on a link
– unique just to the link (like seat assignments on a flight; same seat numbers can be assigned on different flights)
• Scope for confusing the two:
– When the action performed by a packet switch is examined,
  • a connectionless switch forwards packets based on addresses
  • while a connection-oriented switch forwards packets based on labels
61
Rationale for VC networks
• Combine
– QoS-guaranteed service of circuit-switched networks
– Ability of packet-switched networks to handle bursty traffic
62
"Best" of both worlds
• Service guarantees to users
• High utilization: beneficial to service providers
63
"Worst" of both worlds
• Complexity
– Control plane: switch controllers need to implement signaling protocols and handle setup/release requests for bandwidth
  • Inherits complexity of circuit switch controllers
– Data plane: line cards need packet based demultiplexing, the space switch needs to be reconfigured on a packet-by-packet basis, and buffering is needed
  • Inherits complexity of packet switches
64
Principles
• Packet-switched vs. circuit-switched networks
• Connection-oriented vs. connectionless modes of bandwidth sharing
• Analytical models
65
Bandwidth sharing
• The very purpose of the existence of networks is to enable bandwidth sharing
• The purpose of a communication link is to move data bits from one point to another
• But the purpose of a network of links interconnected by switches is to enable the sharing of bandwidth on these links
66
How is bandwidth shared on a connectionless packet-switched network?
• Pre-1988 IP network:
– Just send data without reservations or any mechanism to adjust rates
• Van Jacobson's 1988 contribution:
– Added congestion control to TCP
– TCP software at the sending end host adjusts its sending rate based on estimates of congestion in the router buffers
67
TCP throughput
$$B \approx \frac{1}{RTT\sqrt{\dfrac{2bp}{3}} + T_0 \min\left(1,\ 3\sqrt{\dfrac{3bp}{8}}\right) p\,(1 + 32p^2)}$$
• B: throughput, RTT: round-trip time
• b: an ACK is sent every b segments (b is typically 2)
• p: packet loss rate on path
• T0: initial retransmission time-out in a sequence of retries
• Interesting observation: throughput is independent of bottleneck link rate
– congestion-avoidance algorithm model
– for low packet loss rate, it does matter, when file size is large
• Padhye, Firoiu, Towsley, Kurose, ACM Sigcomm 98 paper
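A minimal sketch of evaluating this throughput model, with B in segments per second. The default `t0=1.0` second is an assumed value chosen only for illustration; the slides do not state which T0 was used.

```python
from math import sqrt


def tcp_throughput(rtt: float, p: float, b: int = 2, t0: float = 1.0) -> float:
    """Approximate steady-state TCP throughput in segments per second.

    rtt: round-trip time in seconds
    p:   packet loss rate on the path (0 < p < 1)
    b:   an ACK is sent every b segments
    t0:  initial retransmission time-out in seconds (assumed default)
    """
    congestion_avoidance = rtt * sqrt(2 * b * p / 3)
    timeouts = t0 * min(1.0, 3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p * p)
    return 1.0 / (congestion_avoidance + timeouts)


if __name__ == "__main__":
    for rtt in (0.0001, 0.005, 0.05):
        for p in (0.0001, 0.001, 0.01):
            print(f"RTT={rtt * 1e3:g} ms, p={p:g}: {tcp_throughput(rtt, p):.1f} segments/s")
```

Note that the bottleneck link rate does not appear among the parameters, which is the observation made on the slide.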
68
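TCP throughput
Case | Packet loss rate | Bottleneck link rate | Round-trip delay | Mean transfer delay for a 1 GB file (s)
Case 1 | 0.0001 | 100 Mbps | 0.1 ms | 82.25
Case 2 | 0.0001 | 100 Mbps | 5 ms | 89.45
Case 3 | 0.0001 | 100 Mbps | 50 ms | 396.5
Case 4 | 0.0001 | 1 Gbps | 0.1 ms | 8.25
Case 5 | 0.0001 | 1 Gbps | 5 ms | 39.6
Case 6 | 0.0001 | 1 Gbps | 50 ms | 395.7
Case 7 | 0.001 | 100 Mbps | 0.1 ms | 82.93
Case 8 | 0.001 | 100 Mbps | 5 ms | 135.4
Case 9 | 0.001 | 100 Mbps | 50 ms | 1293
Case 10 | 0.001 | 1 Gbps | 0.1 ms | 8.64
Case 11 | 0.001 | 1 Gbps | 5 ms | 129.4
Case 12 | 0.001 | 1 Gbps | 50 ms | 1287
Case 13 | 0.01 | 100 Mbps | 0.1 ms | 92.41
Case 14 | 0.01 | 100 Mbps | 5 ms | 471.7
Case 15 | 0.01 | 100 Mbps | 50 ms | 4417
Case 16 | 0.01 | 1 Gbps | 0.1 ms | 12.43
Case 17 | 0.01 | 1 Gbps | 5 ms | 441.7
Case 18 | 0.01 | 1 Gbps | 50 ms | 4387
Effective-rate annotations on the slide: ~21 Mbps and ~2 Mbps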
69
How is bandwidth shared on a circuit-switched network?
• The signaling procedure described is for immediate-request calls
• Example: telephone networks
• Send a call setup request:
– if requested bandwidth is available, it is allocated to the call
– if not, the call is blocked (rejected)
• M/G/m/m model:
– m: number of circuits
70
Erlang-B formula
$$P_b = \frac{\rho^m / m!}{\sum_{k=0}^{m} \rho^k / k!}, \qquad u_b = \frac{\rho\,(1 - P_b)}{m}$$
ρ: offered traffic load in Erlangs
λ: call arrival rate
1/µ: mean call holding time
m: number of circuits
P_b: call blocking probability
u_b: utilization

For a 1% call blocking probability, i.e., P_b = 0.01:
ρ | m | u_b
1 | 4 | 24.8%
10 | 17 | 58.2%
100 | 117 | 84.6%
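The Erlang-B blocking probability is usually evaluated with a recursion on the number of circuits rather than with factorials, which overflow for large m. A minimal sketch; the function names are illustrative, and the recursion B(k) = ρB(k-1) / (k + ρB(k-1)) with B(0) = 1 is the standard equivalent of the closed form.

```python
def erlang_b(m: int, rho: float) -> float:
    """Call blocking probability for m circuits and offered load rho (Erlangs)."""
    b = 1.0  # B(0) = 1: with no circuits every call is blocked
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b


def utilization(m: int, rho: float) -> float:
    """Fraction of the m circuits kept busy by the carried (non-blocked) load."""
    return rho * (1.0 - erlang_b(m, rho)) / m


if __name__ == "__main__":
    for rho, m in ((1.0, 4), (10.0, 17), (100.0, 117)):
        print(f"rho={rho:g}, m={m}: Pb={erlang_b(m, rho):.4f}, u={utilization(m, rho):.3f}")
```

Running it for the three (ρ, m) pairs in the table gives blocking probabilities close to, but not exactly, the 1% target, since m has to be an integer.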
71
Delay model - to compare with TCP approach
• What happens after the call is blocked?
• If the user waits and tries again, then the call does not simply go away
• A better model would be an M/M/m/∞ queueing system
– approximate, since "queueing" is distributed at the end hosts, which have no idea when to try again
– probability of an arriving call finding all m circuits busy is much higher than in call blocking model since calls linger
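For an M/M/m queue with unlimited waiting room, the probability that an arriving call finds all m circuits busy is given by the Erlang-C formula, which can be computed from Erlang-B via the standard identity C = B / (1 - (ρ/m)(1 - B)). A minimal sketch under that model; the function names are illustrative.

```python
def erlang_b(m: int, rho: float) -> float:
    """Erlang-B blocking probability, by recursion on the number of circuits."""
    b = 1.0
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b


def erlang_c(m: int, rho: float) -> float:
    """Probability that an arriving call must wait in an M/M/m queue.

    Requires rho < m (utilization below 100%), otherwise the queue is unstable.
    """
    if rho >= m:
        raise ValueError("offered load must be smaller than the number of circuits")
    b = erlang_b(m, rho)
    return b / (1.0 - (rho / m) * (1.0 - b))


if __name__ == "__main__":
    m, utilization = 10, 0.8
    print(f"m={m}, Ud={utilization:.0%}: Pq={erlang_c(m, m * utilization):.3f}")
```

With m = 10 at 80% utilization this evaluates to roughly 0.41, far above the 1% blocking target of the call-blocking model.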
72
Impact of increasing m at different values of link utilization Ud
[Plot: probability P_Q of an arriving job finding all m circuits busy (axis 0 to 1) and offered load, i.e., call arrival rate/call departure rate (axis 0 to 1000), versus m, the link capacity expressed in channels (log scale, 10^0 to 10^3), with curves for Ud = 90%, 80%, 60% and 40%. Axis annotations: low-rate per-call circuits, high-rate per-call circuits. Marked point: m = 10, Pq = 41%.]
73
Impact of mean call holding time, 1/µ
[Plot: mean waiting time for delayed calls, E[W_d] (minutes, 0 to 30), and number of ports aggregating traffic onto the link, N (log scale, 10^0 to 10^5), versus mean call holding time 1/µ (minutes, 0 to 30). Curves for N: m = 1000, λ' = 1 call/hour; m = 100, λ' = 1 call/hour; m = 10, λ' = 1 call/hour; m = 10, λ' = 10 calls/hour. Curves for E[W_d]: m = 10, m = 100, m = 1000.]
λ': per-host call-generation rate; Ud: 90%
$$E[W_d] = \frac{1}{m\mu\,(1 - U_d)}$$
74
BW sharing modes in circuit/VC networks
• Mean waiting time is proportional to mean call holding time
• Can afford to have a queueing based solution when m is small if calls are short
[Figure: bandwidth-sharing modes arranged by link capacity (labels: large m, moderate throughput, small m, high throughput) and by call duration (short calls, as with a bank teller; long calls, as with a doctor's office). Modes shown:
– immediate-request with call blocking + retries ("call queueing") (video, gaming)
– immediate-request with delayed-start times ("call queueing") (file transfers)
– book-ahead]
m is the link capacity expressed in channels; e.g., if 1 Gbps circuits are assigned on a 10 Gbps link, m = 10
75
How is bandwidth shared on a virtual-circuit network?
• In connection-oriented packet-switched networks,
– bandwidth allocation to a virtual circuit is independent of label selection
• In circuit-switched networks,
– when "labels" are selected (e.g., timeslots are selected on a SONET link), it means bandwidth allocation to the circuit is immediately fixed
76
Savings in bandwidth allocation over circuit-switched networks
[Plot: bandwidth required versus number of sources N, showing the peak bandwidth assignment ($C_L = N R_p$), the average bandwidth assignment and the QoS-specified bandwidth assignment; the link capacity C_L bounds the admissible region]
Source: Mischa Schwartz's 1996 textbook on broadband networks
77
Bandwidth allocation for virtual circuits
• How is bandwidth allocated to a virtual circuit?
– Call setup request carries
  • traffic descriptor parameters
  • desired quality-of-service parameters
– Call admission control algorithm is executed at the switch controller to determine
  • bandwidth allocation for the virtual circuit
  • buffer space allocation for the virtual circuit
78
Traffic descriptors
• Peak rate
• Sustained rate (average)
• Mean Burst Size
79
QoS measures
• Packet Loss Ratio
• Packet Transfer Delay
• Packet Delay Variance
80
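Traffic source model
• On-off Markov model to characterize the traffic source: fluid flow model
[Figure: two-state chain with states OFF and ON; over time the source alternates OFF, ON, OFF, ON, OFF and sends at rate R_p while ON. Mean OFF duration: 1/α; mean ON duration: 1/β]
Probability that the source is in the ON state: $p = \alpha / (\alpha + \beta)$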
81
Traffic descriptors values for ON-OFF model
• Peak rate = R_p
• Sustained rate (average) = p R_p
• Mean Burst Size = R_p/β, where 1/β is the mean ON-state duration
82
N sources instead of one source
• To compute bandwidth allocation, we set up the problem assuming N homogeneous independent sources, each of which can be represented by the same ON-OFF model (with the same parameter values)
[Figure: sources 1 to N feed a buffer of length x that is served by a link of capacity C_L]
83
"Equivalent bandwidth"
• Two approximations (both conservative):
– Stationary approximation (buffer is ignored)
– Flow approximation (statistical multiplexing is ignored)
• Seminal paper by Guerin, Ahmadi and Naghshineh, JSAC 1991
$$C = \min(C_s, C_f)$$
84
Flow approximation
• x: buffer size
• P_L: packet loss ratio
• R_p: peak rate
• p: probability of source being in ON state
• N: number of sources
• 1/β: mean ON-state duration
$$k = \frac{\beta x}{(1 - p)\,R_p \ln(1/P_L)}$$
$$\frac{C_f}{N R_p} = \frac{1 - k}{2} + \sqrt{\left(\frac{1 - k}{2}\right)^2 + kp}$$
85
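Stationary approximation
• Peak bandwidth assignment: $C = N R_p$
• Average bandwidth assignment: $C_S = m R_p$
• QoS-specified allocation (more than average): $C_S = (m + K\sigma) R_p$, i.e.
$$C_s = \left(m + \sigma\sqrt{-\ln(2\pi) - 2\ln\varepsilon}\right) R_p$$
• m: average number of ON sources, $m = Np$
• σ²: variance of the number of ON sources, $\sigma^2 = Np(1 - p)$
• ε: probability of being in the overload region, set from the loss ratio P_L
• Use binomial distribution to find m and σ²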
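A minimal sketch of the stationary (Gaussian) approximation, assuming the form C_s = (m + Kσ)R_p with K = sqrt(-2 ln ε - ln 2π), m = Np and σ² = Np(1-p), as in Guerin, Ahmadi and Naghshineh. The values in the usage lines (N = 100 sources, p = 0.35, R_p = 4 Mbps, ε = 10^-5) are chosen only for illustration.

```python
from math import log, pi, sqrt


def stationary_capacity(n: int, p: float, rp: float, eps: float) -> float:
    """Bandwidth for n ON-OFF sources under the stationary approximation.

    n:   number of sources
    p:   probability that a source is ON
    rp:  peak rate of one source (result is in the same unit)
    eps: tolerated probability of being in the overload region
    """
    m = n * p                          # mean number of ON sources (binomial)
    sigma = sqrt(n * p * (1.0 - p))    # standard deviation of that number
    k = sqrt(-2.0 * log(eps) - log(2.0 * pi))
    return (m + k * sigma) * rp


if __name__ == "__main__":
    n, p, rp = 100, 0.35, 4.0
    print("average:", n * p * rp, "Mbps")
    print("QoS-specified:", round(stationary_capacity(n, p, rp, 1e-5), 1), "Mbps")
    print("peak:", n * rp, "Mbps")
```

The result lies between the average and the peak assignment, which is the point of the three curves in the bandwidth-savings plot.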
86
Example
• x: buffer size = 3 Mbits
• P_L: cell loss ratio = 10^-5
• R_p: peak rate = 4 Mbps
• p: ON-state probability = 0.35
• C_L: link capacity = 400 Mbps
• 1/β: mean ON-state duration = 100 msec
• k = 0.65/(1-p) = 1
$$\frac{C_f}{N} = 0.59\,R_p = 2.36\ \text{Mbps}$$
Therefore the number of calls that can be admitted is
$$N = \frac{C_L}{2.36} = 169$$
instead of 100 (peak-rate allocation).
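The numbers in this example can be reproduced with a few lines, assuming the flow approximation in the form k = x / ((1/β)(1-p) R_p ln(1/P_L)) and per-source bandwidth R_p[(1-k)/2 + sqrt(((1-k)/2)² + kp)]. Function and parameter names are illustrative; rates are in Mbps, the buffer in Mbits and the ON duration in seconds.

```python
from math import log, sqrt


def flow_capacity_per_source(x: float, pl: float, rp: float, p: float, mean_on: float) -> float:
    """Equivalent bandwidth of one ON-OFF source under the flow approximation.

    x:       buffer size (Mbits)
    pl:      target loss ratio
    rp:      peak rate (Mbps)
    p:       probability of being in the ON state
    mean_on: mean ON-state duration in seconds (1/beta)
    """
    k = x / (mean_on * (1.0 - p) * rp * log(1.0 / pl))
    half = (1.0 - k) / 2.0
    return rp * (half + sqrt(half * half + k * p))


if __name__ == "__main__":
    link_capacity = 400.0  # Mbps
    c = flow_capacity_per_source(x=3.0, pl=1e-5, rp=4.0, p=0.35, mean_on=0.1)
    print(f"per-source bandwidth: {c:.2f} Mbps")
    print("admissible calls:", int(link_capacity // c))
    print("peak-rate allocation:", int(link_capacity // 4.0))
```

The slide rounds k to 1; using the unrounded value changes the per-source bandwidth only in the third decimal place and leaves the admissible call count unchanged.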
87
Need data-plane algorithms to achieve QoS guarantees
Call Admission Control
Scheduling(example: weighted fair queueing)
Traffic shaping/policing(example: leaky-bucket algorithm)
88
Outline
• Principles
– Different types of connection-oriented networks
• Technologies
– Single network
– Internetworking
• Usage
– Commercial networks
– Research & Education Networks (REN)
89
Technologies
• Connection-oriented (CO) networks
– Data-(user-) plane protocols
  • packet-switched: MPLS, VLAN Ethernet, Intserv IP
  • circuit-switched: SONET/SDH, WDM, SDM
– Control-plane protocols:
  • RSVP-TE
  • OSPF-TE
  • LMP
• Internetworking
– GFP, VCAT, LCAS for SONET/SDH
– PWE3 for MPLS networks
– Digital wrapper for OTN
MPLS Architecture
• Label Switched Path (LSP)
– Term used for virtual circuits in MPLS networks
• Label Ingress Router (LIR)
– Entry point into MPLS network: isolates packets to map to an LSP
• Label Egress Router (LER)
– Exit point: removes label and routes based on native format
• Label Switching Router (LSR)
– Routers along the path that examine the top label in the stack and forward accordingly
MPLS header
• Label Value
– (20 bits) Label used to identify the virtual circuit
• Class of Service (CoS)
– (3 bits) Experimental field, used for QoS support
• S
– (1 bit) Identifies the bottom of the label stack
• TTL
– (8 bits) Time-To-Live value
MPLS header layout:
Label Value (20 bits) | CoS (3 bits) | S (1 bit) | TTL (8 bits)
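The four fields add up to one 32-bit word, so a header can be packed and unpacked with shifts and masks. A minimal sketch using Python's standard `struct` module in network byte order; the function names are illustrative.

```python
import struct


def pack_mpls(label: int, cos: int, s: int, ttl: int) -> bytes:
    """Build one 4-byte MPLS header: label (20 bits), CoS (3), S (1), TTL (8)."""
    if not (0 <= label < 2**20 and 0 <= cos < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (cos << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_mpls(data: bytes) -> dict:
    """Parse the first 4 bytes of data as an MPLS header."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "cos": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,   # 1 marks the bottom of the label stack
        "ttl": word & 0xFF,
    }


if __name__ == "__main__":
    header = pack_mpls(label=46, cos=5, s=1, ttl=64)
    print(header.hex(), unpack_mpls(header))
```

A label stack is simply several such 4-byte headers in a row, with S = 1 only on the last one.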
MPLS label stacking
• MPLS labels can be stacked
• What does this mean?
– Create one virtual circuit (VC) on a link
– Say we allocate 100 Mbps to this VC
– We can create another VC within this VC and allocate it a portion of this 100 Mbps
• Why is label stacking required?
– Expected to be required originally for scalability
– Most vendors support at least 4 levels (Malis' paper)
– Currently, this has become a useful feature for pseudo-wire services (point-to-point services) and VPNs (multipoint)
[Figure: a packet carrying a stack of MPLS headers: MPLS header | MPLS header | ...]
Source: Andy Malis' paper in IEEE Comm. Mag., Sept. 2006
MPLS Label Stacking: hierarchical packet forwarding
[Figure: packet formats along the path: Eth | IP ... and PoS | IP ... outside the MPLS network; PoS | MPLS | IP ... with one label; PoS | MPLS | MPLS | IP ... with two stacked labels]
• Label for DC to Sunnyvale LSP
• Label for St. Louis to Phoenix LSP
• Label pushed on stack at Chicago LSR to route to Denver
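IEEE 802.1Q Ethernet VLAN
Frame layout (TPID and TCI are the new fields and together form the VLAN tag):
Dest. MAC Address | Source MAC Address | TPID | TCI | Type/Len | Data | FCS
• TPID (2 bytes): 802.1Q Tag Type
• TCI: User Priority (3 bits) | CFI (1 bit) | VLAN ID (12 bits)
• FCS: Frame Check Sequence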
VLAN Tag Fields
• Tag Protocol Identifier (TPID)
– (2 bytes) 802.1Q Tag Protocol Type, set to 0x8100 to identify the frame as a tagged frame
• Tag Control Information (TCI)
– User Priority
  • (3 bits) As defined in 802.1p, 3 bits represent eight priority levels
– CFI
  • (1 bit) Canonical Format Indicator, set to indicate the presence of an Embedded-RIF
– VLAN ID
  • (12 bits) VID uniquely identifies the frame's VLAN
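Given the layout above, the VLAN tag sits right after the two 6-byte MAC addresses. A minimal sketch of extracting its fields from a raw frame with the standard `struct` module; the function name is illustrative, and an untagged frame is reported as `None`.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # value that identifies a tagged frame


def parse_vlan_tag(frame: bytes) -> Optional[dict]:
    """Return the 802.1Q tag fields of an Ethernet frame, or None if untagged."""
    # Bytes 0-5: destination MAC, bytes 6-11: source MAC, bytes 12-15: TPID + TCI
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return {
        "priority": tci >> 13,        # 3 bits: eight priority levels
        "cfi": (tci >> 12) & 0x1,     # 1 bit: Canonical Format Indicator
        "vlan_id": tci & 0x0FFF,      # 12 bits: VID
    }


if __name__ == "__main__":
    dst = bytes.fromhex("05a10810a43e")
    src = bytes.fromhex("09a50810a43d")
    tci = (5 << 13) | (0 << 12) | 100
    tagged = dst + src + struct.pack("!HH", TPID_8021Q, tci) + struct.pack("!H", 0x0800)
    print(parse_vlan_tag(tagged))
```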
96
Integrated services (Intserv) IP network
• "Label" on which switch performs its forwarding function:
– Destination IP address
– Source IP address
– Protocol field in IP header: TCP or UDP
– Destination TCP or UDP port number
– Source TCP or UDP port number
97
SONET Equipment
• By Functionality
– Add/Drop Multiplexers (ADMs): dropping & inserting tributaries
– Regenerators: digital signal regeneration
– Cross-Connects: interconnecting SONET streams
• By Signaling between elements
– Section Terminating Equipment (STE): span of fiber between adjacent devices, e.g. regenerators
– Line Terminating Equipment (LTE): span between adjacent multiplexers, encompasses multiple sections
– Path Terminating Equipment (PTE): span between SONET terminals at end of network, encompasses multiple lines
Courtesy: Leon-Garcia and Widjaja's textbook
98
Section, Line, & Path in SONET
PTELTE
STE
STS-1 Path
STS Line
Section Section
STE = Section Terminating Equipment, e.g., a repeater/regenerator
LTE = Line Terminating Equipment, e.g., a multiplexer
PTE = Path Terminating Equipment, e.g., a crossconnect
MUX MUXReg Reg Reg
SONET terminal
STE STELTE
PTE
SONET terminal
Section Section
Courtesy: Leon-Garcia and Widjaja's textbook
99
Optical Optical Optical
Section
Optical
Line
Optical
Line
Optical
Line
Path
Optical
Line
Path
Section, Line, & Path Layers in SONET
• SONET has four layers
  – Optical, section, line, path
  – Each layer is concerned with the integrity of its own signals
• Each layer has its own protocols
  – SONET provides signaling channels for elements within a layer
Courtesy: Leon-Garcia and Widjaja's textbook
Section Section Section SectionSectionSection
100
SONET STS Frame
• SONET streams carry two types of overhead
• Path overhead (POH):
  – inserted & removed at the ends
  – Synchronous Payload Envelope (SPE) consisting of Data + POH traverses network as a single unit
• Transport Overhead (TOH):
  – processed at every SONET node
  – TOH occupies a portion of each SONET frame
  – TOH carries management & link integrity information
Courtesy: Leon-Garcia and Widjaja's textbook
101
Special OH octets:
A1, A2: Frame Synch
B1: Parity on Previous Frame (BER monitoring)
J0: Section trace (Connection Alive?)
H1, H2, H3: Pointer Action
K1, K2: Automatic Protection Switching
810 Octets per frame @ 8000 frames/sec
9 rows
90 columns
1
2Order of transmission
A1 A2 J0 J1
B1 E1 F1 B3
D1 D2 D3 C2
H1 H2 H3 G1
B2 K1 K2 F2
D4 D5 D6 H4
D7 D8 D9 Z3
D10 D11 D12 Z4
S1 M0/1 E2 N1
3 Columns of Transport OH
Section Overhead
Line Overhead
Synchronous Payload Envelope (SPE): 1 column of Path OH + 86 data columns
Path Overhead
Data
STS-1 Frame 810x64kbps=51.84 Mbps
Courtesy: Leon-Garcia and Widjaja's textbook
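The STS-1 rate quoted on this slide follows directly from the frame dimensions. A short Python check, assuming only the figures shown (9 rows, 90 columns, 3 columns of transport overhead, 8000 frames/sec); the variable names are illustrative:

```python
ROWS, COLUMNS = 9, 90          # STS-1 frame dimensions from the slide
TOH_COLUMNS = 3                # transport overhead columns
FRAMES_PER_SEC = 8000

octets_per_frame = ROWS * COLUMNS                       # 810 octets
line_rate_bps = octets_per_frame * 8 * FRAMES_PER_SEC   # each octet is a 64 kbps channel
spe_rate_bps = ROWS * (COLUMNS - TOH_COLUMNS) * 8 * FRAMES_PER_SEC

print(octets_per_frame, line_rate_bps / 1e6, spe_rate_bps / 1e6)
```

The second figure is the capacity left for the SPE once the three transport-overhead columns are removed.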
102
Optical transport networks
• ITU-T G.872 specifies an optical transport network (OTN) architecture, which defines two interface classes
  – Inter-domain interface (IrDI): interface between operators/vendors; defined with 3R processing (retiming, reshaping, and regeneration)
  – Intra-domain interface (IaDI): interface within an operator/vendor domain
• ITU-T G.709 is about the information transferred across IrDI and IaDI interfaces
  – Defines several layers in the OTN hierarchy
103
Objective and features
• Need to support the transmission needs of today’s diverse digital services on optical links
• Need to equip DWDM equipment with operational, administration, and maintenance functionalities, similar to those seen in SONET/SDH
• Advantages relative to SONET/SDH
  – Management of optical signals in the optical domain
    • without O/E/O conversion
  – Transparent transport of client signals
  – Stronger Forward Error Correction (FEC)
• G.872 layers
  – OTS: Optical Transmission Section
  – OMS: Optical Multiplex Section
  – OCh: Optical Channel
104
Layers within an OTN
Courtesy: T. Walker's tutorial
105
OTN Hierarchy
• Electrical domain:
  – OTU: Optical Channel Transport Unit
  – ODU: Optical Channel Data Unit
  – OPU: Optical Channel Payload Unit
Courtesy: T. Walker's tutorial
Low layer
Higher layers
106
G. 709 Optical Channel frame structure (digital wrapper)
• Optical channel (OCh) overhead: support operations, administration, and maintenance functions
• OCh payload: can be STM-N, ATM, IP, Ethernet, GFP frames, OTN ODUk, etc.
• FEC: Reed-Solomon RS(255, 239) code recommended; roughly introduces a 6.7% overhead
• Frame size: 4 rows of 4080 bytes
• Frame period:
  – OTU1: 48.971 μs (payload data rate: roughly 2.488 Gbps)
  – OTU2: 12.191 μs (payload data rate: roughly 9.995 Gbps)
  – OTU3: 3.035 μs (payload data rate: roughly 40.15 Gbps)
OCh overhead OCh payload FEC
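The frame size and frame periods above fix the OTU line rates, and the RS(255, 239) parameters fix the FEC overhead. A minimal Python sketch of that arithmetic, using only the numbers on this slide (function and variable names are illustrative):

```python
FRAME_BITS = 4 * 4080 * 8      # 4 rows of 4080 bytes
FRAME_PERIOD_US = {"OTU1": 48.971, "OTU2": 12.191, "OTU3": 3.035}


def otu_line_rate_gbps(name: str) -> float:
    """Line rate = bits per frame / frame period (bits per microsecond = Mbps)."""
    return FRAME_BITS / FRAME_PERIOD_US[name] / 1000.0


# RS(255, 239): 16 parity bytes are added for every 239 data bytes.
fec_overhead = (255 - 239) / 239

for name in FRAME_PERIOD_US:
    print(name, round(otu_line_rate_gbps(name), 3), "Gbps")
print("FEC overhead:", round(fec_overhead * 100, 1), "%")
```

The line rates come out higher than the payload rates quoted on the slide because they include the OCh overhead and FEC bytes.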
107
References for OTN
• ITU-T G.872 and G.709/Y.1331 Specifications
• T. Walker, “Optical Transport Network (OTN) Tutorial,” available online: http://www.itu.int/ITU-T/studygroups/com15/otn/OTNtutorial.pdf
• Agilent, “An overview of ITU-T G.709,” Application Note 1379
• P. Bonenfant and A. Rodriguez-Moral, "Optical Data Networking," IEEE Communications Magazine, Mar. 2000, pp. 63-70.
• E. L. Varma, S. Sankaranarayanan, G. Newsome, Z.-W. Lin, and H. Epstein, “Architecting the Services Optical Network,” IEEE Communications Magazine, Sept. 2001, pp. 80-87.
108
Technologies
• Connection-oriented (CO) networks
  – Data-(user-) plane protocols
    • packet-switched: MPLS, VLAN Ethernet, Intserv IP
    • circuit-switched: SONET/SDH, WDM, SDM
  – Control-plane protocols:
    • RSVP-TE: signaling protocol
    • OSPF-TE: routing protocol
    • LMP
• Internetworking
  – PWE for MPLS networks
  – GFP, VCAT, LCAS for SONET/SDH
  – Digital wrapper for OTN
109
The evolution of Resource reSerVation Protocol (RSVP)
• RSVP (RFC 2205, 1997)
• RSVP-TE (RFC 3209, 2001)
• RSVP-TE GMPLS Extension (RFC 3471, 3473, 2003)
• RSVP-TE GMPLS Extension for SONET/SDH (RFC 3946, 2004; RFC 4606, 2006)
110
RSVP - RFC 2205
• Designed to support integrated services on the Internet
• Reserve resources to meet required QoS measures for a data flow
• Seven messages:
  – Path, Resv, PathErr, ResvErr, PathTear, ResvTear, and ResvConf (triggered by an optional object, RESV_CONFIRM, in Resv messages)
• All messages begin with a common header, followed by a body consisting of a variable number of “objects”
• Common header format
Vers | Flags | Msg Type | RSVP Checksum
Send_TTL | (Reserved) | RSVP Length
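The common header can be packed and unpacked with Python's `struct` module. This is a sketch under stated assumptions: the field widths (4-bit Vers and Flags, 8-bit Msg Type, Send_TTL and Reserved, 16-bit Checksum and Length) and the message type numbers are taken from my reading of RFC 2205 rather than from the slide, and the checksum is left at zero instead of being computed.

```python
import struct

# Message type numbers as assigned in RFC 2205 (assumption: not shown on the slide).
MSG_TYPES = {1: "Path", 2: "Resv", 3: "PathErr", 4: "ResvErr",
             5: "PathTear", 6: "ResvTear", 7: "ResvConf"}


def pack_common_header(msg_type, send_ttl, length, checksum=0, version=1, flags=0):
    """Build the 8-byte RSVP common header (checksum not computed here)."""
    return struct.pack("!BBHBBH", (version << 4) | flags, msg_type,
                       checksum, send_ttl, 0, length)


def unpack_common_header(data):
    vers_flags, msg_type, checksum, send_ttl, _reserved, length = \
        struct.unpack("!BBHBBH", data[:8])
    return {"version": vers_flags >> 4, "flags": vers_flags & 0x0F,
            "msg_type": MSG_TYPES.get(msg_type, "unknown"),
            "checksum": checksum, "send_ttl": send_ttl, "length": length}


header = pack_common_header(msg_type=1, send_ttl=64, length=8)
print(header.hex(), unpack_common_header(header))
```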
111
Path message
• Three mandatory objects
  – SESSION
    • Carries the destination address of an LSP1
  – RSVP_HOP
    • Used to identify the GMPLS neighbor node (sender/receiver of signaling message)
  – TIME_VALUES
    • Set refresh timer
• Optional objects
  – SENDER_TEMPLATE
    • Carries the source address of an LSP
  – SENDER_TSPEC
    • Carries traffic descriptor parameters (IntServ Tspec)
1: Label Switched Path (LSP): Term used for virtual circuits in MPLS networks
112
Key objects: destination and label
• Session object:
  – Carries the destination IP address
  – IP protocol type field (TCP or UDP)
  – Destination TCP or UDP port number
• Sender-template object
  – Carries the source IP address
  – Source TCP or UDP port number
Compare with principles slides for CO PS networks
113
Key objects: traffic descriptor and QoS metrics
• Sender Tspec: Traffic descriptor
• AdSpec: QoS metrics
114
IntServ Tspec (RFC 2210)
Object format:
115
RSVP-TE (RFC 3209)
• RSVP extensions to support MPLS
• What is new?
  – A new message, “Hello”, for node failure detection
  – Change of the Path message
    • Updates of some objects
      – SESSION
      – SENDER_TSPEC
      – …
    • A new mandatory object
      – LABEL_REQUEST
    • Two optional objects become mandatory
      – SENDER_TEMPLATE
      – SENDER_TSPEC
    • A new optional object
      – EXPLICIT_ROUTE object (ERO)
116
SESSION object
• Original SESSION object (RFC 2205)
• New SESSION object (RFC 3209)
  – IPv4 tunnel end point address: IP address of the egress node for the tunnel
  – Tunnel ID: An ID that remains constant over the life of the tunnel
  – Extended Tunnel ID: Can be set to the IP address of the ingress node to narrow the scope of the session to the ingress-egress pair
117
SENDER_TEMPLATE object
• Original format (RFC 2205)
• New format (RFC 3209)
118
LABEL_REQUEST object
• Three types– Without label range
– With an ATM label range
– With a Frame Relay label range
119
Explicit Route Object (ERO)
• A list of groups of nodes along the explicit route (generically called "source route")
• Thinking: source routing is better for calls than hop-by-hop routing (which is used for packet forwarding) as it can take into account loading conditions
• Constrained shortest path first (CSPF) algorithm executed at the first node to compute end-to-end route, which is included in the ERO
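One common way to realize CSPF is to prune links that cannot carry the requested bandwidth and then run Dijkstra's algorithm on what remains. The sketch below takes that approach; the topology, metrics and bandwidth values (in Mbps) are hypothetical, chosen to echo the NYC/Chicago/ATL/SF example used elsewhere in this tutorial, and the function name `cspf` is illustrative.

```python
import heapq


def cspf(links, src, dst, bandwidth):
    """links: {(u, v): (te_metric, unreserved_bw)} for directed links.
    Returns the node list for the ERO, or None if no feasible route exists."""
    adj = {}
    for (u, v), (metric, avail) in links.items():
        if avail >= bandwidth:                 # constraint: prune links that are too full
            adj.setdefault(u, []).append((v, metric))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, metric in adj.get(u, []):
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: the Chicago-SF link has only an OC3 (155 Mbps) left.
links = {("NYC", "Chicago"): (1, 622.0), ("Chicago", "SF"): (1, 155.0),
         ("NYC", "ATL"): (2, 622.0), ("ATL", "SF"): (2, 622.0)}
print(cspf(links, "NYC", "SF", 100.0))   # fits on the short route
print(cspf(links, "NYC", "SF", 300.0))   # must avoid the Chicago-SF link
```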
120
Source routing
• Many papers describe the call setup procedure as ingress node performing CSPF or Routing and Timeslot Allocation (RTA), and then sending an RSVP message with an ERO
  – contrast with the hop-by-hop approach described in the Principles section
121
Source routing
• NYC trusts the "OC3 left" message from Chicago and routes the call to Chicago to reach SF
• This works if the call arrival rate is low. If the rate is high, Chicago-to-SF calls may use up the OC3 between two routing updates; a signaling request that arrives in between is then routed on stale information and fails.
SF
ATL
ChicagoNYC
Routing updates from Chicago to NYC about its link to SF
OC12
OC3 left
Call setup Path message from NYC to SF
0 bandwidth left
122
RSVP-TE extension for GMPLS (RFC 3471, 3473)
• RSVP-TE extension for Generalized MPLS
• What is new?
  – A new message, “Notify”, for supporting fast failure notification
  – Update of objects
    • Generalized LABEL_REQUEST
    • Generalized LABEL
      – Support labels to identify timeslots, wavelengths, etc.
      – The label “class” is implicit in the multiplexing capability of the link
  – Interface ID field added to RSVP-HOP object, ERO
  – A new object
    • UPSTREAM_LABEL – support bidirectional setup
123
Generalized LABEL_REQUEST object
– LSP Encoding Type: encoding of the LSP being requested
  • e.g.: Ethernet, SONET, Digital Wrapper… - lowest layer
– Switching Type: type of multiplexing
  • e.g.: TDM, LSC, FSC …
– Generalized PID: identify the payload of the LSP - what is carried on the LSP
  • e.g.: SONET/SDH for Lambda encoding; DS1/DS3 for SONET encoding
124
Need for Interface ID
• Separation of control plane from data plane in GMPLS networks - out-of-band
[Figure: two IP routers attached to SONET or WDM switches in a GMPLS network. The established circuit runs over the data-plane link; control-plane messages travel between the switches' Ethernet control ports through the Internet.]
125
Need for Interface ID
• Control plane separation:
  – Requires upstream switch to identify on which data-plane interface the virtual circuit should be routed
  – Interface ID field defined in the tag-length-value format
    • Identifier types:
      – IPv4 or IPv6 address ("numbered" link)
      – Interface index ("unnumbered" link)
        » Saves on IP addresses
        » Little need to allocate a separate address to each interface of a SONET switch
  – Embedded within the RSVP-HOP object
126
Unnumbered Links (RFC 3477)
• Unnumbered links: links that are not assigned IP addresses
• Two issues:
  – How to carry TE information about unnumbered links in IGP TE extensions (covered by GMPLS-ISIS and GMPLS OSPF)?
  – How to specify unnumbered links in GMPLS signaling?
• An unnumbered link has to be point-to-point
• The switch at each end assigns a 32-bit ID to the link
• Unnumbered interface IDs (IF_IDs) are supported in RSVP_HOP object and ERO, etc.
127
Unidirectional vs. Bidirectional
• In RFC 3209, to set up a bidirectional LSP, two unidirectional paths must be established independently
• UPSTREAM_LABEL object
  – Indicates the request of a bidirectional circuit
  – Same format as the LABEL object
• Why do we need this?
  – Reduce setup delay and control overhead
  – Avoid race conditions in resource assignment
  – Bidirectional optical LSPs are often required in optical networking services (many vendors only support bidirectional setup)
128
RSVP-TE GMPLS extension for SONET/SDH (RFC 4606)
• Label and bandwidth parameters changed
  – A new LABEL format for SONET/SDH: SUKLM
  – A new Tspec format for SONET/SDH Traffic
SONET_/SDH_Tspec
– Signaling type: the type of elementary signal
  • e.g.: VT1.5, STS-1, STS-12…
– RCC: requested contiguous concatenation
– NCC: number of contiguous components
– NVC: number of virtual components
– MT: number of identical signals requested
130
SONET/SDH LABEL - SUKLM
• Five parameters but only some of them are significant for different multiplexing schemes
(Use SONET as an example)
  – S=1->N: the index of a particular STS-3 inside an STS-N multiplexed signal
  – U=1->3: the index of a particular STS-1_SPE within an STS-3
  – K=1->3: for SDH only
  – L=1->7: the index of a particular VT_Group within an STS-1_SPE
  – M: the index of a particular VT1.5/2/3_SPE
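The five SUKLM parameters fit into a single 32-bit label. The Python sketch below assumes the layout I recall from RFC 4606 (S in the upper 16 bits, then U, K, L and M in 4 bits each); that layout is an assumption and is not stated on the slide, and the helper names are illustrative.

```python
import struct


def pack_suklm(s: int, u: int, k: int, l: int, m: int) -> bytes:
    """Pack S (16 bits) and U, K, L, M (4 bits each) into a 32-bit label."""
    low = ((u & 0xF) << 12) | ((k & 0xF) << 8) | ((l & 0xF) << 4) | (m & 0xF)
    return struct.pack("!HH", s, low)


def unpack_suklm(label: bytes) -> tuple:
    s, low = struct.unpack("!HH", label)
    return (s, (low >> 12) & 0xF, (low >> 8) & 0xF, (low >> 4) & 0xF, low & 0xF)


# Second STS-1_SPE inside the first STS-3; K, L, M are unused (zero) at this level.
label = pack_suklm(s=1, u=2, k=0, l=0, m=0)
print(label.hex(), unpack_suklm(label))
```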
131
RSVP-TE signaling procedures
• Distribute bandwidth management functionality to each switch for its own interfaces
• 5 steps of circuit setup processing at each switch
  – Message parsing
  – Route determination
  – Connection admission control
  – Data-plane configuration
  – Message construction
132
RSVP-TE signaling procedures
• Data tables maintained at each switch
  – Routing table
    • Simplest: next-hop node to reach destination
    • Precomputed after routing information is collected by OSPF-TE
  – Connectivity table
    • Data-plane interfaces and interface IDs
    • Control-plane address correlation
  – CAC table
    • Available bandwidth for each data-plane interface
  – State table
    • Information about each live circuit or VC
133
Path message processing: main step
[Flowchart, read top to bottom; inputs labeled SESSION, SENDER_TEMPLATE, and "From SENDER_TSPEC"]
• Search State table to check if session exists
  – Yes (Refresh): DONE
  – No: search Routing table for next hop (assumes hop-by-hop routing of the call)
• Route found?
  – No: PathErr
  – Yes: allocate bandwidth on data-plane interface outgoing to the next hop (CAC)
• Allocation successful?
  – No: PathErr
  – Yes: update CAC table, then DONE
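The Path-message steps on this slide (State table lookup, Routing table lookup, admission control, CAC table update) can be sketched as a small Python class. This is an illustrative model only: the `Switch` class, its table layouts, and the returned strings are hypothetical, and a session is reduced to a destination name.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    routing: dict                                  # destination -> (next hop, outgoing interface ID)
    cac: dict                                      # outgoing interface ID -> available bandwidth
    state: dict = field(default_factory=dict)      # (session, sender) -> circuit record

    def process_path(self, session, sender_template, sender_tspec):
        key = (session, sender_template)
        if key in self.state:                      # session already exists
            return "Refresh"
        route = self.routing.get(session)          # hop-by-hop routing of the call
        if route is None:
            return "PathErr: no route"
        next_hop, if_id = route
        if self.cac.get(if_id, 0.0) < sender_tspec:
            return "PathErr: admission control failed"
        self.cac[if_id] -= sender_tspec            # update CAC table
        self.state[key] = {"out_if": if_id, "bandwidth": sender_tspec}
        return f"Forward Path to {next_hop}"


sw = Switch(routing={"SF": ("Chicago", "if1")}, cac={"if1": 155.0})
print(sw.process_path("SF", "NYC", 100.0))   # admitted
print(sw.process_path("SF", "NYC", 100.0))   # same session again: refresh
print(sw.process_path("SF", "ATL", 100.0))   # only 55 left on if1
```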
134
Processing of Resv message: main step
[Flowchart, read top to bottom; inputs labeled LABEL and "From SESSION & FILTER_SPEC"]
• Outgoing_label(s) <- Label(s)
• Outgoing_Label(s) in accordance with Outgoing_assigned_Timeslots?
  – No: ResvErr
  – Yes: update Outgoing CAC table if necessary
• Program switch fabric with incoming/outgoing physical interface ID and incoming/outgoing labels
• DONE
135
Technologies
• Connection-oriented (CO) networks
  – Data-(user-) plane protocols
    • packet-switched: MPLS, VLAN Ethernet, Intserv IP
    • circuit-switched: SONET/SDH, WDM, SDM
  – Control-plane protocols:
    • RSVP-TE
    • OSPF-TE
    • LMP
• Internetworking
  – GFP, VCAT, LCAS for SONET/SDH
  – PWE3 for MPLS networks
  – Digital wrapper for OTN
136
OSPF-TE
• OSPF-TE adds more attributes to links in OSPF link state advertisements (LSA).
• These LSAs are distributed in a given OSPF area.
• Routers build an extended link database based on these LSAs that can be used to
  – Monitor the link attributes.
  – Perform local constraint-based source routing.
  – Global traffic engineering.
137
Purpose of OSPF-TE
• To advertise loading conditions• RFC 3630 - for MPLS networks
138
OSPF-TE LSA
Common LSA header (bits 0-31 per row):
Link-state age | Options | Type
Link-state ID
Advertising Router
Link-state sequence number
Link-state checksum | Length
LSA Payload (variable)
TE-LSA Header
• Link-state age: time since this LSA generation.
• Options: optional functionality supported by the router.
• Type: OSPF opaque LSA (=10) with area flooding scope.
• Link-state ID: 1 in the first octet followed by an Instance field in the remaining 3 octets.
• Advertising router: router ID of the router generating this LSA.
• Link-state sequence number: Identifies a unique LSA to detect losses and duplicates.
• Checksum: Covering all except the age field.
• Length: in bytes of the LSA including the LSA header.
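The header fields listed here map onto a fixed 20-byte structure. The Python sketch below parses it, assuming the standard OSPF field widths (16-bit age, 8-bit options and type, 32-bit ID, advertising router and sequence number, 16-bit checksum and length); the sample values are made up and the checksum is not verified.

```python
import socket
import struct

LSA_HEADER = struct.Struct("!HBBIIIHH")   # 20 bytes in total


def parse_te_lsa_header(data: bytes) -> dict:
    """Decode the common LSA header and split the TE Link-state ID."""
    age, options, ls_type, ls_id, adv, seq, checksum, length = \
        LSA_HEADER.unpack_from(data)
    return {
        "age": age,
        "type": ls_type,                      # 10 = opaque LSA, area flooding scope
        "opaque_type": ls_id >> 24,           # first octet: 1 for a TE-LSA
        "instance": ls_id & 0xFFFFFF,         # remaining 3 octets
        "advertising_router": socket.inet_ntoa(struct.pack("!I", adv)),
        "sequence": seq,
        "length": length,
    }


# Hypothetical header: opaque type 1, instance 5, advertised by 10.0.0.1.
sample = LSA_HEADER.pack(1, 0x02, 10, (1 << 24) | 5, 0x0A000001, 0x80000001, 0, 20)
print(parse_te_lsa_header(sample))
```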
140
TLV
• The TE-LSA payload carries one or more nested Type/Length/Value (TLV) triplets
TLV format (bits 0-31 per row):
Type | Length
Value (variable)
• Type: either Router Address TLV (=1) or Link TLV (=2)
• Length: length of the value field in octets
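A TLV encoder and decoder takes only a few lines. The sketch below assumes 16-bit Type and Length fields and padding of the value to a 4-octet boundary (with the padding not counted in Length), which is my reading of RFC 3630; the sub-TLV number used for "Link type" is likewise taken from that RFC, and the helper names are illustrative.

```python
import socket
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Type and Length (value octets only), then the value padded to 4 octets."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def decode_tlvs(data: bytes) -> list:
    """Walk a byte string of consecutive TLVs; also works for nested sub-TLVs."""
    tlvs, offset = [], 0
    while offset + 4 <= len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        tlvs.append((tlv_type, data[offset + 4:offset + 4 + length]))
        offset += 4 + length + (-length) % 4
    return tlvs


router_address = encode_tlv(1, socket.inet_aton("10.0.0.1"))   # Router Address TLV
link_type_sub = encode_tlv(1, b"\x01")                         # sub-TLV: point-to-point
link = encode_tlv(2, link_type_sub)                            # Link TLV wrapping it

top = decode_tlvs(router_address + link)
print(top)
print(decode_tlvs(top[1][1]))   # sub-TLVs inside the Link TLV
```

A real TE-LSA carries only one top-level TLV; the two are concatenated here purely to exercise the decoder.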
141
TLV
• Router Address TLV
  – Router ID of the advertising router
  – Router ID is a loopback address that can be reached via any interface (typically used in routing protocols instead of a specific interface IP address to avoid loss of reachability to the router if the interface fails)
  – The value field contains this IP address.
  – It must appear in exactly one TE-LSA from a router
  – Purpose: assume it is to identify that the router is a TE-capable router
142
TLV
• Link TLV
  – It describes attributes of a single link.
  – It is composed of a set of sub-TLVs.
  – Each TE-LSA carries only one link TLV.
143
Sub-TLVs
• Contained in the value field of a Link TLV
• Multiple types of sub-TLVs are defined. Some of them are
  – Link type: point-to-point or multi-access.
  – Link ID: identifies the other end of the link.
  – Local interface IP address
  – Remote interface IP address
  – Traffic engineering metric: typically assigned by the administrator and could be different from the OSPF link metric
  – Maximum bandwidth: maximum bandwidth that can be used.
  – Maximum reservable bandwidth: can be greater than the maximum bandwidth to support oversubscription
  – Unreserved bandwidth
  – Administrative group (4-byte mask: 1 bit per admin group for link)
• Bandwidth fields (in bytes per second) are expressed in IEEE floating point format
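As a small illustration of the bandwidth encoding, the Python sketch below converts a rate in bits per second to a 4-octet IEEE single-precision value in bytes per second and back. The 32-bit width and network byte order are assumptions based on RFC 3630; the function names are illustrative.

```python
import struct


def encode_bandwidth(bits_per_second: float) -> bytes:
    """Bandwidth sub-TLV value: bytes per second as a 32-bit IEEE float."""
    return struct.pack("!f", bits_per_second / 8.0)


def decode_bandwidth(value: bytes) -> float:
    """Return the advertised bandwidth in bits per second."""
    return struct.unpack("!f", value)[0] * 8.0


oc3 = encode_bandwidth(155.52e6)      # OC-3 line rate
print(len(oc3), decode_bandwidth(oc3))
```

Single precision keeps only about seven significant decimal digits, so very large or oddly sized rates may not round-trip exactly.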
144
Link type
• Link type: point-to-point or multi-access
• Link ID: identifies the other end of the link as in a Router LSA
  – point-to-point links: Router ID of the neighbor
  – multi-access links: interface address of the designated router
145
OSPF-TE extensions for GMPLS (RFC 4202 and 4203)
• New sub-TLVs for the Link TLV
  – Link Local/Remote Identifiers
  – Link Protection Type
  – Shared Risk Link Group
  – Interface Switching Capability Descriptor (ISCD)
    • main extension since GMPLS allows multiple types of switching techniques
146
New sub-TLVs
• Link Local/Remote Identifiers
  – Since GMPLS added interface IDs for unnumbered links (i.e., links that are not assigned IP addresses), this sub-TLV carries those identifiers
• Link protection type: Extra, Shared, dedicated 1:1, dedicated 1+1, unprotected, enhanced
147
Shared risk link group (SRLG)
• SRLG: set of links that share a resource whose failure may affect all links in the set.
  – Example: two fibers in the same conduit would be in the same SRLG.
• SRLG sub-TLV for a link is an unordered list of SRLGs that the link belongs to. This could be more than 1.
• SRLG is identified by a 32 bit number that is unique within an IGP domain.
148
Interface Switching Capability Descriptor (ISCD)
ISCD format (bits 0-31 per row):
Switching cap | Encoding | Reserved
Max LSP Bandwidth at priority 0
Max LSP Bandwidth at priority 1
...
Max LSP Bandwidth at priority 7
Switching Capability specific information (variable)
149
Interface Switching Capability Descriptor (ISCD)
• It describes the switching capability of the link
• Switching capability can be
  – Packet switch capable (PSC)
  – Layer-2 switch capable (L2SC)
  – Time-division-multiplex switch capable (TDM)
  – Lambda-switch capable (LSC)
  – Fiber-switch capable (FSC)
• Encoding: Same as LSP encoding in Generalized label request object of RSVP-TE - see RFC 3471
150
Interface Switching Capability Descriptor (ISCD)
• The maximum LSP bandwidth at priority p: the smaller of the unreserved bandwidth at priority p and a "Maximum LSP Size" parameter which is locally configured on the link, and whose default value is equal to the max link bandwidth.
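The rule on this slide is a per-priority minimum. A minimal Python sketch, with hypothetical bandwidth values in Mbps and an illustrative function name:

```python
def max_lsp_bandwidth(unreserved_by_priority, max_link_bandwidth, max_lsp_size=None):
    """Per priority p: the smaller of unreserved bandwidth at p and Maximum LSP Size.
    Maximum LSP Size defaults to the max link bandwidth when not configured."""
    if max_lsp_size is None:
        max_lsp_size = max_link_bandwidth
    return [min(unreserved, max_lsp_size) for unreserved in unreserved_by_priority]


# Hypothetical OC-12 link: priorities 0..7, lower priorities see less free bandwidth.
unreserved = [622.0, 622.0, 466.0, 466.0, 311.0, 311.0, 155.0, 0.0]
print(max_lsp_bandwidth(unreserved, max_link_bandwidth=622.0))
print(max_lsp_bandwidth(unreserved, max_link_bandwidth=622.0, max_lsp_size=155.0))
```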
151
ISCD Specific Information
• No ISCD specific information for L2SC and LSC.
• When the switching capability is PSC, the following fields are generated
ISCD specific information for PSC (bits 0-31 per row):
Minimum LSP bandwidth
Interface MTU | Padding
• Padding is used to make the ISCD 32-bit aligned.
152
ISCD Specific Information
• For TDM switching capability, the following fields are generated
ISCD specific information for TDM (bits 0-31 per row):
Minimum LSP bandwidth
Indication | Padding
• Minimum LSP Bandwidth example: OC1 on a SONET interface if the switch demultiplexes down to OC1 level
• The indication field takes a binary value stating whether the interface supports standard or arbitrary SONET/SDH
• Optionally, how many time-slots are free on a TDM link can be incorporated in the ISCD specific information field
  – 32 bit tuple: <signal_type(8 bits), number of unallocated timeslots(24 bits)>
153
References for OSPF-TE
• RFC 2702 - Requirements for Traffic Engineering Over MPLS: http://www.faqs.org/rfcs/rfc2702.html
• RFC 3630 - Traffic Engineering (TE) Extensions to OSPF Version 2: http://www.faqs.org/rfcs/rfc3630.html
• RFC 4203 - OSPF Extensions in Support of Generalized Multi-Protocol Label Switching (GMPLS): http://www.ietf.org/rfc/rfc4203.txt
• RFC 2328 - OSPF Version 2: http://www.ietf.org/rfc/rfc2328.txt
• RFC 4202 - Routing Extensions in Support of Generalized Multi-Protocol Label Switching (GMPLS): http://www.ietf.org/rfc/rfc4202.txt
• RFC 3471 - Generalized Multi-Protocol Label Switching (GMPLS) Signaling Functional Description: http://www.faqs.org/rfcs/rfc3471.html
• Dimitri Papadimitriou, IETF Internet Draft, "OSPFv2 Routing Protocols Extensions for ASON Routing," draft-ietf-ccamp-gmpls-ason-routing-ospf-02.txt, October 2006: http://www.ietf.org/internet-drafts/draft-ietf-ccamp-gmpls-ason-routing-ospf-02.txt
154
Difference between labels in MPLS and circuit-switched GMPLS
• In circuit-switched GMPLS networks, labels are not carried in the data plane
  – Labels in circuit-switched networks identify the "position" of data for the circuit - time or wavelength
• In circuit-switched GMPLS networks, labels cannot be assigned without an associated bandwidth reservation
  – In the usage section, we will see the value of assigning labels without bandwidth in MPLS networks
  – See two applications: traffic engineering, VPLS (addressing benefits)
155
Technologies
• Connection-oriented (CO) networks
  – Data-(user-) plane protocols
    • packet-switched: MPLS, VLAN Ethernet, Intserv IP
    • circuit-switched: SONET/SDH, WDM, SDM
  – Control-plane protocols:
    • RSVP-TE
    • OSPF-TE
    • LMP
• Internetworking
  – PWE3 for MPLS networks
  – GFP, VCAT, LCAS for SONET/SDH
  – Digital wrapper for OTN
156
LMP procedures
• Control channel management
  – Set up and maintain control channels between adjacent nodes
• Link property correlation
  – Aggregate multiple data links into a TE link
  – Synchronize TE link properties at both ends
• Link connectivity verification (optional)
  – Data plane discovery; If_Id exchange; physical connectivity verification
• Fault management (optional)
  – Fault notification and localization
Reference: IETF RFC 4204
157
What is a control channel?
• A control channel is a pair of mutually reachable interfaces that are used to enable communication between nodes for routing, signaling, and link management.
• Obvious question: bootstrap issue
  – how do you exchange messages on a control channel to create a control channel?
158
Types of control channels
• LMP does not specify the exact implementation of the control channel
• Examples of control channels:
  – a separate wavelength or fiber
  – an Ethernet link
  – an IP tunnel through a separate management network (e.g., Internet)
  – the overhead bytes of a data link (e.g., DCC)
159
Control channel identifier (CC_Id)
• A number from the space in which unnumbered interface IDs are assigned by a node
  – a 32-bit integer unique to the node
• Assign IP addresses to control channel ends
  – Because LMP runs over UDP/IP (UDP port number: 701)
  – remote end IP address: manually configured or automatically discovered
160
Automatic discovery
• How does a node automatically discover the IP address assigned to the remote end of one of its control channels?
  – Config message sent:
    • source IP address: unicast address
    • destination IP address: multicast 224.0.0.1
    • Config ACK message returned with destination IP address
  – Used when control channel is a DCC channel within a data link
161
Control channel management
• Config, ConfigAck, ConfigNack messages
  – Specify
    • Control_Channel_ID
    • Node_ID (Router ID used in routing protocols)
    • Hello protocol parameters (hello interval and dead interval)
    • Message_ID - just for ARQ support for these LMP message exchanges; process used in RSVP too because RSVP runs on IP
• Hello messages - a lightweight keep-alive mechanism
  – Used to maintain control channel connectivity and detect control channel failure
• Multiple control channels allowed
  – Useful in case of control channel failure
162
Link property correlation
• Message LinkSummary
  – Summarizes TE link information (data-plane interfaces); indicates support for fault management and link verification procedures
• Message LinkSummaryAck
  – Signals agreement on message LinkSummary
• Message LinkSummaryNack
  – Indicates disagreement; may suggest alternative values for negotiable parameters
  – Example: if one end of a TE-link is assigned an IPv4 address and the other end is assigned an IPv6 or unnumbered interface ID, there is a problem
163
Link connectivity verification (optional)
• Objective: verify physical connectivity of data links and dynamically learn the TE link and interface ID associations.
  – A node must be able to send messages over any data link
• Procedure
  – Exchange a pair of BeginVerify and BeginVerifyAck messages over a control channel
  – Upstream node sends Test messages with the local If_Id on a data link
  – Downstream node replies with TestStatusSuccess or TestStatusFailure accordingly over the control channel
  – If TestStatusSuccess, the upstream node records the mapping of local If_Id and remote If_Id, marks the link as “Up”, and then follows up with a TestStatusAck message for acknowledgement. If TestStatusFailure, it marks the link as “Failed”.
  – Use the EndVerify message to complete the procedure when all data links are tested.
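The verification procedure can be walked through with a toy model. In the Python sketch below, the function `verify_links` and the `wiring` dictionary are hypothetical: `wiring` stands in for the physical fibers, mapping each upstream If_Id to the downstream If_Id that would receive its Test message, and a missing entry means the Test message never arrives. Message transport, timers and retransmission are not modeled.

```python
def verify_links(local_if_ids, wiring):
    """Return (results, log): per-link status plus the LMP messages exchanged."""
    log = ["BeginVerify ->", "<- BeginVerifyAck"]          # over the control channel
    results = {}
    for if_id in local_if_ids:
        log.append(f"Test(local If_Id={if_id}) -> on data link")
        remote_if_id = wiring.get(if_id)
        if remote_if_id is not None:
            log.append(f"<- TestStatusSuccess(remote If_Id={remote_if_id})")
            results[if_id] = ("Up", remote_if_id)          # record the If_Id mapping
            log.append("TestStatusAck ->")
        else:
            log.append("<- TestStatusFailure")
            results[if_id] = ("Failed", None)
    log.append("EndVerify ->")                             # all data links tested
    return results, log


# Hypothetical node with three data links; the fiber on interface 3 is broken.
results, log = verify_links([1, 2, 3], wiring={1: 11, 2: 12})
print(results)
print("\n".join(log))
```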
164
Fault management (optional)
• For failure notification and localization only
  – Assume fault detection done at lower layer, e.g., loss of light observed at physical layer
• Purpose of procedure:
  – "To avoid multiple alarms stemming from the same failure, LMP provides failure notification through the ChannelStatus message"
Reference: IETF RFC 4204
165
Fault management procedure
Node 1 Node 2 Node 3 Node 4 Node 5
abc
A failure occurs between Nodes 2 and 3:
a. Node 3 (downstream node) will detect the failure and send a ChannelStatus message to node 2 indicating the failure.
b. Node 2 will immediately acknowledge this message by returning a ChannelStatusAck message.
c. Node 2 will then correlate the message to see if the failure is also detected locally
d. If there is no problem on the input side to Node 2 and within Node 2, it means the failure is localized
e. Node 2 then sends a ChannelStatus message to node 3 indicating that the failure has been localized and that the link is either failed or OK
• Presumably, if there was a protection path, Node 2 could quickly restore the channel and send an OK status.
166
Control-plane security
• Need authentication and integrity for all control-plane exchanges
• Since RSVP, OSPF, LMP run over IP, IPsec is a possible solution
167
Technologies
• Connection-oriented (CO) networks
  – Data-(user-) plane protocols
    • packet-switched: MPLS, VLAN Ethernet, Intserv IP
    • circuit-switched: SONET/SDH, WDM, SDM
  – Control-plane protocols:
    • RSVP-TE
    • OSPF-TE
    • LMP
• Internetworking
  – GFP, VCAT, LCAS for SONET/SDH
  – PWE3 for MPLS networks
  – Digital wrapper for OTN
168
Why internetworking?
• GMPLS networks do not exist as standalone entities
• Instead they are part of the Internet:
  – Obvious usage: to interconnect IP routers
  – Newer uses:
    • Commercial: interconnect Ethernet switches in geographically distributed LANs via point-to-point links or VPNs
    • Research & Education networks: connect GbE and 10GbE cards on cluster computers and storage devices to GMPLS networks
169
Obvious usage
• Router-to-router circuits and virtual circuits
[Figure: two IP routers on the Internet, each attached to a SONET or WDM switch in a GMPLS network]
170
Router-to-router usage
• OSPF-enabled usage
  – simply treat MPLS virtual circuit or GMPLS circuit as a link between routers
  – allow routing protocol to include these in routing table computations
• Data-plane
  – IP over MPLS
  – IP over PPP over SONET
    • Packet-over-SONET (PoS)
Label Switched Path (LSP) from DC to Sunnyvale, CA
PoS IP ...
Eth IP ...
Eth. IP ...
MPLS
IP over MPLS
172
Newer uses
• Ethernet over MPLS/GMPLS VC/circuits:
  – port mapped
  – VLAN mapped
173
Ethernet port mapped over MPLS
• Send all Ethernet frames received on ports I and II on to the MPLS virtual circuit
• MPLS virtual circuit: Pseudo-wire
• Enterprise can allocate IP addresses from one subnet: Virtual private LAN
• Explains one use for MPLS virtual circuits with no bandwidth allocation
[Figure: Ethernet switches in Enterprise 1 and Enterprise 2 attach through ports I and II to IP router/MPLS switches (SDM-to-MPLS gateways) on the Internet, joined by an MPLS virtual circuit (pseudowire). Mux scheme on the access link: Ethernet.]
SDM: Space Division Multiplexing
Gateway: interfaces have different MUX schemes, unlike a switch ("my definition")
174
Ethernet VLAN mapped over MPLS
• Extract frames carrying a specific VLAN ID tag on Ethernet ports I and II and map only these frames on to the MPLS virtual circuit
[Figure: Ethernet switches in Enterprise 1 and Enterprise 2 attach through ports I and II to IP router/MPLS switches (VLAN-to-MPLS gateways) on the Internet, joined by an MPLS virtual circuit]
175
Ethernet port or VLAN mapped over GMPLS circuits
• Send all frames or frames matching a given VLAN ID tag from Ethernet ports I and II on to the SONET/SDH/WDM circuit
• SONET/SDH/WDM switches now have Fast Ethernet/GbE/10GbE interfaces in addition to SONET/SDH or WDM interfaces
[Figure: Ethernet switches in Enterprise 1 and Enterprise 2 attach through ports I and II to SONET or WDM switches (SDM-to-SONET/WDM gateways), joined by a SONET/SDH/WDM circuit]
176
Commercial services
• EPL: Ethernet private line: map an Ethernet port to a SONET/SDH circuit
• Fractional-EPL: Map a GbE port to a lower-rate SONET circuit
  – Pause frames are sent from the switch to the client node on the other side of the GbE link
• V-EPL: Lower-rate VLAN mapped to an equivalent rate SONET circuit
page 110 of GFP section reference: SONET focused
177
REN application
[Figure: two computer clusters, a disk array, and an LCD panel attached to the network]
• Cluster computers, disk arrays, visualization clusters have GbE/10GbE interfaces
• Network: SONET/SDH/WDM or MPLS, for rate-guaranteed service
178
Technology
• So what technologies are required for this type of internetworking:
  – mapping Ethernet frames on to MPLS virtual circuits or GMPLS circuits?
179
Technologies
• Connection-oriented (CO) networks
  – Data-(user-) plane protocols
    • packet-switched: MPLS, VLAN Ethernet, Intserv IP
    • circuit-switched: SONET/SDH, WDM, SDM
  – Control-plane protocols:
    • RSVP-TE
    • OSPF-TE
    • LMP
• Internetworking
  – GFP, VCAT, LCAS for SONET/SDH
  – PWE for MPLS networks
  – Digital wrapper for OTN
180
Reference
• IEEE Communications Magazine, May 2002, Special issue on "Generic Framing Procedure (GFP) and Data over SONET/SDH and OTN," Guest Editors, Tim Armstrong and Steven S. Gorshe
• 6 excellent papers
181
What is GFP?
• Generic Framing Procedure (GFP) is a mechanism to transport packet-based data streams or block-oriented data streams over a synchronous communications channel, such as SONET/SDH
• My classification: It is a data-link layer protocol
182
Protocol stacks for various data transport applications
[Figure: protocol stacks, top to bottom]
• Clients: IP, IPX, MPLS, etc.; RPR; SANs (Fiber Channel, ESCON, FICON)
• Framing/adaptation: Ethernet, PPP, HDLC, ATM, GFP
• Transport: SONET/SDH, OTN
• WDM
• dark fiber
page 97 of reference
183
Why do we need GFP?
• Why do we need yet another data-link layer protocol?
  – More specifically, to transport data packets over synchronous links?
184
Main reason
• The framing techniques used in other data-link layer protocols have problems
• For example, IP packets are carried over SONET using PPP/HDLC frames (called PoS)
  – HDLC inserts idle frames: because SONET is synchronous, it needs a constant flow of frames to avoid losing synchronization
• But, there is a problem:
  – HDLC uses flags for frame delineation. The issue with this framing technique is that if the flag pattern occurs in the payload, an escape byte has to be inserted
  – This causes an increase in the required bandwidth
  – The amount of increase is payload-dependent
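The payload dependence is easy to demonstrate. The Python sketch below applies byte stuffing in the style of PPP in HDLC-like framing; the flag (0x7E), escape (0x7D) and XOR (0x20) values are assumptions taken from RFC 1662 rather than from the slide, and the address, control and FCS fields are omitted.

```python
FLAG, ESCAPE, XOR_MASK = 0x7E, 0x7D, 0x20   # assumed values, per RFC 1662


def hdlc_stuff(payload: bytes) -> bytes:
    """Delimit a payload with flags, escaping any flag or escape byte inside it."""
    frame = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESCAPE):
            frame += bytes([ESCAPE, byte ^ XOR_MASK])   # two bytes on the wire
        else:
            frame.append(byte)
    frame.append(FLAG)
    return bytes(frame)


benign = bytes(100)              # 100 zero bytes: nothing to escape
worst = bytes([FLAG]) * 100      # every byte collides with the flag
print(len(hdlc_stuff(benign)), len(hdlc_stuff(worst)))
```

The same 100-byte payload length costs either 102 or 202 bytes on the wire, which is the unpredictability that length-based delineation in GFP avoids.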
page 98 of reference
185
Other framing techniques
• HEC - Header Error Control
  – this is the CRC framing technique used in ATM
  – "A header CRC hunting mechanism is employed by the receiver to extract the ATM cells from the bit/byte synchronous stream. The HEC location is fixed and ATM cell length is fixed. Starting from the assumed cell boundary, the ATM receiver compares its computed HEC value for the assumed ATM cell header against the HEC value indicated by the assumed HEC field. Cell stream delineation is declared after positive validations of the incoming HEC fields of a few consecutive ATM cells."
• ATM cells are fixed in length, but Ethernet frames are variable-length
• Therefore, we need a length field in order to implement this HEC-based frame delineation mechanism
pages 96-97 of reference
186
Main features of the GFP protocol
• Common aspects:
  – HEC + Length based delineation
    • Core header has payload length and HEC
  – Error control: error detection
    • Payload type HEC, payload Frame Check Sequence (CRC-32)
  – Multiplexing: linear and ring extension headers
  – Idle frames are sent to maintain synchronization as in HDLC
  – Scrambling as in ATM:
    • core header + payload scrambling
  – Client management - client fail signal
• Client-dependent aspects:
  – Client-specific encapsulation techniques
page 68 of reference
187
GFP frame types
• CDFs: client data
• CMFs: information associated with the management of the client signal or GFP connection
• Idle frames: 4-byte GFP control frames
• OA&M frames: operations, administration, and maintenance
[Figure: GFP frames divide into client frames (client data frames, CDFs; client management frames, CMFs) and control frames (idle frames; OA&M frames)]
page 65 of reference
188
GFP frame structure
[Figure: GFP frame structure, with bit transmission order and byte transmission order marked]
Core header: payload length MSB, payload length LSB, core HEC MSB, core HEC LSB
Payload area:
– Payload header: payload type MSB, payload type LSB (PTI, PFI, EXI, UPI), type HEC MSB, type HEC LSB, then 0-60 bytes of extension headers (optional)
– Linear extension header shown (others may apply): CID, spare, extension HEC MSB, extension HEC LSB
– Payload information: N [536,550] or variable length packets
– Payload FCS: payload FCS MSB, two middle bytes, payload FCS LSB
Frame kinds shown: client data frames, client control frames, and the idle frame (scrambled): 0x00(0xB6) 0x00(0xAB) 0x00(0x31) 0x00(0xE0)
page 66 of reference
189
GFP core header
• Payload length indicator (PLI): 2 bytes
– the size of the payload area in bytes
– allows GFP frame delineation independent of the content of higher-layer PDUs
• Core HEC (cHEC): 2 bytes
– CRC-16 to enable delineation
[Figure: delineation state machine with states Hunt, Presync, and Sync. Transition labels: correct cHEC; correct cHEC for N frames; incorrect cHEC; incorrect cHEC for M frames]
page 68 of reference
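A minimal sketch of length-plus-HEC delineation follows. Several details are assumptions rather than statements from the slides: the cHEC generator is taken to be x^16 + x^12 + x^5 + 1 with a zero initial register, the core header is taken to be XORed with B6 AB 31 E0 before transmission (which agrees with the idle-frame bytes in the frame-structure figure), the payload area is treated as opaque bytes, and the Sync-to-Hunt transition after M bad headers is not modeled.

```python
CORE_SCRAMBLE = bytes([0xB6, 0xAB, 0x31, 0xE0])  # assumed core-header XOR mask
CHEC_POLY = 0x1021  # assumed generator x^16 + x^12 + x^5 + 1, zero initial value


def crc16(data: bytes, poly: int = CHEC_POLY) -> int:
    """Bitwise CRC-16, most significant bit first, register initialised to 0."""
    reg = 0
    for byte in data:
        reg ^= byte << 8
        for _ in range(8):
            reg = ((reg << 1) ^ poly) & 0xFFFF if reg & 0x8000 else (reg << 1) & 0xFFFF
    return reg


def core_header(pli: int) -> bytes:
    """Build the 4-byte core header (PLI + cHEC) and apply the XOR mask."""
    plain = pli.to_bytes(2, "big")
    plain += crc16(plain).to_bytes(2, "big")
    return bytes(a ^ b for a, b in zip(plain, CORE_SCRAMBLE))


def parse_core_header(four: bytes):
    """Return the PLI if the cHEC checks out, otherwise None."""
    plain = bytes(a ^ b for a, b in zip(four, CORE_SCRAMBLE))
    if crc16(plain[:2]) != int.from_bytes(plain[2:], "big"):
        return None
    return int.from_bytes(plain[:2], "big")


def delineate(stream: bytes, confirmations: int = 2) -> list:
    """Hunt byte by byte for a valid core header, then follow the PLI chain.

    Sync is declared once the candidate header plus `confirmations` further
    headers all carry a correct cHEC; the frame start offsets are returned.
    """
    offset = 0
    while offset + 4 <= len(stream):          # Hunt state
        starts, pos = [], offset
        while pos + 4 <= len(stream):         # Presync: hop from header to header
            pli = parse_core_header(stream[pos:pos + 4])
            if pli is None:
                break
            starts.append(pos)
            pos += 4 + pli
        if len(starts) > confirmations:       # Sync state reached
            return starts
        offset += 1
    return []


payloads = [b"hello", b"", b"GFP payload!"]
stream = b"\x01\x02\x03" + b"".join(core_header(len(p)) + p for p in payloads)
print(delineate(stream))
```

Because the receiver hops by PLI rather than searching for a flag, the payload bytes never need escaping.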
190
GFP payload area
• Payload header: 4-64 bytes
– Payload type: mandatory field; 2 bytes; the content and format of the payload
• Payload type identifier (PTI): 3 bits; the type of GFP client frames (CDF or CMF)
• Payload FCS Indicator (PFI): 1 bit; the presence of the payload FCS field
• Extension Header Identifier (EXI): 4 bits; the type of GFP extension header (e.g., linear extension header)
• User Payload Identifier (UPI): 8 bits; the type of payload
– Type HEC (tHEC): 2 bytes; CRC-16 to protect the payload type field
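The 2-byte payload type field can be packed and unpacked as sketched below. The bit positions assume the fields appear in the order PTI, PFI, EXI, UPI from the most significant bit down, as drawn in the frame-structure figure; the field values in the example are arbitrary illustrations, not code points taken from the slides.

```python
def pack_type_field(pti: int, pfi: int, exi: int, upi: int) -> int:
    """Pack PTI (3 bits), PFI (1 bit), EXI (4 bits), UPI (8 bits) into 16 bits."""
    if not (0 <= pti < 8 and 0 <= pfi < 2 and 0 <= exi < 16 and 0 <= upi < 256):
        raise ValueError("field value out of range")
    return (pti << 13) | (pfi << 12) | (exi << 8) | upi


def unpack_type_field(word: int):
    """Split a 16-bit payload type field back into (PTI, PFI, EXI, UPI)."""
    return (word >> 13) & 0x7, (word >> 12) & 0x1, (word >> 8) & 0xF, word & 0xFF


# Illustrative values: PTI 0, payload FCS present, EXI 1, UPI 1.
word = pack_type_field(pti=0, pfi=1, exi=1, upi=1)
print(hex(word), unpack_type_field(word))
```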
191
GFP payload area
• Payload header:
– Extension headers: 0-60 bytes (optional)
• Null Extension Header: 0 bytes; by default
• Linear Extension Header: 2 bytes; multi-access link
– Channel ID (CID): 1 byte; like MPLS label or VLAN ID for multiplexing
– Spare field: 1 byte
• Ring Extension Header: sharing of the GFP payload across multiple clients in a ring configuration
– Extension HEC (eHEC): mandatory; 2 bytes; CRC-16 to protect the extension header
192
GFP payload area
• Payload information field: 0 to (65535 - X) bytes, where X is the length of payload header and payload FCS;
• Payload Frame Check Sequence (FCS): optional (indicated by PFI); 4 bytes; CRC-32 to protect payload information field
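The "65535 - X" bound can be worked through for specific header choices. This sketch takes the 4-byte payload type plus tHEC as always present and treats `extension_bytes` as the total extension-header length including its eHEC; the function name is illustrative.

```python
MAX_PAYLOAD_AREA = 65535  # largest value the 2-byte PLI can hold
TYPE_AND_THEC = 4         # payload type (2 bytes) + tHEC (2 bytes)
FCS_BYTES = 4             # optional CRC-32 payload FCS


def max_payload_information(extension_bytes: int = 0, has_fcs: bool = False) -> int:
    """Largest payload information field for a given header/FCS configuration."""
    if not 0 <= extension_bytes <= 60:
        raise ValueError("extension headers occupy 0-60 bytes")
    overhead = TYPE_AND_THEC + extension_bytes + (FCS_BYTES if has_fcs else 0)
    return MAX_PAYLOAD_AREA - overhead


print(max_payload_information())                  # null extension header, no FCS
print(max_payload_information(4, has_fcs=True))   # 4 extension bytes plus FCS
```

With a null extension header and no FCS the result is 65,531 bytes, which is the upper limit quoted for the client PDU in a GFP-F frame.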
193
GFP's location in protocol stack
Ethernet IP/PPP Other client signals
GFP-Client specific aspects (payload-dependent)
GFP-Common aspects (payload-independent)
SONET/SDH path OTN OCh path
194
Main features of the GFP protocol revisited
• Common aspects:
– HEC + Length based delineation
• Core header has payload length and HEC
– Error control: error detection
• Payload type HEC, payload Frame Check Sequence (CRC-32)
– Multiplexing: linear and ring extension headers
– Idle frames are sent to maintain synchronization as in HDLC
– Scrambling as in ATM:
• core header + payload scrambling
– Client management - client fail signal
• Client-dependent aspects:
– Client-specific encapsulation techniques
page 68 of reference
195
Need for scrambling
• Line coding used in SONET/SDH and OTN optical communication links is NRZ (Non-Return to Zero)
– Laser is turned ON if bit is 1, and OFF if bit is 0
• Advantages of NRZ: simplicity and bandwidth-efficiency
• Disadvantage: loss of synchronization possible at the receiver by the clock and data recovery circuits if there are many consecutive 0 bits in the data stream
– could be caused by a malicious user sending such a payload
page 92 of reference
196
Scrambling solution
• Self-synchronous payload scrambler
– Uses the polynomial x^43 + 1: each bit is XORed with the scrambler output bit that preceded it by 43 bits
– Drawback: error multiplication
[Figure: x^n + 1 scrambler and descrambler. Scrambler: data in is XORed with the scrambler output delayed through stages D1, D2, ..., Dn to give data out. Descrambler: data in is XORed with itself delayed through stages D1, D2, ..., Dn to give data out.]
page 93 of reference
• Solution to error multiplication
– Select a CRC generator polynomial that has triple-error detection capability and no common factor with the scrambler polynomial
– x^16 + x^15 + x^12 + x^10 + x^4 + x^3 + x^2 + x + 1
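Error multiplication is easy to see in a bit-level sketch of the x^43 + 1 scrambler. The shift register is assumed to start at all zeros, and the test pattern and error position are arbitrary choices for illustration.

```python
def scramble(bits, n=43):
    """Self-synchronous scrambler: out[i] = in[i] XOR out[i - n]."""
    out = []
    for i, b in enumerate(bits):
        delayed = out[i - n] if i >= n else 0  # register assumed to start at zero
        out.append(b ^ delayed)
    return out


def descramble(bits, n=43):
    """Descrambler: out[i] = in[i] XOR in[i - n], using only received bits."""
    return [b ^ (bits[i - n] if i >= n else 0) for i, b in enumerate(bits)]


data = [(i * i + i // 3) % 2 for i in range(200)]  # arbitrary test pattern
line = scramble(data)

received = list(line)
received[60] ^= 1                                  # one bit error on the line

recovered = descramble(received)
error_positions = [i for i, (a, b) in enumerate(zip(data, recovered)) if a != b]
print(error_positions)
```

A single line error shows up twice after descrambling, 43 bits apart, which is why the CRC protecting the payload has to cope with more errors than the line actually produced.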
197
GFP client-based aspects
• Frame-mapped GFP (GFP-F)
– 1-to-1 mapping: one client frame is mapped into one GFP frame
– Applicable to most packet data types, e.g., Ethernet MAC frames, IP packets
• Transparent-mapped GFP (GFP-T)
– Many-to-1 mapping: a fixed number of client characters are mapped into a GFP frame of predetermined length
– Applicable to 8B/10B block-coded client signals such as Fibre Channel and GbE (1 Gb/s Ethernet)
Page 65 of reference
198
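GFP-F frame
[Figure: GFP-F frame layout. Fields in order: PLI (2 bytes), cHEC (2 bytes), payload header (4 bytes), client PDU (PPP, IP, Ethernet, RPR, etc.; 0-65,531 bytes), FCS (optional; 4 bytes). Brackets in the figure group the fields as GFP header, GFP payload, and GFP FCS.]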
199
GFP-T frame
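[Figure: GFP-T frame layout. Fields in order: PLI (2 bytes), cHEC (2 bytes), payload header (4 bytes), superblocks #1 to #N, FCS (optional; 4 bytes). Brackets in the figure group the fields as GFP header, GFP payload, and GFP FCS.]
Superblock (67 bytes): 8 x 64B/65B blocks + 16 superblock bits
– eight 8-byte blocks (64B/65B #1 to #8, each minus its flag bit)
– 1 byte holding the flag bits F1 ... F8
– 2 bytes of CRC-16
64B/65B block with n control codewords and 8-n data codewords:
– control codewords come first, each with an LCC bit (1 for #1, 0 for #n), a CCI, and a CCL
– data codewords DCI #1 to DCI #(8-n) follow
LCC: last control character; CCL: control code locator; CCI: control code indicator; DCI: data character identifier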
200
GFP-T encoding steps
1. Decode 8B/10B code words into original 8-bit values
2. Map the eight decoded 8B/10B characters into a 64B/65B block code and set a flag bit to indicate if the block contains only data characters (DCI)
3. Create a superblock
   1. Group 8 64B/65B blocks
   2. Rearrange leading bits at end
   3. Generate and append CRC-16 check bits to form a superblock
4. Repeat, creating at least N such superblocks
   – N: minimum number of superblocks per GFP frame (e.g., 95 for GbE)
5. Prepend GFP core and payload headers
6. Scramble payload header and payload with x^43 + 1
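The sizes implied by these steps can be checked with a few lines of arithmetic. The sketch assumes a null extension header, so the payload header is 4 bytes, and the function names are illustrative.

```python
BLOCKS_PER_SUPERBLOCK = 8
BITS_PER_BLOCK = 65        # 64 payload bits + 1 flag bit
SUPERBLOCK_CRC_BITS = 16
CHARS_PER_BLOCK = 8        # client characters carried by one 64B/65B block


def superblock_bytes() -> int:
    """8 blocks of 65 bits plus a CRC-16, expressed in bytes."""
    bits = BLOCKS_PER_SUPERBLOCK * BITS_PER_BLOCK + SUPERBLOCK_CRC_BITS
    assert bits % 8 == 0
    return bits // 8


def gfp_t_frame_bytes(n_superblocks: int, has_fcs: bool = False) -> int:
    """Core header (4) + payload header (4) + superblocks + optional FCS (4)."""
    return 4 + 4 + n_superblocks * superblock_bytes() + (4 if has_fcs else 0)


def client_chars(n_superblocks: int) -> int:
    """Client characters carried by one GFP-T frame."""
    return n_superblocks * BLOCKS_PER_SUPERBLOCK * CHARS_PER_BLOCK


n = 95  # minimum superblocks per frame quoted for GbE
print(superblock_bytes(), gfp_t_frame_bytes(n), client_chars(n))
```

Each superblock carries 64 client characters in 67 bytes, so a 95-superblock frame without an FCS is 6,373 bytes long and carries 6,080 characters.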
201
Comparing performance of GFP-F & GFP-T
• GFP-F
– Efficient bandwidth utilization:
• only delivers client data frames (idle frames are removed)
• if client signal is lightly loaded, GFP-F can map this signal to a lower-rate circuit or GFP multiplex with other signals
– Higher latency: associated with buffering an entire client data frame at the ingress to the GFP mapper
Pages 89, 101 of reference
202
Comparing performance of GFP-F & GFP-T
• GFP-T
– Advantage: transparent transport of 8B/10B control characters as well as data characters
• minimum protocol awareness
• a single hardware implementation can handle many types of client signals (all that use 8B/10B coding)
– Lower bandwidth utilization: if client signal contains idle frames, these are transported through transparently
– Lower latency: only a few bytes of mapper/demapper latency
Pages 89, 101 of reference
203
Virtual Concatenation (VCAT)
• Allows for SONET/SDH rates in-between the rigid rates of the original hierarchy
• VT1.5-7v: means 7 virtually concatenated VT1.5 signals
• VCAT as an inverse multiplexing scheme
– It allows for individual components of the virtually concatenated signal to be routed along different paths before recombining them into a contiguous-bandwidth signal at the far endpoint
– Need to compensate for delay differences on the various paths used for the individual components
• Bandwidth partitioning
– It allows for a SONET/SDH link to be partitioned into arbitrary units of bandwidth
Pages 74, 107 of reference
204
VCAT increased bandwidth efficiency
Data signal | SONET/SDH payload mapping and bandwidth efficiency | SONET/SDH with VCAT payload mapping and bandwidth efficiency
Ethernet (10 Mb/s) | STS-1/VC-3: 21% | VT1.5-7v/VC-11-7v: 89%
Fast Ethernet (100 Mb/s) | STS-3c/VC-4: 67% | VT1.5-64v/VC-11-64v: 98%
Gigabit Ethernet (1000 Mb/s) | STS-48c/VC-4-16c: 42% | STS-3c-7v/VC-4-7v: 95%; STS-1-21v/VC-3-21v: 98%
Page 75 of reference
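The efficiency figures in the table can be reproduced from container payload capacities. The capacities below are the standard SONET payload rates as recalled here (they are not listed on the slides), so treat them as assumptions of this sketch.

```python
# Payload capacity of each container in Mb/s (assumed standard SONET values).
PAYLOAD_MBPS = {
    "VT1.5": 1.600,
    "STS-1": 48.384,
    "STS-3c": 149.760,
    "STS-48c": 2396.160,
}


def efficiency(client_mbps: float, container: str, members: int = 1) -> float:
    """Fraction of the (virtually concatenated) payload used by the client."""
    return client_mbps / (PAYLOAD_MBPS[container] * members)


rows = [
    ("Ethernet 10 Mb/s", 10, "STS-1", 1),
    ("Ethernet 10 Mb/s", 10, "VT1.5", 7),
    ("Fast Ethernet", 100, "STS-3c", 1),
    ("Fast Ethernet", 100, "VT1.5", 64),
    ("Gigabit Ethernet", 1000, "STS-48c", 1),
    ("Gigabit Ethernet", 1000, "STS-3c", 7),
    ("Gigabit Ethernet", 1000, "STS-1", 21),
]
for name, rate, container, members in rows:
    label = container if members == 1 else f"{container}-{members}v"
    print(f"{name:18s} {label:12s} {100 * efficiency(rate, container, members):.0f}%")
```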
205
Inverse multiplexing in VCAT
Page 82 of reference
Implementation of VCAT is only required at select nodes (i.e., the edge nodes); not all multiplexers need to support VCAT
206
Bandwidth partitioning with VCAT
Page 82 of reference
207
Link Capacity Adjustment Scheme (LCAS)
• LCAS is a mechanism to allow for automatic bandwidth tuning of a virtually concatenated signal
– The VCAT group of circuits should already be established using a
• centralized NMS/EMS based procedure, or
• distributed RSVP-TE based procedure
• Note that bandwidth cannot be increased beyond the aggregate value of the VCAT signal without a GMPLS RSVP or NMS/EMS procedure of circuit setup
208
Interaction between GMPLS RSVP and LCAS
Page 77 of reference
209
Link Capacity Adjustment Scheme (LCAS)
• LCAS is basically a synchronization procedure between the two ends of a VCAT signal
– Unlike GMPLS RSVP, it is NOT a bandwidth reservation and circuit setup or release procedure
• LCAS procedures (triggered by GMPLS or NMS/EMS):
– add or remove a member of a VCAT group
– renumber the members in a VCAT group
• Messages are exchanged between the originating and terminating SONET/SDH nodes to execute these LCAS procedures
– Add member (ChID, GID)
– Remove member (ChID, GID)
– Member status
• Messages are sent in the H4 byte for high-order VCAT
210
Hitless change
• Hitless capacity adjustment
– Without causing any errors during the process
– "Two ends of the link must agree precisely when the VCAT group transitions to a new payload in which new members have been added or some previous members removed"
– "Needs hardware-level synchronization as to when the SONET/SDH mappers should begin/stop inserting/extracting a payload from a VCAT group member"
• The link capacity adjustment does not impact user traffic flow (what if that is the bottleneck link for a TCP session?)
Pages 75 and 82 of reference
211
Applications of LCAS
• Adjusting bandwidth requirements on a time-of-day basis
– A GbE signal may only require on average a 200-300 Mb/s SONET circuit
– Establish an STS-1-7v (338.688 Mb/s) VCAT circuit
– Then add/delete members as load increases or decreases
– Need buffering and PAUSE signals to handle bursts
– Can map two different GbE signals to one VCAT group with different sets of members?
• Rerouting of traffic after failures
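The time-of-day case can be sketched as a sizing calculation: how many STS-1 members a VCAT group needs for a given load. The STS-1 payload rate of 48.384 Mb/s is an assumed standard value, and the hourly load profile is invented for illustration.

```python
import math

STS1_PAYLOAD_MBPS = 48.384  # assumed payload capacity of one STS-1 member


def members_needed(load_mbps: float) -> int:
    """Smallest number of STS-1 members whose combined payload covers the load."""
    return max(1, math.ceil(load_mbps / STS1_PAYLOAD_MBPS))


def group_capacity(members: int) -> float:
    """Aggregate payload capacity of an STS-1-Xv group in Mb/s."""
    return members * STS1_PAYLOAD_MBPS


# Hypothetical load profile (Mb/s); LCAS would add or remove members to follow it.
profile = {"03:00": 120, "09:00": 300, "14:00": 260, "22:00": 200}
for hour, load in profile.items():
    m = members_needed(load)
    print(f"{hour}: {load} Mb/s -> STS-1-{m}v ({group_capacity(m):.3f} Mb/s)")
```

A 300 Mb/s load needs seven members (7 x 48.384 = 338.688 Mb/s), and a full 1000 Mb/s GbE signal needs 21. Members beyond those already set up in the group still require a circuit setup procedure first.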
212
Data over SONET/SDH (DoS)
• Using GFP, VCAT, & LCAS, DoS provides a set of mechanisms for efficient transport of data packets on SONET/SDH circuits
– GFP: an efficient and standard data link layer protocol
– VCAT: flexible bandwidth assignment scheme requiring no modification to intermediate nodes
– LCAS: dynamic bandwidth adjustment of VCAT signal
213
Technologies
• Connection-oriented (CO) networks
– Data-(user-) plane protocols
• packet-switched: MPLS, VLAN Ethernet, Intserv IP
• circuit-switched: SONET/SDH, WDM, SDM
– Control-plane protocols:
• RSVP-TE
• OSPF-TE
• LMP
• Internetworking
– GFP, VCAT, LCAS for SONET/SDH
– PWE3 for MPLS networks
– Digital wrapper for OTN
Pseudo Wire Emulation
• Pseudo Wire Emulation Edge-to-Edge (PWE3) is a mechanism for emulating certain services across a packet-switched network:
– Services: Frame Relay, ATM, Ethernet, and TDM services such as SONET/SDH
– Packet-switched network:
• IP
• MPLS
Example of a PWE3 service: Ethernet over MPLS
• PW control word:
– status
– sequencing
– timing - Real-time transport protocol (RTP)
• PW label and tunnel label:
– MPLS label, L2TP session id, UDP port number
[Figure: a Customer Edge (CE) attaches over Ethernet to a Provider Edge (PE) on each side of an MPLS network, with a tunnel between the two PEs. Encapsulation, outermost first: tunnel label, PW label, PW control word, Ethernet frame]
Andy Malis paper in IEEE Comm. Mag., Sept. 2006
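The encapsulation order in the figure (tunnel label, PW label, control word, Ethernet frame) can be sketched with the standard 32-bit MPLS label stack entry layout: 20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL. The control word here carries only a sequence number in its low 16 bits with the remaining bits zero, which is an assumption of this sketch; the label values and the placeholder frame are hypothetical.

```python
import struct


def label_stack_entry(label: int, tc: int = 0, bottom: bool = False, ttl: int = 64) -> bytes:
    """Pack one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def control_word(sequence: int) -> bytes:
    """4-byte PW control word: upper 16 bits zero, lower 16 bits sequence number."""
    return struct.pack("!HH", 0, sequence & 0xFFFF)


def encapsulate(eth_frame: bytes, tunnel_label: int, pw_label: int, sequence: int) -> bytes:
    """Tunnel label outermost, PW label at bottom of stack, then control word."""
    return (label_stack_entry(tunnel_label)
            + label_stack_entry(pw_label, bottom=True)
            + control_word(sequence)
            + eth_frame)


frame = bytes(64)  # placeholder Ethernet frame (hypothetical contents)
packet = encapsulate(frame, tunnel_label=1000, pw_label=2000, sequence=7)
print(packet[:12].hex())
```

The provider core switches on the tunnel label only; the PW label and control word are interpreted at the far-end PE.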
Ethernet over MPLS
[Figure: Ethernet frames (Eth | IP | ...) enter the pseudowire (PW), cross the MPLS network as MPLS | Eth | IP | ..., and leave as Ethernet frames]
Example: NY to Chicago link is a point-to-point Ethernet link
● LSP encoding: Ethernet
● Switching type: PSC
● GPID: Ethernet
217
Digital wrapper
• ITU-T G.709 provides a method to carry Ethernet frames, ATM cells, IP datagrams directly on a WDM lightpath
218
Outline
• Principles
– Different types of connection-oriented networks
• Technologies
– Single network
– Internetworking
• Usage
– Commercial networks
– Research & Education Networks (REN)
219
Commercial uses
• Semi-permanent MPLS virtual circuits
– Traffic engineering
– Voice over IP
• QoS concerns: telephony has a 150ms one-way delay requirement (with echo cancellers)
– Business or service provider interconnect
• interconnecting geographically distributed campuses of an enterprise
• interconnecting wide-area routers of an ISP
220
Traffic engineering (TE)
• Since BGP and OSPF routing protocols mainly spread reachability information, routing tables are such that some links become heavily congested while others are lightly loaded
• MPLS virtual circuits are used to alleviate this problem
– e.g., NY to SF traffic could be directed to take an MPLS virtual circuit on a lightly loaded route, avoiding all paths on which more local traffic may compete
• This is an application of MPLS VCs without bandwidth allocation
221
Goals of Traffic Engineering (TE)
• Monitor network resources and control traffic to maximize performance objectives
– Goal of TE is to achieve efficient network operation with optimized resource utilization in an Autonomous System
• Goals of TE can be:
– Traffic oriented
• Enhance the QoS of traffic streams
• Minimization of loss and delay
• Maximization of throughput
– Resource oriented
• Load balancing
• Minimize maximum congestion or minimize maximum resource utilization
• Output – decreased packet loss and delay, increased throughput
222
Business or service provider interconnect
• Multiple options:
– TDM circuits (traditional private line, T1, T3, OC3, OC12, etc.)
– Ethernet private line
• point-to-point (PWE3)
• VPNs (called Virtual private LAN service)
– MPLS VPNs
– WDM lightpaths
– Dark fiber
223
First option: buy OC192 between routers
[Figure: IP routers at the SF, DC, and Houston PoPs, interconnected by OC192 circuits through SONET switches]
Example: Internet2 purchased OC192s from Qwest
224
Second option: buy Ethernet point-to-point private lines
[Figure: IP routers at the SF, DC, and Houston PoPs attach over 10GbE to Ethernet switches, which are joined by point-to-point Ethernet private lines]
Example: NLR Framenet service; also Pacificwave
225
Third option: buy multipoint Ethernet VPN
[Figure: IP routers at the SF, DC, and Houston PoPs attach over 10GbE to Ethernet switches that form a multipoint Ethernet VLAN (VPN)]
Can place all three ports in one VLAN
VPLS: Virtual Private LAN service: an Ethernet private LAN created over a wide-area network
226
Dynamic circuits/VCs(GMPLS control-plane)
• Commercial:
– fast restoration
• circuit/VC setup delay significant
– rapid provisioning
• similar to scheduled (book-ahead) reservations of REN (research & education networks)
227
Industry usage of dynamic capability of GMPLS control-plane protocols
• Highly limited
• OIF interoperability testing focused on routers sending SONET setup messages to SONET switches
– OIF UNI 1.0R2 and ENNI support only SONET circuits
• In 2005:
– UNI 2.0 testing: to support GbE interfaces
– But signaling/routing support for GbE-SONET-GbE circuits includes proprietary INNI solutions and no ENNI solution
– GbE-SONET hybrid circuits important for REN applications
228
Compare "wire" services
• Disadvantages of Ethernet based solutions:
– Spanning tree:
• convergence slow
• 7-hop limit
– Flat addressing:
• no summarization of MAC addresses
– VLAN tag:
• only 12 bits (only 4096 LANs)
• No VLAN ID swapping (unlike MPLS labels)
– contiguous requirement like lambdas in a WDM network
– Few diagnostic tools to trace problems
Andy Malis paper in IEEE Comm. Mag., Sept. 2006
229
Compare "wire" services
• WDM networks:
– Low power consumption
• SONET/SDH networks:
– Good error monitoring features
– Higher-rate interfaces are cheaper than on IP routers
230
Research & Education (G)MPLS networks
• NSF-funded CHEETAH
• NSF-funded DRAGON
• DOE's Ultra Science Network (USN)
• DOE's ESnet - Science Data Network
• Next-generation Internet2
• etc.
231
CHEETAH network - data plane links: GbEthernet and SONET
[Figure: Sycamore SN16000 SONET switches with GbE/10GbE interfaces at the TN, GA, and NC PoPs, each with a control card, OC192 card(s), and a GbE/10GbE card. OC-192 links run between the PoPs; GbE links attach end hosts at GaTech, ORNL, and NCSU, with further GbE links toward UVa and CUNY.]
232
CHEETAH network - control plane links
[Figure: the same SN16000 switches (TN, GA, NC) and end hosts (GaTech, ORNL, NCSU, UVa, CUNY) as on the data-plane slide. Control cards are reached across Internet2; each switch sits behind an ns5 IPsec device, and Linux end hosts run Openswan IPsec software. Call setup messages travel over these control-plane links.]
Design goal: scalable GMPLS network
Networking software
• Sycamore switch comes with built-in GMPLS control-plane protocols:
– RSVP-TE and OSPF-TE
• We developed CHEETAH software for Linux end hosts:
– circuit-requestor
• allows users and applications to issue RSVP-TE call setup and release messages asking for dedicated circuits to remote end hosts
– CircuitTCP (CTCP) code
234
Network service
• On-demand circuit-switched service for 1Gb/s dedicated host-to-host circuits
• Call setup delay: 1.5 sec
– Sycamore implemented a proprietary build for hybrid GbE-SONET-GbE circuits
– No standard yet for such hybrid circuits
– Sets up 7 STS-3c and VCATs them to carry a GbE signal
• In contrast, their GMPLS standards implementation for pure-SONET circuits incurs a call setup delay of 166ms (2-hop)
235
Applications
• eScience: Terascale Supernova Initiative
– File transfers
– Ensight remote visualization
• general-purpose:
– file transfers between CDN servers, web mirrors
– web caching
– video applications
236
Interesting design considerations in the CHEETAH project
• Addressing: assignment of IP addresses to the end host and switches in the network
• Enabling OSPF-TE automatic neighbor discovery
• Security
237
Addressing
• Public vs. private? static vs. dynamic?
– Shortage of IPv4 addresses
– Enterprises often use private and/or dynamic IP addresses (NAT, DHCP, etc)
– We assign static public IP addresses for both data-plane and control-plane IP addresses. Why?
• Data-plane
– Static: an end host needs to be "called" by other hosts
– Public: the address needs to be globally unique (private IP addresses would suffice if the goal for CHEETAH were to create a small eScience network)
• Control-plane
– Static: the control-plane IP addresses are configured in local Traffic-Engineering link configuration
– Public: same global uniqueness reason for border switches
238
Address assignment example
[Figure: TN, GA, and NC SN16000 switches with hosts zelda1, zelda4, and wukong. Control-plane links run through the Internet; data-plane links run between hosts and switches. Labels recovered from the figure:]
– Ethernet control ports (routerID/switchIP): 198.124.42.2 (198.124.43.2); 130.207.252.2 (130.207.253.2); 128.109.34.2 (128.109.35.2)
– Further addresses on the control-plane subnets: 130.207.252.20, 198.124.42.20, 128.109.34.20
– Data-plane addresses: 152.48.249.2, 152.48.249.102, 198.123.28.172
– Unnumbered data-plane interface IDs: 86000001, 85000002
zelda1, zelda4, wukong: hosts
239
Impact of this addressing
• After dedicated circuit is set up:
– far end NIC has an IP address from a different subnet
• e.g.: zelda4 and wukong in the address assignment example
– Default setting of IP routing table entries will indicate that such an address is only reachable through the default gateway
• Our solution:
– Automatically update the routing table and ARP table when circuit is set up as part of signaling code
• comparable to switch fabric programming in the switch
• ARP table is also automatically updated to avoid extra round-trip propagation delay and potential broadcast storms caused by ARP
• But how does the host find the remote MAC address?
240
Using DNS TXT resource record
• Add a TXT record for the DNS entry of each CHEETAH end host in the local DNS server
– Indicate that the host is in the CHEETAH network
– Record the MAC address of the host's second NIC
• During circuit setup
– The two CHEETAH hosts execute DNS lookup to retrieve the remote MAC address
– At the end of a CHEETAH circuit setup, the two CHEETAH hosts
• Add a host-specific entry for the far-end second NIC's IP address into the IP routing table,
• Add an entry into the ARP table to map the far-end second NIC's IP address to its MAC address.
– When the CHEETAH circuit is released, these entries are removed.
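The setup and release steps above can be sketched as follows. This is not the CHEETAH code: the TXT record layout ("cheetah=1;mac=..."), the interface name eth1, and the MAC address are hypothetical, and the sketch only builds Linux iproute2 command lines (ip route, ip neigh) without running them or performing a real DNS lookup.

```python
def parse_cheetah_txt(txt: str):
    """Return the second-NIC MAC address if the TXT record marks a CHEETAH host.

    The "key=value;key=value" layout is a hypothetical format for illustration.
    """
    fields = dict(item.split("=", 1) for item in txt.split(";") if "=" in item)
    if fields.get("cheetah") != "1":
        return None
    return fields.get("mac")


def setup_commands(remote_ip: str, remote_mac: str, nic: str = "eth1"):
    """Host route and static ARP entry for the far-end second NIC."""
    return [
        ["ip", "route", "add", f"{remote_ip}/32", "dev", nic],
        ["ip", "neigh", "add", remote_ip, "lladdr", remote_mac,
         "nud", "permanent", "dev", nic],
    ]


def release_commands(remote_ip: str, nic: str = "eth1"):
    """Remove both entries when the circuit is released."""
    return [
        ["ip", "route", "del", f"{remote_ip}/32", "dev", nic],
        ["ip", "neigh", "del", remote_ip, "dev", nic],
    ]


txt_record = "cheetah=1;mac=00:11:22:33:44:55"   # hypothetical record contents
mac = parse_cheetah_txt(txt_record)
for cmd in setup_commands("152.48.249.102", mac):
    print(" ".join(cmd))
```

The host route sends traffic for the far-end address out of the second NIC instead of the default gateway, and the static neighbor entry removes the need for an ARP exchange over the new circuit.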
241
Enabling OSPF-TE automatic neighbor discovery
• Automatic neighbor discovery of OSPF-TE
– Based on "Hello" messages
– Hello messages will not be forwarded by IP routers
– If two switches are data-plane neighbors, we need to ensure they are control-plane neighbors as well
• Solution:
– IP-in-IP tunnels
• Outer datagram header carries the Ethernet control port IP addresses
• Inner datagram header carries the Router ID and the broadcast IP address as source and destination
242
Control-plane security
• Importance: a malicious user could tie up circuits
• Cannot use SSH, SSL, etc., because RSVP-TE and OSPF-TE use raw IP
• Our solution – IPsec tunnels
– Use external security device (Juniper NS-5XT) for switches
– Use open-source software (openswan) on Linux end hosts
– Establish IPsec tunnels between adjacent switches and end hosts
• Firewalls
– recall our static public IP address assignments
– Use Juniper NS-5XT for switches and iptables for Linux hosts
• Limitation: host-based instead of user-based
– Any user of the end host can request circuits after IPsec tunnel is established
– Future plan: use the RSVP-TE INTEGRITY object
243
CHEETAH architecture
[Figure: two end hosts, each running an application, DNS client, RSVP-TE module, TCP/IP, and C-TCP/IP (the CHEETAH software), with two NICs (NIC 1 and NIC 2). The hosts are connected both through the Internet and, via circuit gateways, through the SONET circuit-switched network.]
244
CHEETAH end-host software
• DNS lookup – to support our scalability goal
• Circuit-request setup
– Message parsing
• RSVPD
– CAC for UNI link
– Data-plane configuration
• Routing/ARP table update
– Message construction
• RSVPD
[Figure: CHEETAH end-host software. Components shown: circuit-requestor, CD API, CHEETAH daemon (CD) containing the DNS client, CAC, and route/ARP table update functions, RSVPD API, and RSVP-TE daemon (RSVPD) in user space; C-TCP and the C-TCP API in kernel space. The CD performs DNS lookups against the DNS server, sockets connect the daemons, and RSVPD exchanges RSVP-TE messages with the network.]
Integrate CD API into web servers, FTP servers, etc., so that "elephant" flows are automatically handled via a dynamically created dedicated circuit/VC
245
End-to-end signaling delay measurements
• Signaling delays incurred in setting up a circuit between zelda1 (in Atlanta, GA) and wuneng (in Raleigh, NC) across the CHEETAH network.
• Observations:
– Delays for setting up SONET circuits for rates in the original SONET hierarchy are very small (166ms)
– Delays for other rates are much higher (1.6s) (vendor implementation)
– Signaling message processing delays dominate the end-to-end circuit setup delay

Circuit type | End-to-end circuit setup delay (s) | Processing delay for Path message at the NC SN16000 (s) | Processing delay for Resv message at the NC SN16000 (s)
OC-1 | 0.166103 | 0.091119 | 0.008689
OC-3 | 0.165450 | 0.090852 | 0.008650
1Gb/s EoS | 1.645673 | 1.566932 | 0.008697
Round-trip signaling message propagation plus emission delay between GA SN16000 and NC SN16000: 0.025s
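The claim that processing delay dominates can be checked directly from the measured values. The sketch below only divides numbers from the table; the variable names are illustrative.

```python
# (end-to-end setup delay, Path processing at NC, Resv processing at NC), in seconds
measurements = {
    "OC-1": (0.166103, 0.091119, 0.008689),
    "OC-3": (0.165450, 0.090852, 0.008650),
    "1Gb/s EoS": (1.645673, 1.566932, 0.008697),
}
ROUND_TRIP_S = 0.025  # propagation plus emission delay between the two switches


def path_share(circuit: str) -> float:
    """Fraction of the end-to-end delay spent processing Path at the NC switch."""
    total, path, _resv = measurements[circuit]
    return path / total


def propagation_share(circuit: str) -> float:
    """Fraction of the end-to-end delay due to round-trip propagation/emission."""
    return ROUND_TRIP_S / measurements[circuit][0]


for circuit in measurements:
    print(f"{circuit:10s} Path processing {100 * path_share(circuit):.1f}%, "
          f"propagation {100 * propagation_share(circuit):.1f}%")
```

Path processing at a single switch accounts for roughly 55% of the setup delay for OC-1 and OC-3 and about 95% for the 1 Gb/s EoS circuit, while round-trip propagation contributes about 15% and under 2% respectively.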
246
Other R&E networks
• DRAGON:
– GbE and WDM (Movaz)
– VLSR code: external implementation of RSVP-TE and OSPF-TE: popular
– per-domain route computation unit called NARB
• ESnet and Science Data Network
– OSCARS: an advance-reservation system
– MPLS network
• UltraScience Network
– Research network for DoE labs
– GbE and SONET (Ciena)
– Centralized scheduler for advance-reservation calls
247
How do advance-reservation systems work?
[Figure: an advance-reservations scheduler above a chain of three GMPLS-equipped switches. Numbered steps:]
1. Scheduler maintains bandwidth availability over a time horizon for all links in the domain
2. A new protocol carries the request to the scheduler (BW requested + time)
3. When request for an advance reservation arrives, try different routes and find one with required bandwidth (centralized CAC)
4. Answer
5. Third-party Path message with ERO (just before scheduled time)
6. Program switch fabric
7. Path message
GMPLS RSVP-TE signaling used for "rapid provisioning"
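Steps 1 and 3 (per-link availability over a time horizon, and centralized CAC over candidate routes) can be sketched as below. This is a toy model, not the scheduler of any of the networks named in this tutorial: time is divided into integer slots, routes are given as lists of link names, and the class name, topology, and capacities are invented for illustration.

```python
from collections import defaultdict


class AdvanceReservationScheduler:
    """Toy centralized scheduler: tracks reserved bandwidth per link per slot."""

    def __init__(self, link_capacity):
        self.capacity = dict(link_capacity)   # link name -> capacity in Mb/s
        self.reserved = defaultdict(float)    # (link, slot) -> reserved Mb/s

    def _fits(self, route, bw, start, end):
        """True if every link on the route has bw free in every slot [start, end)."""
        return all(self.reserved[(link, t)] + bw <= self.capacity[link]
                   for link in route for t in range(start, end))

    def request(self, candidate_routes, bw, start, end):
        """Try routes in order; book and return the first that fits, else None."""
        for route in candidate_routes:
            if self._fits(route, bw, start, end):
                for link in route:
                    for t in range(start, end):
                        self.reserved[(link, t)] += bw
                return route
        return None


# Hypothetical three-node domain with 1000 Mb/s links.
sched = AdvanceReservationScheduler({"A-B": 1000, "B-C": 1000, "A-C": 1000})
routes_a_to_c = [["A-B", "B-C"], ["A-C"]]

first = sched.request(routes_a_to_c, bw=600, start=0, end=4)
second = sched.request(routes_a_to_c, bw=600, start=2, end=6)
third = sched.request(routes_a_to_c, bw=600, start=3, end=5)
print(first, second, third)
```

The route returned for an accepted request is what the scheduler would place in the ERO of the third-party Path message shortly before the start slot; a None answer corresponds to rejecting the request.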
248
Advantages of GMPLS control-plane sacrificed
• RSVP-TE engines at switch controllers are supposed to manage bandwidth for the interfaces of the switch
– distributed bandwidth management
• Route computations are supposed to be distributed to each switch
– distributed routing protocols
• Both these steps are centralized in a domain scheduler because RSVP-TE and OSPF-TE do not support parameters for advance-reservation calls
249
Wide-area REN
• HOPI (Hybrid Optical Packet Infrastructure)
– Uses Ethernet switches to provide VLAN based virtual circuit service
– Uses VLSRs and NARB for control-plane
• Next-generation Internet2
– Will offer a dynamic circuit service
– Wide-scale deployment of Ciena SONET switches
250
PROVISIONAL TOPOLOGY
Courtesy: Rick Summerhill, Internet2 web site
251
PROVISIONAL TOPOLOGY
Courtesy: Rick Summerhill, Internet2 web site
252
References for REN projects
• IEEE Communications Magazine special issue, March 2006
– DRAGON, USN, CHEETAH, several other projects
• CHEETAH web site:
– http://www.ece.virginia.edu/cheetah/
– Papers in Opticomm 2003, IEEE JSAC Oct. 2004, IEEE ICC 2006, IEEE Globecom 2006, etc.
253
Summary
• Principles
– Different types of connection-oriented networks
• Technologies
– Single network: MPLS, SONET, OTN
– Internetworking: GFP, PWE3, G.709
• Usage
– Commercial networks
– Research & Education Networks (REN)