Post on 19-Dec-2015
Katz, Stoica F04
EECS 122: Introduction to Computer Networks
Congestion Control
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Today’s Lecture: 10
[Figure: protocol stack, top to bottom: Application, Transport, Network (IP), Link, Physical; each layer is annotated with the numbers of the lectures that cover it.]
Finishing Last Lecture
Where do IP routers belong?
Big Picture
Communication Network
- Switched Communication Network
    - Circuit-Switched Communication Network
    - Packet-Switched Communication Network
        - Datagram Network
        - Virtual Circuit Network
- Broadcast Communication Network
Packet (Datagram) Switching Properties
Expensive forwarding
- Forwarding table size depends on the number of different destinations
- Must look up the forwarding table for every packet
Robust
- Link and router failures may be transparent to end-hosts
High bandwidth utilization
- Statistical multiplexing
No service guarantees
- Network allows hosts to send more packets than the available bandwidth -> congestion -> dropped packets
Virtual Circuit (VC) Switching
Packets not switched independently
- Establish virtual circuit before sending data
Forwarding table entry
- (input port, input VCI, output port, output VCI)
- VCI: Virtual Circuit Identifier
Each packet carries a VCI in its header
Upon a packet arrival at interface i
- Input port uses i and the packet’s VCI v to find the routing entry (i, v, i’, v’)
- Replaces v with v’ in the packet header
- Forwards packet to output port i’
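The lookup-and-rewrite steps above can be sketched as a dictionary keyed by (input port, input VCI). This is only an illustration: the `forward` function, the dictionary-based packet, and the table entry are hypothetical and not taken from any real switch.

```python
from typing import Dict, Tuple

# (input port, input VCI) -> (output port, output VCI)
ForwardingTable = Dict[Tuple[int, int], Tuple[int, int]]


def forward(table: ForwardingTable, in_port: int, packet: dict) -> int:
    """Rewrite the packet's VCI in place and return the output port."""
    out_port, out_vci = table[(in_port, packet["vci"])]  # find entry (i, v, i', v')
    packet["vci"] = out_vci                              # replace v with v'
    return out_port                                      # forward to output port i'


# Hypothetical entry: packets arriving on port 1 with VCI 5
# leave on port 3 carrying VCI 11.
table = {(1, 5): (3, 11)}
packet = {"vci": 5, "payload": b"hello"}
out_port = forward(table, 1, packet)
```

The table needs one entry per circuit crossing the switch, which is why its size depends on the number of circuits and not on the number of destinations.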
VC Forwarding: Example
[Figure: a packet travels from source to destination through a chain of switches; each switch’s forwarding table (columns: in, in-VCI, out, out-VCI) rewrites the packet’s VCI at every hop.]
VC Forwarding (cont’d)
A signaling protocol is required to set up the state for each VC in the routing table
- A source needs to wait for one RTT (round trip time) before sending the first data packet
Can provide per-VC QoS
- When we set up the VC, we can also reserve bandwidth and buffer resources along the path
VC Switching Properties
Less expensive forwarding
- Forwarding table size depends on the number of different circuits
- Must look up the forwarding table for every packet
Much higher delay for short flows
- 1 RTT delay for connection setup
Less robust
- End host must spend 1 RTT to establish a new connection after a link or router failure
Flexible service guarantees
- Either statistical multiplexing or resource reservations
Circuit Switching
Packets not switched independently
- Establish circuit before sending data
Circuit is a dedicated path from source to destination
- E.g., old-style telephone switchboard, where establishing a circuit means connecting wires in all the switches along the path
- E.g., modern dense wavelength-division multiplexing (DWDM) form of optical networking, where establishing a circuit means reserving an optical wavelength in all switches along the path
No forwarding table
Circuit Switching Properties
Cheap forwarding
- No table lookup
Much higher delay for short flows
- 1 RTT delay for connection setup
Less robust
- End host must spend 1 RTT to establish a new connection after a link or router failure
Must use resource reservations
Forwarding Comparison
                         pure packet switching   virtual circuit switching   circuit switching
forwarding cost          high                    low                         none
bandwidth utilization    high                    flexible                    low
resource reservations    none                    flexible                    yes
robustness               high                    low                         low
Summary
Routers
- Key building blocks of today’s networks in general, and the Internet in particular
Main functionalities implemented by a router
- Packet forwarding
- Buffer management
- Packet scheduling
- Packet classification
Forwarding techniques
- Datagram (packet) switching
- Virtual circuit switching
- Circuit switching
Starting New Lecture
Congestion Control
What We Know
We know:
- How to process packets in a switch
- How to route packets in the network
- How to send packets reliably
We don’t know:
- How fast to send
What’s at Stake?
Send too slow: link is not fully utilized
- wastes time
Send too fast: link is fully utilized but....
- queue builds up in router buffer (delay)
- overflow buffers in routers
- overflow buffers in receiving host (ignore)
Why are buffer overflows a problem?
- packet drops (mine and others)
- Interesting history....(Van Jacobson rides to the rescue)
Abstract View
We ignore internal structure of router and model it as having a single queue for a particular input-output pair
[Figure: sending host A -> buffer in router -> receiving host B]
Three Congestion Control Problems
Adjusting to bottleneck bandwidth
Adjusting to variations in bandwidth
Sharing bandwidth between flows
Single Flow, Fixed Bandwidth
Adjust rate to match bottleneck bandwidth
- without any a priori knowledge
- could be gigabit link, could be a modem
[Figure: A -> B over a 100 Mbps link]
Single Flow, Varying Bandwidth
Adjust rate to match instantaneous bandwidth
- assuming you have a rough idea of the bandwidth
[Figure: A -> B over a link with time-varying bandwidth BW(t)]
Multiple Flows
Two Issues:
- Adjust total sending rate to match bandwidth
- Allocation of bandwidth between flows
[Figure: flows A1 -> B1, A2 -> B2, and A3 -> B3 share a 100 Mbps link]
Reality
Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics
General Approaches
Send without care
- many packet drops
- not as stupid as it seems
Reservations
- pre-arrange bandwidth allocations
- requires negotiation before sending packets
- low utilization
Pricing
- don’t drop packets for the high-bidders
- requires payment model
General Approaches (cont’d)
Dynamic Adjustment
- probe network to test level of congestion
- speed up when no congestion
- slow down when congestion
- suboptimal, messy dynamics, simple to implement
All three techniques have their place
- but for generic Internet usage, dynamic adjustment is the most appropriate
- due to pricing structure, traffic characteristics, and good citizenship
TCP Congestion Control
TCP connection has window
- controls number of unacknowledged packets
Sending rate: ~Window/RTT
Vary window size to control sending rate
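As a rough illustration of Window/RTT, the sketch below converts a window measured in packets into a bit rate. The function name and the 1500-byte packet size are assumptions made for the example, not values from the lecture.

```python
def sending_rate_bps(window_packets: float, rtt_s: float,
                     packet_bytes: int = 1500) -> float:
    """Approximate sending rate: one window of data per round-trip time."""
    return window_packets * packet_bytes * 8 / rtt_s


# A 10-packet window over a 100 ms RTT gives roughly 1.2 Mbps;
# doubling the window doubles the rate.
rate = sending_rate_bps(window_packets=10, rtt_s=0.1)
doubled = sending_rate_bps(window_packets=20, rtt_s=0.1)
```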
Congestion Window (cwnd)
Limits how much data can be in transit
Implemented as # of bytes
Described as # packets in this lecture
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
MaxWindow = min(cwnd, AdvertisedWindow)
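[Figure: sequence-number line, increasing to the right, marking LastByteAcked and LastByteSent; MaxWindow starts at LastByteAcked, and EffectiveWindow is the part of MaxWindow beyond LastByteSent.]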
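The two window formulas can be checked with a few lines of Python. This is a minimal sketch: the function name and the sample byte counts are made up, and clamping the result at zero is an added assumption for the case where the window shrinks below the data already in flight.

```python
def effective_window(cwnd: int, advertised_window: int,
                     last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still put on the wire right now."""
    max_window = min(cwnd, advertised_window)
    in_flight = last_byte_sent - last_byte_acked
    # Assumption: never report a negative window.
    return max(0, max_window - in_flight)


# cwnd is the tighter limit here: min(8000, 16000) = 8000 bytes,
# 3000 of which are already in flight.
room = effective_window(cwnd=8000, advertised_window=16000,
                        last_byte_sent=5000, last_byte_acked=2000)
```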
Two Basic Components
Detecting congestion
Rate adjustment algorithm
- depends on whether there is congestion or not
- three subproblems within adjustment problem
• finding fixed bandwidth
• adjusting to bandwidth variations
• sharing bandwidth
Detecting Congestion
Packet dropping is the best sign of congestion
- delay-based methods are hard and risky
How do you detect packet drops? ACKs
- TCP uses ACKs to signal receipt of data
- ACK denotes last contiguous byte received
• actually, ACKs indicate next segment expected
Two signs of packet drops
- No ACK after certain time interval: time-out
- Several duplicate ACKs (ignore for now)
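To see where duplicate ACKs come from, here is a small sketch of a receiver that always acknowledges the next segment it expects. Segments are numbered from 1 as in this lecture, and the function name is hypothetical.

```python
def cumulative_acks(arrivals):
    """Return the ACK sent for each arriving segment number.

    Each ACK names the next segment the receiver expects in order.
    """
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        while next_expected in received:   # advance past contiguous data
            next_expected += 1
        acks.append(next_expected)
    return acks


# Segment 4 is missing, so segments 5, 6 and 7 each repeat the same ACK.
acks = cumulative_acks([1, 2, 3, 5, 6, 7])
```

Here `acks` ends with four copies of the same number: one original ACK and three duplicates.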
Rate Adjustment
Basic structure:
- Upon receipt of ACK (of new data): increase rate
- Upon detection of loss: decrease rate
But what increase/decrease functions should we use?
- Depends on what problem we are solving
Problem #1: Single Flow, Fixed BW
Want to get a first-order estimate of the available bandwidth
- Assume bandwidth is fixed
- Ignore presence of other flows
Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)
Adjustment:
- cwnd initially set to 1
- cwnd++ upon receipt of ACK
Slow-Start
cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent
- Each ACK releases two packets
- Slow-start is called “slow” because of starting point
[Figure: sender/receiver timeline. Segment 1 is sent with cwnd = 1; segments 2-3 with cwnd = 2 (cwnd passes through 3 as their ACKs arrive); segments 4-7 with cwnd = 4; cwnd then reaches 8.]
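The doubling can be reproduced by applying "cwnd++ upon receipt of ACK" once per packet sent in each round trip. This is a loss-free sketch; the function name is hypothetical.

```python
def slow_start_rounds(rounds: int) -> list:
    """cwnd (in packets) at the start of each RTT when nothing is lost."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd              # one ACK per packet sent in this RTT
        for _ in range(acks):
            cwnd += 1            # cwnd++ upon receipt of each ACK
    return history


history = slow_start_rounds(4)   # [1, 2, 4, 8]
```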
Problems with Slow-Start
Slow-start can result in many losses
- roughly the size of cwnd ~ BW*RTT
Example:
- at some point, cwnd is enough to fill “pipe”
- after another RTT, cwnd is double its previous value
- all the excess packets are dropped!
Therefore, need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
Problem #2: Single Flow, Varying BW
Want to be able to track available bandwidth, oscillating around its current value
Possible variations: (in terms of RTTs)
- multiplicative increase or decrease: cwnd -> a*cwnd
- additive increase or decrease: cwnd -> cwnd + b
Four alternatives:
- AIAD: gentle increase, gentle decrease
- AIMD: gentle increase, drastic decrease
- MIAD: drastic increase, gentle decrease (too many losses)
- MIMD: drastic increase and decrease
Problem #3: Multiple Flows
Want steady state to be “fair”
Many notions of fairness, but here all we require is that two identical flows end up with the same bandwidth
This eliminates MIMD and AIAD
AIMD is the only remaining solution!
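A toy simulation makes the elimination argument concrete. It assumes both flows see the same congestion signal in the same RTT (synchronized feedback), and the starting rates, the capacity of 50, and the 1.1 increase factor are arbitrary choices for illustration only.

```python
def share_link(increase, decrease, x=5.0, y=40.0, capacity=50.0, rtts=500):
    """Two flows adjust their rates once per RTT on a shared link."""
    for _ in range(rtts):
        if x + y > capacity:                  # both flows see congestion
            x, y = decrease(x), decrease(y)
        else:                                 # both flows see no congestion
            x, y = increase(x), increase(y)
    return x, y


def ai(rate): return rate + 1.0               # additive increase
def ad(rate): return max(rate - 1.0, 0.0)     # additive decrease
def mi(rate): return rate * 1.1               # multiplicative increase
def md(rate): return rate / 2.0               # multiplicative decrease


aimd = share_link(ai, md)   # the gap x - y halves at every decrease
aiad = share_link(ai, ad)   # the gap x - y never changes
mimd = share_link(mi, md)   # the ratio y / x never changes
```

Under these assumptions only AIMD drives the two rates together; AIAD keeps the initial difference of 35 and MIMD keeps the initial ratio of 8.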
Buffer and Window Dynamics
No congestion -> x increases by one packet/RTT every RTT
Congestion -> decrease x by a factor of 2
[Figure: flow x from A to B over a link of capacity C = 50 pkts/RTT; plot over time of the rate x (pkts/RTT) and the backlog in the router (pkts), which counts as congested if > 20.]
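The dynamics in the plot can be approximated with a simple fluid model. It is a sketch under stated assumptions: the backlog grows by x - C per RTT and never goes negative, and the sender learns about congestion within the same RTT. The function name is hypothetical; C = 50 and the threshold of 20 come from the slide.

```python
def simulate_aimd(rtts=500, capacity=50.0, threshold=20.0, x=1.0):
    """Return per-RTT lists of the sending rate and the router backlog."""
    backlog = 0.0
    rates, backlogs = [], []
    for _ in range(rtts):
        backlog = max(0.0, backlog + x - capacity)  # queue grows when x > C
        rates.append(x)
        backlogs.append(backlog)
        if backlog > threshold:
            x = x / 2.0      # congestion: decrease x by a factor of 2
        else:
            x = x + 1.0      # no congestion: one more packet/RTT
    return rates, backlogs


rates, backlogs = simulate_aimd()
```

In this model the rate settles into a sawtooth between 28 and 56 pkts/RTT, and the backlog peaks at 21 packets just before each cut.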
AIMD Sharing Dynamics
No congestion -> rate increases by one packet/RTT every RTT
Congestion -> decrease rate by a factor of 2
[Figure: flows x (A to B) and y (D to E) share a link; plot of the two rates over time.]
Rates equalize -> fair share
AIAD Sharing Dynamics
No congestion -> x increases by one packet/RTT every RTT
Congestion -> decrease x by 1
[Figure: flows x (A to B) and y (D to E) share a link; plot of the two rates over time.]
AIMD
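[Figure: flows x (A to B) and y (D to E) share a link of capacity C; phase plot of y against x, with both axes running up to C.]
Limit rates: x = y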
AIAD
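[Figure: flows x (A to B) and y (D to E) share a link of capacity C; phase plot of y against x, with both axes running up to C.]
Limit rates: x and y depend on initial values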
Implementing AIMD
After each ACK
- increment cwnd by 1/cwnd (cwnd += 1/cwnd)
- as a result, cwnd is increased by one only if all segments in a cwnd have been acknowledged
But need to decide when to leave slow-start and enter AIMD -> use ssthresh variable
Slow Start/AIMD Pseudocode
Initially:
    cwnd = 1;
    ssthresh = infinite;
New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;
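A direct Python transcription of the pseudocode, with cwnd counted in packets. The class and method names are hypothetical, and real TCP stacks track the window in bytes and handle many more cases.

```python
import math


class TcpSender:
    """Slow start plus AIMD, following the slide's pseudocode."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = math.inf

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                # slow start
        else:
            self.cwnd += 1.0 / self.cwnd    # congestion avoidance

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0     # multiplicative decrease
        self.cwnd = 1.0


s = TcpSender()
for _ in range(7):
    s.on_new_ack()      # slow start: cwnd grows from 1 to 8
s.on_timeout()          # ssthresh = 4, cwnd = 1
for _ in range(3):
    s.on_new_ack()      # slow start again: cwnd grows from 1 to 4
s.on_new_ack()          # congestion avoidance: cwnd = 4 + 1/4
```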
The big picture (with timeouts)
[Figure: cwnd versus time. Slow start runs until cwnd reaches ssthresh, then AIMD takes over; each timeout drops cwnd back to the start of a new slow start.]
Congestion Detection Revisited
Wait for Retransmission Time Out (RTO)
- RTO kills throughput
In BSD TCP implementations, RTO is usually more than 500 ms
- the granularity of RTT estimate is 500 ms
- retransmission timeout is RTT + 4 * mean_deviation
Solution: Don’t wait for RTO to expire
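The timeout formula can be sketched with running averages. The slide gives only RTO = RTT + 4 * mean_deviation; the smoothing gains (1/8 and 1/4), the initial deviation of RTT/2, and the rounding up to a 500 ms tick are assumptions added here to model a coarse timer, and the class name is hypothetical.

```python
import math


class RtoEstimator:
    """Running RTT and mean-deviation estimates, with a coarse timer tick."""

    def __init__(self, first_rtt_s, gain=0.125, dev_gain=0.25, tick_s=0.5):
        self.rtt = first_rtt_s
        self.mean_deviation = first_rtt_s / 2.0   # assumed starting value
        self.gain = gain
        self.dev_gain = dev_gain
        self.tick_s = tick_s

    def update(self, sample_s):
        err = sample_s - self.rtt
        self.rtt += self.gain * err
        self.mean_deviation += self.dev_gain * (abs(err) - self.mean_deviation)

    def rto(self):
        raw = self.rtt + 4.0 * self.mean_deviation
        # Assumption: the timer only fires on multiples of the tick.
        return math.ceil(raw / self.tick_s) * self.tick_s


est = RtoEstimator(first_rtt_s=0.1)
first_rto = est.rto()        # 0.5 s, even though the RTT is only 0.1 s
est.update(0.1)
steady_rto = est.rto()
```

With a 500 ms tick the timeout cannot drop below half a second however short the path is, which is why waiting for it is so costly.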
Fast Retransmits
Resend a segment after 3 duplicate ACKs
- a duplicate ACK means that an out-of-sequence segment was received
Notes:
- ACKs are for the next expected packet
- packet reordering can cause duplicate ACKs
- window may be too small to get enough duplicate ACKs
[Figure: sender/receiver timeline. Segment 1 (cwnd = 1) is answered by ACK 2; segments 2-3 (cwnd = 2) by ACK 3 and ACK 4; segments 4-7 are sent with cwnd = 4, and the receiver answers with 3 duplicate ACK 4s.]
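On the sender side, fast retransmit amounts to counting repeated ACK numbers. A minimal sketch with a hypothetical function name, using the ACK sequence from the figure:

```python
def fast_retransmits(acks, threshold=3):
    """Return the segments resent, given the stream of ACK numbers."""
    last_ack = None
    dup_count = 0
    resent = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                resent.append(ack)   # the ACK names the missing segment
        else:
            last_ack = ack           # new data acknowledged
            dup_count = 0
    return resent


resent = fast_retransmits([2, 3, 4, 4, 4, 4])   # third duplicate of ACK 4
```

With only two duplicates, as can happen when the window is small, nothing is resent and the sender falls back to the timeout.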
Fast Recovery: After a Fast Retransmit
ssthresh = cwnd / 2
cwnd = ssthresh
- instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease)
for each dup ack arrival
- dupack++
- MaxWindow = min(cwnd + dupack, AdvWin)
- indicates packet left network, so we may be able to send more
receive ack for new data (beyond initial dup ack)
- dupack = 0
- exit fast recovery
But when RTO expires still do cwnd = 1
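The bookkeeping above can be collected into a small state holder. This is a sketch that follows the slide's rules with windows counted in packets; the class, its method names, and the `in_recovery` flag are hypothetical.

```python
class FastRecovery:
    """Window bookkeeping after a fast retransmit (in packets)."""

    def __init__(self, cwnd, advertised_window):
        self.cwnd = cwnd
        self.advertised_window = advertised_window
        self.ssthresh = float("inf")
        self.dupack = 0
        self.in_recovery = False

    def on_fast_retransmit(self):
        self.ssthresh = self.cwnd / 2   # multiplicative decrease
        self.cwnd = self.ssthresh       # halve instead of resetting to 1
        self.dupack = 0
        self.in_recovery = True

    def on_dup_ack(self):
        if self.in_recovery:
            self.dupack += 1            # a packet has left the network

    def on_new_ack(self):
        self.dupack = 0
        self.in_recovery = False        # exit fast recovery

    def on_rto(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1                   # a timeout still resets cwnd to 1
        self.dupack = 0
        self.in_recovery = False

    def max_window(self):
        return min(self.cwnd + self.dupack, self.advertised_window)


s = FastRecovery(cwnd=16, advertised_window=64)
s.on_fast_retransmit()                  # cwnd = ssthresh = 8
s.on_dup_ack()
s.on_dup_ack()
inflated = s.max_window()               # 8 + 2 duplicate ACKs
s.on_new_ack()
deflated = s.max_window()               # back to cwnd
```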
Fast Retransmit and Fast Recovery
Retransmit after 3 duplicate ACKs
- Prevent expensive timeouts
Reduce slow starts
At steady state, cwnd oscillates around the optimal window size
[Figure: cwnd versus time. An initial slow start is followed by the AI/MD sawtooth, with each drop triggered by a fast retransmit.]
TCP Congestion Control Summary
Measure available bandwidth
- slow start: fast, hard on network
- AIMD: slow, gentle on network
Detecting congestion
- timeout based on RTT
• robust, causes low throughput
- Fast Retransmit: avoids timeouts when few packets lost
• can be fooled, maintains high throughput
Recovering from loss
- Fast recovery: don’t set cwnd=1 with fast retransmits
Issues to Think About
What about short flows? (setting initial cwnd)
- most flows are short
- most bytes are in long flows
How does this work over wireless links?
- packet reordering fools fast retransmit
- loss not always congestion related
High speeds?
- to sustain 10 Gbps, packet losses can occur only about once every 90 minutes!
Why are losses bad?
- Tornado codes: can reconstruct data proportional to packets that get through. Why not send at maximal rate?
Fairness: how do flows with different RTTs share link?
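The 90-minute figure in the high-speed item can be reproduced approximately from the AIMD sawtooth. The slide does not state its assumptions; the 1500-byte packets and 100 ms RTT used below are guesses that happen to give a similar number, and the function name is hypothetical.

```python
def seconds_between_losses(rate_bps, rtt_s, packet_bytes):
    """Time one AIMD sawtooth cycle takes at a given average rate."""
    avg_window = rate_bps * rtt_s / (packet_bytes * 8)  # packets in flight
    peak_window = avg_window * 4 / 3   # sawtooth runs from W/2 to W, mean 3W/4
    rtts_per_cycle = peak_window / 2   # +1 packet per RTT from W/2 back to W
    return rtts_per_cycle * rtt_s


# Assumed values: 10 Gbps, 100 ms RTT, 1500-byte packets.
minutes = seconds_between_losses(10e9, 0.1, 1500) / 60
```

Under these assumptions a single loss costs more than 90 minutes of additive increase to recover from, so any higher loss frequency keeps the flow below 10 Gbps.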
Bonus Question
Why is TCP like Blanche Dubois?
Because it “relies on the kindness of strangers...”
What happens if not everyone cooperates?