Congestion (scs.stanford.edu)
Congestion
[Figure: two sources (one on a 100-Mbps FDDI ring, one on a 10-Mbps Ethernet) feed a router whose outgoing link is a 1.5-Mbps T1 to the destination]
• Can’t sustain input rate >> output rate
• Issues:
  - Avoid congestion
  - Control congestion
  - Prioritize who gets limited resources
Taxonomy of approaches
• Router-centric vs. host-centric
  - hosts at the edges of the network (transport protocol)
  - routers inside the network (queuing discipline)
• Reservation-based vs. feedback-based
  - pre-allocate resources so as to avoid congestion
  - control congestion if (and when) it occurs
• Window-based vs. rate-based
• Best-effort (today) vs. multiple QoS (Thursday)
Router design issues
• Scheduling discipline
  - Which of multiple packets should you send next?
  - May want to achieve some notion of fairness
  - May want some packets to have priority
• Drop policy
  - When should you discard a packet?
  - Which packet to discard?
  - Some packets more important (perhaps BGP)
  - Some packets useless w/o others (cells in AAL5 CS-PDU)
  - Need to balance throughput & delay
Example: FIFO tail drop
[Figure: (a) an arriving packet goes into the next free buffer behind queued packets, while the head of the queue is next to transmit; (b) all buffers full, so the arriving packet is dropped]
• Differentiates packets only by when they arrive
• Might not provide useful feedback for sending hosts
What to optimize for?
• Fairness (in two slides)
• High throughput – queue should never be empty
• Low delay – so want short queues
• Crude combination: power = Throughput/Delay
  - Want to convince hosts to offer optimal load
[Figure: throughput/delay vs. load, peaking at the optimal load]
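The power curve above can be made concrete with a toy queueing model. The M/M/1 delay formula used here is an assumption for illustration; the slide itself does not fix a model.

```python
# Toy illustration of power = Throughput / Delay.
# Assumption (not from the slide): an M/M/1 queue, where for service
# rate mu and offered load lam < mu, the average delay is 1 / (mu - lam).
# Power is then lam * (mu - lam), maximized at lam = mu / 2.

def power(lam: float, mu: float) -> float:
    """Throughput divided by delay for the assumed M/M/1 model."""
    delay = 1.0 / (mu - lam)
    return lam / delay  # = lam * (mu - lam)

mu = 1.0
loads = [i / 10 for i in range(1, 10)]
best = max(loads, key=lambda lam: power(lam, mu))
print(best)  # → 0.5: optimal load is at half the service rate
```

This is the sense in which routers want hosts to converge on an intermediate offered load rather than the largest one they can sustain.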
Connectionless flows
[Figure: Sources 1-3 send through a network of three routers to Destinations 1 and 2; flows share links inside the network]
• Even in Internet, routers can have a notion of flows
  - E.g., based on IP addresses & TCP ports (or a hash of those)
  - Soft state: doesn’t have to be correct
  - But if usually correct, can use it to form router policies
Fairness
• What is fair in this situation?
  - Each flow gets 1/2 link b/w? Long flow gets less?
• Usually fair means equal
  - For flow bandwidths (x1, . . . , xn), fairness index:

    f(x1, . . . , xn) = (∑ni=1 xi)² / (n · ∑ni=1 xi²)

  - If all xi are equal, fairness is one
• So what policy should routers follow?
  - First, we have to understand what TCP is doing
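The fairness index above (Jain's index) is straightforward to compute; a minimal sketch:

```python
# Jain's fairness index, as defined on the slide:
#   f(x1, ..., xn) = (sum xi)^2 / (n * sum xi^2)
# Equal allocations give 1; the worst case (one flow gets everything)
# gives 1/n.

def fairness_index(x):
    n = len(x)
    return sum(x) ** 2 / (n * sum(v * v for v in x))

print(fairness_index([10, 10, 10]))  # → 1.0 (equal shares)
print(fairness_index([30, 0, 0]))    # → 0.333... (one flow hogs the link)
```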
TCP Congestion Control
• Idea
  - Assumes best-effort network
  - Each source determines network capacity for itself
  - Uses implicit feedback (delay, drops)
  - ACKs pace transmission (self-clocking)
• Challenge
  - Determining the available capacity in the first place
  - Adjusting to changes in the available capacity
Detecting congestion
• Question: how does the source determine whether or not the network is congested?
• Answer: a timeout occurs
  - Timeout signals that a packet was lost
  - Packets are seldom lost due to transmission error
  - Lost packet implies congestion
Dealing with congestion
• TCP keeps congestion & flow control windows
  - Max packets in flight is the lesser of the two
• After a packet loss, must reduce cong. window
  - This is what controls congestion
  - But how much to reduce?
• Idea: conservation of packets at equilibrium
  - Want to keep roughly the same number of packets in the network
  - By analogy with water in a fixed-size pipe
  - Put a new packet into the network when one exits
How much to reduce window?
• Let’s build a crude model of the network
  - Let Li be the load on the network (# pkts it contains) at time i
  - If network uncongested, roughly constant: Li = N
• Now what happens under congestion?
  - Some fraction γ of packets can’t exit the network
  - So now Li = N + γ · Li−1, or Li ≈ γ^i · L0
  - Congestion increases exponentially (w. infinite buffers)
• Requires multiplicative decrease of window size
  - TCP chooses to cut window in half
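The recursion in this model can be simulated directly. A sketch that just iterates Li = N + γ · Li−1 to show the backlog compounding while sources keep offering load N:

```python
# Iterate the slide's load model L_i = N + gamma * L_{i-1}, where N is
# the offered load per interval and gamma is the fraction of packets
# that fails to exit the congested network.

def load_trace(N, gamma, steps):
    L = N
    trace = [L]
    for _ in range(steps):
        L = N + gamma * L
        trace.append(L)
    return trace

print(load_trace(10.0, 0.5, 4))  # → [10.0, 15.0, 17.5, 18.75, 19.375]
```

Because the γ · Li−1 term feeds back on itself, shrinking the window by a constant amount does not drain the backlog quickly; shrinking it by a constant factor does, which is the argument for multiplicative decrease.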
How to use extra capacity?
• Must adjust as extra capacity becomes available
  - Unlike drops for congestion, no explicit signal
  - Instead, try to send slightly faster and see if it works
  - So need to increase the window when there are no losses – how much?
• Multiplicative increase?
  - But easier to saturate the net than to recover (rush-hour effect)
  - Multiplicative increase is so fast it will inevitably lead to saturation
• Additive increase won’t saturate the net
  - So: Additive Increase, Multiplicative Decrease (AIMD)
Additive Increase
[Figure: source/destination packet timeline; the window grows by one packet per RTT]
Implementation
• In practice, sending MSS-sized frames
  - Let window size in bytes be w, a multiple of MSS
• Increase:
  - After w bytes ACKed, could set w ← w + MSS
  - Smoother to increment window on each ACK received: w ← w + MSS · MSS/w
• Decrease:
  - After a packet loss, w ← w/2
  - But don’t want w < MSS
  - So react differently to multiple consecutive losses
  - Back off exponentially (pause with no packets in flight)
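A minimal sketch of these window updates, with w in bytes; MSS = 1460 is an illustrative value, not something the slides fix.

```python
MSS = 1460  # illustrative segment size

def on_ack(w):
    """Additive increase: +MSS*MSS/w per ACK, roughly +MSS per RTT."""
    return w + MSS * MSS / w

def on_loss(w):
    """Multiplicative decrease: halve w, but never go below one MSS."""
    return max(w / 2, MSS)

w = 8 * MSS
for _ in range(8):   # about one window's worth of ACKs = one RTT
    w = on_ack(w)
print(w / MSS)       # just under 9: grew by about one MSS this RTT
w = on_loss(w)
print(w / MSS)       # about 4.5 windows of MSS after the loss
```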
AIMD trace
• Window trace produces sawtooth pattern:
[Figure: congestion window (KB) vs. time (seconds); sawtooth oscillating roughly between 20 and 70 KB over 10 seconds]
Slow start
• Question: where to set w initially?
  - Should start at 1 MSS (to avoid overloading the network)
  - But additive ramp-up is too slow on a fast net
• Start by doubling the window each RTT
  - Then at most will dump one extra window into the network
• Slow start? This sounds like fast start!
  - In contrast to what happened before the Jacobson/Karels work
  - Senders would dump an entire flow-control window into the net
• Slow start used in multiple situations
  - Connection start time & after a timeout
Slow start picture
[Figure: source/destination packet timeline; the window doubles each RTT]
Slow start implementation
• We are doubling w after each RTT
  - But receiving w packets each RTT
  - So can set w ← w + MSS on every ACK received
• Now implementation has to keep track of three limits
  - AvailableWindow – for flow control
  - CongestionThreshold – old congestion window
  - CongestionWindow – smaller than threshold during slow start
• Slow start only up to CongestionThreshold
  - Remember last value
  - When reached, go back to additive increase
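The two phases and the threshold can be sketched as follows (names follow the slide; byte arithmetic as on the earlier implementation slide, MSS = 1460 illustrative):

```python
MSS = 1460  # illustrative segment size

def on_ack(cwnd, threshold):
    """Slow start below CongestionThreshold, additive increase above."""
    if cwnd < threshold:
        return cwnd + MSS              # +MSS per ACK: doubles per RTT
    return cwnd + MSS * MSS / cwnd     # additive increase

def on_timeout(cwnd):
    """Remember half the window as the threshold; restart slow start."""
    return MSS, max(cwnd / 2, 2 * MSS)

cwnd, threshold = MSS, 8 * MSS
for _ in range(3):                     # three RTTs, one window ACKed per RTT
    for _ in range(int(cwnd // MSS)):
        cwnd = on_ack(cwnd, threshold)
print(int(cwnd // MSS))                # → 8 (window goes 1 -> 2 -> 4 -> 8)
```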
Fast retransmit & fast recovery
• Problem: coarse-grain TCP timeouts
  - Have to be conservative about RTT
  - Net will sit idle while waiting for a timeout
  - Worse, TCP intentionally keeps bumping its head against the limit
• Solution: fast retransmit
  - Use 3 duplicate ACKs to trigger retransmission
  - If more than one packet was lost, still need a timeout
  - Else, halve w, but otherwise keep sending
  - No need to set w ← MSS and use slow start
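The trigger itself is simple to state in code; a sketch (real TCP also does fast-recovery window bookkeeping, omitted here):

```python
# Detect the 3-duplicate-ACK condition from a stream of cumulative ACK
# numbers, and report which segment would be retransmitted.

def fast_retransmit(acks):
    retransmitted = []
    last_ack, dupes = None, 0
    for a in acks:
        if a == last_ack:
            dupes += 1
            if dupes == 3:                    # third duplicate ACK
                retransmitted.append(a + 1)   # next expected segment
        else:
            last_ack, dupes = a, 0
    return retransmitted

# ACK pattern when segment 3 of 6 is lost (as in the picture slide):
print(fast_retransmit([1, 2, 2, 2, 2, 6]))  # → [3]
```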
Fast retransmit picture
[Figure: sender transmits packets 1-6; packet 3 is lost. The receiver sends ACK 1, ACK 2, then duplicate ACK 2s for packets 4, 5, and 6. After the third duplicate ACK the sender retransmits packet 3, and the receiver responds with ACK 6]
Before fast retransmit
[Figure: window trace (KB) vs. time (seconds) over 9 seconds; long flat idle stretches while waiting for timeouts]
With fast retransmit
[Figure: window trace (KB) vs. time (seconds) over 7 seconds; the sawtooth continues without the long idle timeout periods]
Congestion Avoidance
• TCP’s strategy
  - Control congestion once it happens
  - Repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
• Alternative strategy
  - Predict when congestion is about to happen
  - Reduce rate before packets start being discarded
  - Call this congestion avoidance, instead of congestion control
• Two possibilities
  - Host-centric: TCP Vegas
  - Router-centric: DECbit and RED gateways
TCP Vegas
Idea: source watches for some sign that the router’s queue is building up and congestion will happen, e.g., RTT grows or sending rate flattens.
[Figure: three time-aligned traces over 8.5 seconds: congestion window (10-70 KB), sending rate (100-1100 KBps), and queue size in the router (up to about 10 packets)]
TCP Vegas picture
[Figure: two time-aligned traces over 8 seconds: congestion window (10-70 KB) and measured rate (40-240 KBps)]
Fair Queuing (FQ)
• Explicitly segregates traffic based on flows
• Ensures no flow consumes more than its share
• Variation: weighted fair queuing (WFQ)
[Figure: four flows queued separately at the router and given round-robin service]
FQ Algorithm
• Suppose clock ticks each time a bit is transmitted
• Let Pi denote the length of packet i
• Let Si denote the time when the router starts transmitting packet i
• Let Fi denote the time when the router finishes transmitting packet i
• Fi = Si + Pi
• When does the router start transmitting packet i?
  - If it arrived before the router finished packet i−1 from this flow, then immediately after the last bit of i−1 (at Fi−1)
  - If no current packets for this flow, then start transmitting when it arrives (call this time Ai)
• Thus: Fi = max(Fi−1, Ai) + Pi
FQ Algorithm (cont)
• For multiple flows
  - Calculate Fi for each packet that arrives on each flow
  - Treat all the Fi values as timestamps
  - Next packet to transmit is the one with the lowest timestamp
• Not perfect: can’t preempt current packet
• Example:
[Figure: two flows with example finish times F = 2, 5, 8, 10. (a) The queued packet with the lowest timestamp is transmitted first. (b) A packet arriving on flow 1 cannot preempt the flow-2 packet already being transmitted]
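The per-flow finish-time rule Fi = max(Fi−1, Ai) + Pi, plus lowest-timestamp-first service, can be sketched as:

```python
# Compute FQ finish times (in bit-transmission ticks, as on the slides)
# and the resulting transmission order.

def finish_times(packets, num_flows):
    """packets: list of (arrival_time Ai, flow_id, length_bits Pi)."""
    last_f = [0] * num_flows
    stamps = []
    for a, flow, p in sorted(packets):   # process in arrival order
        f = max(last_f[flow], a) + p     # Fi = max(Fi-1, Ai) + Pi
        last_f[flow] = f
        stamps.append((f, flow))
    return stamps

# Flow 0 has two back-to-back packets; flow 1's packet arrives at t=50.
stamps = finish_times([(0, 0, 100), (0, 0, 100), (50, 1, 60)], 2)
order = [flow for f, flow in sorted(stamps)]  # lowest timestamp first
print(order)  # → [0, 1, 0]: flow 1 slips in ahead of flow 0's second packet
```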
Random Early Detection (RED)
• Notification is implicit
  - Just drop the packet (TCP will time out)
  - Could make it explicit by marking the packet
• Early random drop
  - Rather than wait for the queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level
RED Details
• Compute average queue length:
  AvgLen = (1 − Weight) · AvgLen + Weight · SampleLen
  - 0 < Weight < 1 (usually 0.002)
  - SampleLen is the queue length each time a packet arrives
[Figure: FIFO queue with MinThreshold and MaxThreshold marked against AvgLen]
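The averaging step is a one-line exponentially weighted moving average; a sketch using the slide's typical Weight = 0.002:

```python
WEIGHT = 0.002  # the slide's typical value

def update_avg(avg_len, sample_len):
    """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
    return (1 - WEIGHT) * avg_len + WEIGHT * sample_len

avg = 0.0
for _ in range(1000):        # 1000 arrivals that each see 50 packets queued
    avg = update_avg(avg, 50)
print(avg)                   # still well below 50: the average moves slowly
```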
[Figure: instantaneous queue length fluctuates rapidly over time; AvgLen tracks it smoothly]
• Smooths out AvgLen over time
  - Don’t want to react to instantaneous fluctuations
RED Details (cont)
• Two queue length thresholds:
if AvgLen <= MinThreshold then
  enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
  calculate probability P
  drop arriving packet with probability P
if MaxThreshold <= AvgLen then
  drop arriving packet
RED Details (cont)
• Computing probability P
  - TempP = MaxP · (AvgLen − MinThreshold)/(MaxThreshold − MinThreshold)
  - P = TempP/(1 − count · TempP)
• Drop probability curve:
[Figure: P(drop) vs. AvgLen; zero below MinThresh, rising linearly to MaxP at MaxThresh, then jumping to 1.0]
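Plugging the two formulas together gives a short sketch; the threshold values here are illustrative, not from the slides, and MaxP = 0.02 is the typical value quoted in the tuning discussion.

```python
# RED drop probability with the count correction from the slide.
# MIN_THRESH and MAX_THRESH are illustrative values.

MIN_THRESH, MAX_THRESH, MAX_P = 5, 15, 0.02

def drop_probability(avg_len, count):
    """count = packets enqueued since the last drop."""
    temp_p = MAX_P * (avg_len - MIN_THRESH) / (MAX_THRESH - MIN_THRESH)
    return temp_p / (1 - count * temp_p)

p_fresh = drop_probability(10, 0)    # halfway between thresholds: TempP = 0.01
p_later = drop_probability(10, 40)   # long run with no drop
print(p_fresh, p_later)              # probability grows with count,
                                     # spacing drops more evenly over time
```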
Tuning RED
- Probability of dropping a particular flow’s packet(s) is roughly proportional to the share of the bandwidth that flow is currently getting
- MaxP is typically set to 0.02, meaning that when the average queue size is halfway between the two thresholds, the gateway drops roughly one out of 50 packets
- If traffic is bursty, then MinThreshold should be sufficiently large to allow link utilization to be maintained at an acceptably high level
- Difference between the two thresholds should be larger than the typical increase in the calculated average queue length in one RTT; setting MaxThreshold to twice MinThreshold is reasonable for traffic on today’s Internet
FPQ
• Problem: tuning RED can be slightly tricky
• Observations:
  - TCP performs badly with a window size under 4 packets: need 4 packets in flight for 3 duplicate ACKs and fast retransmit
  - Can supply feedback through delay as well as through drops
• Solution: make buffer size proportional to # flows
  - Few flows ⇒ low delay; many flows ⇒ low loss rate
  - Router adjusts automatically; far less tricky tuning required
  - Window size is a function of loss rate; keep the minimum size
  - Transmit rate = window size / RTT, and RTT ∼ queue length
• Clever algorithm estimates number of flows
  - Hash flow info, set bits, decay
  - Requires reasonable amount of storage
XCP
• New proposed IP protocol: XCP
  - Not compatible w. TCP; requires router support
  - Idea: have the router tell us exactly what we want to know!
• Packets contain: cwnd, RTT, feedback field
• Router tells you whether to increase or decrease rate
  - Gives explicit rates for increase/decrease amounts
  - Later routers don’t override the bottleneck router
  - Feedback returned to sender in ACKs