1
TCP Congestion Control
2
TCP Segment Structure
[Figure: TCP segment layout, each row 32 bits wide]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)

sequence number, acknowledgement number: counting by bytes of data (not segments!)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection management (reset, setup, teardown commands)
rcvr window size: # bytes rcvr willing to accept
checksum: also in UDP
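The field layout above can be decoded with a few lines of Python. This is only a sketch: the helper name `parse_tcp_header`, the dictionary keys, and the hand-built sample segment are assumptions made for illustration, and the checksum is not verified.

```python
import struct

# Bit masks for the six flag bits, in the order U A P R S F.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (hypothetical helper)."""
    (src_port, dst_port, seq, ack,
     offset_byte, flag_byte, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIBBHHH", segment[:20])
    head_len = (offset_byte >> 4) * 4  # head len field counts 32-bit words
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "head_len": head_len,
        "flags": [name for name, bit in FLAG_BITS.items() if flag_byte & bit],
        "rcvr_window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:head_len],   # empty when head_len == 20
        "data": segment[head_len:],        # application data
    }

# Build a SYN+ACK segment by hand (made-up values) and decode it again.
raw = struct.pack("!HHIIBBHHH", 12345, 80, 1000, 2001,
                  5 << 4, 0x12, 65535, 0, 0) + b"hi"
parsed = parse_tcp_header(raw)
print(parsed["flags"], parsed["head_len"], parsed["data"])
```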
3
TCP Flow Control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

receiver buffering:
  RcvBuffer = size of TCP Receive Buffer
  RcvWindow = amount of spare room in Buffer

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Questions:
1. What is the maximum size of RcvBuffer?
2. Can sender estimate the size of RcvBuffer?
3. Can receiver change its RcvBuffer size in the middle of a session?
4. Can sender know the change?
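The receiver and sender rules above can be sketched as two small functions. The byte counters (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) and the sample numbers are assumptions introduced for this illustration; they do not appear in the slides.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RcvWindow: spare room left in the receive buffer, in bytes."""
    buffered = last_byte_rcvd - last_byte_read  # data waiting for the application
    return rcv_buffer - buffered

def usable_window(rcv_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked  # transmitted, unACKed data
    return max(0, rcv_window - in_flight)

# Receiver: 4096-byte buffer with 2000 bytes not yet read by the application.
rcv_window = advertised_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender: 1000 bytes are in flight, so only part of RcvWindow is still usable.
print(rcv_window, usable_window(rcv_window, last_byte_sent=5000, last_byte_acked=4000))
```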
5
Principles of Congestion Control
Big picture: How to determine a flow’s sending rate?
Congestion:
  informally: “too many sources sending too much data too fast for the network to handle”
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    wasted bandwidth
    long delays (queueing in router buffers)
  a top-10 problem!
6
History
TCP congestion control in mid-1980s:
  fixed window size w
  timeout value = 2 RTT
Congestion collapse in the mid-1980s:
  throughput between UCB and LBL dropped by 1000X!
7
Some General Questions
How can congestion happen?
What is congestion control?
Why is congestion control difficult?
Will congestion disappear in the future due to technology advances (e.g. faster links, routers)?
How does TCP provide congestion control?
8
Cause/Cost of Congestion: Scenario 1
[Figure: flow 1 and flow 2 (5 Mbps) share the path through router 1 and router 2; link labels: 10 Mbps, 5 Mbps, 20 Mbps]
Flow 2 has a fixed sending rate of 5 Mbps
We vary the sending rate of flow 1 from 0 to 20 Mbps
Assume:
  No retransmission
  The link from router 1 to router 2 has infinite buffer
Throughput: packets go through
[Plot: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps); marks the maximum achievable throughput]
[Plot: delay at link 1 vs. sending rate by flow 1 (Mbps); delay due to randomness at low load, large delays when congested]
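The shape of the two plots can be approximated numerically. The sketch below rests on two assumptions that are not stated in the slides: the shared link has a capacity of 10 Mbps (one reading of the figure labels), and queueing follows a simple M/M/1 model, so mean delay grows as 1 / (1 - load) times the service time.

```python
def total_throughput(rate1_mbps: float, rate2_mbps: float = 5.0,
                     capacity_mbps: float = 10.0) -> float:
    """With an infinite buffer nothing is dropped, but the link caps throughput."""
    return min(rate1_mbps + rate2_mbps, capacity_mbps)

def delay_factor(rate1_mbps: float, rate2_mbps: float = 5.0,
                 capacity_mbps: float = 10.0) -> float:
    """Mean delay relative to the service time under the assumed M/M/1 model."""
    load = (rate1_mbps + rate2_mbps) / capacity_mbps
    if load >= 1.0:
        return float("inf")  # queue grows without bound once load reaches capacity
    return 1.0 / (1.0 - load)

for rate1 in (0.0, 2.5, 4.5, 5.0, 20.0):
    print(rate1, total_throughput(rate1), delay_factor(rate1))
```

Under these assumptions throughput saturates at the link capacity while delay climbs steeply as the offered load approaches it.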
9
Cause/Cost of Congestion: Scenario 2
[Figure: flow 1 and flow 2 (5 Mbps) cross a network of routers 1 to 6; link labels: 10 Mbps, 5 Mbps, 20 Mbps, 5 Mbps, 5 Mbps]
Assume:
  No retransmission
  The link from router 1 to router 2 has finite buffer
Throughput: packets go through
[Plot: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps)]
When a packet is dropped at the link from router 2 to router 6, the upstream transmission from router 1 to router 2 used for that packet was wasted!
What if retransmission?
10
Summary: The Cost of Congestion
Cost:
  High delay
  Packet loss
  Wasted upstream bandwidth when a pkt is discarded at downstream
  Wasted bandwidth due to retransmission (a pkt goes through a link multiple times)
[Plots: delay vs. load and throughput vs. load, marking the knee, the cliff, packet loss, and congestion collapse]
11
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
Network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at
12
Open-loop vs. Closed-loop
Open-loop:
  A flow does not adjust its sending rate dynamically according to the status of the network
  Need reservation to avoid congestion collapse
Closed-loop:
  A flow adjusts its rate dynamically according to the status of the network
13
End-to-end vs. Hop-by-hop
End-to-end congestion control:
  A flow determines its rate
Hop-by-hop:
  Routers on the path implement flow control between each other, e.g. ATM credit-based
  Scheduling for flows at a link
14
Implicit vs. Explicit
Implicit: congestion inferred by end systems through observed loss, delay
Explicit: routers provide feedback to end systems
  explicit rate sender should send at
  single bit indicating congestion (SNA, DECbit, TCP ECN, ATM)
15
Rate-based vs. Window-based
Window-based: Congestion control by controlling the window size of a transport scheme, e.g. set window size to 64 KBytes
  Example: TCP
Rate-based: Congestion control by explicitly controlling the sending rate of a flow, e.g. set sending rate to 128 Kbps
  Example: ATM
18
TCP Congestion Control
Closed-loop, end-to-end, implicit, window-based congestion control
Transmission rate limited by congestion window size, cwnd, over segments
[Figure: a window of cwnd segments]
w segments, each with MSS bytes, sent in one RTT:
throughput = (w * MSS) / RTT bytes/sec
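As a quick numeric check of the throughput relation, the sketch below evaluates it for one made-up operating point. The function name and the sample values (10 segments, 1460-byte MSS, 100 ms RTT) are assumptions chosen only for illustration.

```python
def window_throughput(w_segments: int, mss_bytes: int, rtt_ms: float) -> float:
    """Bytes per second when w segments of MSS bytes are sent every RTT."""
    return w_segments * mss_bytes * 1000 / rtt_ms  # rtt_ms is in milliseconds

# 10 segments of 1460 bytes per 100 ms round trip.
rate = window_throughput(10, 1460, 100)
print(rate, "bytes/sec")
```

Doubling the window doubles the rate, and doubling the RTT halves it, which is why the same cwnd gives very different throughput on short and long paths.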
19
TCP Congestion Control: Basic Question
Ideally, we want to set the window size (approximately) to the product of available bandwidth (for this flow) and round-trip delay
However,
  We don’t know these parameters at the beginning of a flow
  Further, the available bandwidth and round-trip delay are changing, because of competing flows
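The ideal window described here is the bandwidth-delay product. A minimal sketch of the arithmetic follows; the function name and the sample figures (10 Mbps available, 100 ms round trip, 1460-byte MSS) are assumptions, not values from the slides.

```python
def ideal_window(bandwidth_bps: int, rtt_ms: int, mss_bytes: int) -> tuple:
    """Return the bandwidth-delay product in bytes and in MSS-sized segments."""
    bdp_bytes = bandwidth_bps * rtt_ms / 8000  # bits/s * ms -> bytes
    return bdp_bytes, bdp_bytes / mss_bytes

bdp_bytes, bdp_segments = ideal_window(10_000_000, 100, 1460)
print(bdp_bytes, round(bdp_segments, 1))
```

A sender cannot compute this directly, since neither input is known in advance, which is why TCP probes for the right window instead.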
20
TCP Congestion Control: Basic Structure
Two “phases”:
  slow start
  congestion avoidance (AIMD)
Important variables:
  cwnd: congestion window size
  ssthresh: threshold between the slow-start phase and the congestion avoidance phase
Many versions of TCP:
  TCP/Tahoe: this is a less optimized version
  TCP/Reno: this is what we are talking about today; most OSs today implement TCP/Reno
  TCP/Vegas: currently not used
21
TCP Congestion Control Implementation
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)   /* slow start */
        cwnd = cwnd + 1;
    else                   /* congestion avoidance; cwnd increases by 1 per RTT */
        cwnd += 1/cwnd;
Triple-duplicate ACKs:     /* multiplicative decrease */
    cwnd = ssthresh = cwnd/2;
Timeout:
    ssthresh = cwnd/2;
    cwnd = 1;
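The pseudocode translates almost line for line into a small runnable class. This is a sketch of the window bookkeeping only, with cwnd counted in segments; the class and method names are assumptions, and retransmission, timers, and the receiver window are left out.

```python
class RenoWindow:
    """Congestion window state in units of segments, following the pseudocode."""

    def __init__(self, ssthresh: float = float("inf")):
        self.cwnd = 1.0
        self.ssthresh = ssthresh  # "infinite (64K)" in the slide

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ACKed segment
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2     # multiplicative decrease
        self.cwnd = self.ssthresh

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                   # restart from slow start

w = RenoWindow()
for _ in range(7):
    w.on_new_ack()          # slow start: cwnd grows from 1 to 8
w.on_triple_dup_ack()       # cwnd and ssthresh both become 4
w.on_new_ack()              # congestion avoidance: cwnd becomes 4.25
print(w.cwnd, w.ssthresh)
w.on_timeout()              # ssthresh becomes 2.125, cwnd back to 1
print(w.cwnd, w.ssthresh)
```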
22
TCP AIMD
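AIMD [Jacobson 1988]:
Additive Increase: in every RTT, W = W + 1*MSS
Multiplicative Decrease: upon a congestion event, W = W/2
[Figure: sender and receiver exchange data packets and acknowledgment packets across the network; plot of congestion window size vs. time showing additive increase (AI) over each 1 RTT step and multiplicative decrease (MD)]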
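The sawtooth produced by AIMD can be reproduced with a short simulation. The sketch below counts W in units of MSS and assumes, purely for illustration, that a congestion event occurs whenever W exceeds a fixed `capacity`; real networks give no such clean signal.

```python
def aimd_trace(rounds: int, capacity: int, start: int = 1) -> list:
    """Window size (in MSS) at the start of each RTT under AIMD."""
    w = start
    trace = []
    for _ in range(rounds):
        trace.append(w)
        if w > capacity:        # assumed congestion event
            w = max(1, w // 2)  # multiplicative decrease: W = W/2
        else:
            w += 1              # additive increase: W = W + 1*MSS
    return trace

print(aimd_trace(12, capacity=8, start=4))
# -> [4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9]
```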
23
TCP Slow Start
When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
Available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Why call it slow start: initial rate is slow but ramps up exponentially fast
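The 20 kbps figure and the speed of the exponential ramp can be checked with a few lines. The function names and the 10 Mbps target rate are assumptions for illustration; only the MSS and RTT values come from the example above.

```python
def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> float:
    """Sending rate with CongWin = 1 MSS: one segment per round trip."""
    return mss_bytes * 8 * 1000 / rtt_ms

def rtts_to_reach(target_bps: float, mss_bytes: int, rtt_ms: int) -> int:
    """Number of doublings (one per RTT) before the rate reaches target_bps."""
    rate = initial_rate_bps(mss_bytes, rtt_ms)
    rounds = 0
    while rate < target_bps:
        rate *= 2  # slow start doubles CongWin every RTT
        rounds += 1
    return rounds

print(initial_rate_bps(500, 200))           # 20000.0 bits/sec, i.e. 20 kbps
print(rtts_to_reach(10_000_000, 500, 200))  # 9 round trips to pass 10 Mbps
```

Nine round trips at 200 msec is under two seconds, whereas adding one MSS per RTT from the same starting point would take hundreds of round trips.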
24
TCP Slow-start
[Figure: slow-start timeline between sender and receiver]
cwnd = 1: send segment 1; receive ACK for segment 1
cwnd = 2: send segments 2, 3; receive ACK for segments 2 + 3
cwnd = 4: send segments 4, 5, 6, 7
cwnd = 6, then cwnd = 8, as further ACKs arrive

Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)   /* slow start */
        cwnd = cwnd + 1;
Timeout or Triple Duplicate ACKs:
    /* slow start stops */
25
Fast Retransmit
After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”
26
Fast Recovery
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
  Variable Threshold
  At loss event, Threshold is set to 1/2 of CongWin just before loss event
[Plot: congestion window size (segments), 0 to 14, vs. transmission round, 1 to 15; two series (Series1, Series2)]
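The Threshold rule can be traced round by round. The sketch below is an illustration under stated assumptions, not a reconstruction of the plotted series: the initial Threshold of 8, the loss in round 8, and the choice to cap slow-start growth at Threshold are all picked for the example.

```python
def window_by_round(rounds: int, threshold: int, loss_round: int, on_loss: str) -> list:
    """CongWin (in segments) at each transmission round with a single loss event."""
    cong_win = 1
    trace = []
    for r in range(1, rounds + 1):
        trace.append(cong_win)
        if r == loss_round:
            threshold = max(cong_win // 2, 1)  # half of CongWin just before loss
            cong_win = threshold if on_loss == "triple_dup_ack" else 1
        elif cong_win < threshold:
            cong_win = min(cong_win * 2, threshold)  # exponential phase
        else:
            cong_win += 1                            # linear phase
    return trace

print(window_by_round(15, 8, 8, "triple_dup_ack"))
# -> [1, 2, 4, 8, 9, 10, 11, 12, 6, 7, 8, 9, 10, 11, 12]
print(window_by_round(15, 8, 8, "timeout"))
# -> [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6, 7, 8, 9]
```

In both runs the window switches from exponential to linear growth at 6 after the loss, which is half of the 12 segments in flight when the loss occurred.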
27
TCP/Reno: Big Picture
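[Figure: cwnd vs. time. Slow start is followed by congestion avoidance; each TD halves cwnd and congestion avoidance continues; a TO drops cwnd back to slow start, which runs up to ssthresh before congestion avoidance resumes. The ssthresh level is marked after each loss event.]
TD: Triple duplicate acknowledgements
TO: Timeout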
28
Summary: TCP Congestion Control
When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.