Post on 19-Dec-2015
Ch. 7 : Internet Transport Protocols
3-2
TCP reliable data transfer
TCP creates a reliable service on top of IP's unreliable service:
  pipelined segments
  cumulative ACKs
  a single retransmission timer
  the receiver accepts out-of-order segments but does not acknowledge them
Retransmissions are triggered by timeout events.
Initially, consider a simplified TCP sender: ignore flow control and congestion control.
3-3
TCP sender events:
data received from application:
  create a segment with a sequence number (the byte-stream number of the first data byte in the segment)
  start the timer if it is not already running (think of the timer as belonging to the oldest unACKed segment)
  expiration interval: TimeOutInterval
timeout:
  retransmit the segment that caused the timeout
  restart the timer
ACK received:
  if it acknowledges previously unACKed segments:
    update what is known to be ACKed
    start the timer if there are outstanding segments
Transport Layer 3-4
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
} /* end of loop forever */
Comment:
  SendBase-1: last cumulatively ACKed byte
Example:
  SendBase-1 = 71; y = 73, so the receiver wants byte 73 onward; y > SendBase, so new data is ACKed
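The three sender events can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the class name SimpleTcpSender, the wire list that stands in for "pass segment to IP", and the boolean timer flag are all hypothetical, and no real clock or network is involved.

```python
class SimpleTcpSender:
    """Simplified sender: no flow control, no congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq    # NextSeqNum
        self.send_base = initial_seq   # SendBase
        self.timer_running = False     # stand-in for a real timer (assumption)
        self.unacked = {}              # seq -> payload of not-yet-ACKed segments
        self.wire = []                 # (seq, payload) pairs "passed to IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)    # smallest not-yet-ACKed sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True  # restart the timer

    def on_ack(self, y):
        if y > self.send_base:         # cumulative ACK for new data
            self.send_base = y
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            # keep the timer only if something is still outstanding
            self.timer_running = bool(self.unacked)
```

With initial_seq=92, sending 8 bytes and then 20 bytes produces segments with sequence numbers 92 and 100, matching the SN/AN values used in the retransmission scenarios.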
3-5
TCP actions on receiver events:
application takes data:
  free the room in the buffer
  give the freed cells new numbers (circular numbering)
  WIN increases by the number of bytes taken
data received from IP:
  if the checksum fails, ignore the segment
  if the checksum is OK, then:
    if the data came in order: update AN and WIN
      AN grows by the number of new in-order bytes
      WIN decreases by the same number
    if the data is out of order: put it in the buffer, but don't count it for AN/WIN
3-6
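TCP: retransmission scenarios
[Figure: two time-sequence diagrams between Host A and Host B, with the timer setting and the actual timer run marked on Host A's side.
A. normal scenario: A sends SN=92 with 8 bytes of data and starts the timer for SN 92; B replies AN=100 and A stops the timer; A sends SN=100 with 20 bytes of data and starts the timer for SN 100; B replies AN=120 and A stops the timer (NO timer afterwards).
B. lost ACK + retransmission: A sends SN=92 with 8 bytes of data and starts the timer for SN 92; B's ACK (AN=100) is lost (X); on TIMEOUT, A retransmits SN=92 with 8 bytes of data and starts the timer for the new SN 92; B replies AN=100 again and A stops the timer.]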
Afeka, academic year 5771 (2010-11), Semester B. Transport Layer 3-7
TCP retransmission scenarios (more)
see also slide 47
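[Figure: two more time-sequence diagrams between Host A and Host B.
C. lost ACK, NO retransmission: A sends SN=92 with 8 bytes of data (timer started for SN 92) and then SN=100 with 20 bytes of data; B's ACK AN=100 is lost (X), but the cumulative ACK AN=120 arrives before the timer expires, so A stops the timer and retransmits nothing.
D. premature timeout: A sends SN=92 with 8 bytes of data (timer started for SN 92) and SN=100 with 20 bytes of data; the TIMEOUT fires before B's ACKs AN=100 and AN=120 arrive, so A retransmits SN=92 with 8 bytes of data and restarts the timer; the ACKs then arrive and the timer is stopped; B answers the retransmitted segment with a redundant ACK, AN=120.]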
Transport Layer 3-8
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   -> Delayed ACK: wait up to 500 ms for the next segment; if no next segment arrives, send ACK.
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending
   -> Immediately send a single cumulative ACK, ACKing both in-order segments.
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected
   -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte. This ACK carries no data.
4. Arrival of segment that partially or completely fills the gap
   -> Immediately send ACK, provided that the segment starts at the lower end of the gap.
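The four rows of the table can be read as one decision function. The following is a minimal sketch: the function name, its parameters, and the returned strings are hypothetical, and a segment that starts below the expected sequence number is not covered by the table, so it is reported as such.

```python
def receiver_ack_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver action for one arriving segment.

    seg_seq      -- sequence number of the arriving segment
    expected_seq -- next in-order byte the receiver is waiting for
    ack_pending  -- True if an earlier in-order segment is still un-ACKed
    gap_exists   -- True if out-of-order data is already buffered
    """
    if seg_seq > expected_seq:
        # row 3: out-of-order arrival, gap detected
        return "send duplicate ACK immediately"
    if seg_seq == expected_seq:
        if gap_exists:
            # row 4: segment starts at the lower end of the gap
            return "send ACK immediately"
        if ack_pending:
            # row 2: one other segment has an ACK pending
            return "send cumulative ACK immediately"
        # row 1: everything before this segment is already ACKed
        return "delay ACK up to 500 ms"
    return "not covered by the table"
```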
Transport Layer 3-9
Fast Retransmit
the time-out period is often relatively long:
  long delay before resending a lost packet
detect lost segments via duplicate ACKs:
  the sender often sends many segments back-to-back
  if a segment is lost, there will likely be many duplicate ACKs for that segment
if the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
  fast retransmit: resend the segment before the timer expires
Transport Layer 3-10
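[Figure: Host A sends segments with seq # x1, x2, x3, x4, x5 to Host B; the segment with seq # x2 is lost (X). B returns ACK x1, followed by three more ACK x1 (triple duplicate ACKs); A resends seq # x2 before the timeout expires.]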
Transport Layer 3-11
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
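The duplicate-ACK counting can be sketched in Python as follows. The class name and the retransmitted list are hypothetical. As a simplifying assumption, only ACKs with y equal to SendBase are counted as duplicates, and the counter is reset whenever new data is ACKed.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0          # duplicate ACKs seen for the current send_base
        self.retransmitted = []    # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # new data is ACKed: advance the base and start counting afresh
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                # resend the segment with sequence number y
                self.retransmitted.append(y)
```

A fourth or fifth duplicate ACK does not trigger another retransmission here, because the resend happens only when the count is exactly 3.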
12
TCP: Flow Control
3-13
TCP Flow Control for A’s data
the receive side of the TCP connection at B has a receive buffer
  the receive buffer size is set by the OS at connection init
  the application process at B may be slow at reading from the buffer
flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
  flow control matches the send rate of A to the receiving application's drain rate at B
WIN = window size = number of bytes A may send starting at AN
[Figure: node B, receive process. Data from IP (sent by TCP at A) enters the receive buffer; the buffer holds TCP data not yet taken by the application, followed by spare room; AN marks the start of the spare room and WIN is its size; data taken by the application leaves the buffer.]
3-14
TCP Flow control: how it works
Formulas:
  AN = first byte not received yet (sent to A in the TCP header)
  AckedRange = AN - FirstByteNotReadByAppl = number of bytes received in sequence and not yet taken by the application
  WIN = RcvBuffer - AckedRange = SpareRoom
  AN and WIN are sent to A in the TCP header
  data received out of sequence is considered part of the "spare room" range
Procedure:
  the receiver advertises the "spare room" by including the value of WIN in its segments
  sender A is allowed to send at most WIN bytes in the range starting at AN
  this guarantees that the receive buffer doesn't overflow
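The formulas translate directly into code. The sketch below uses hypothetical function names and made-up byte values; it only restates WIN = RcvBuffer - AckedRange and the rule that A may send at most WIN bytes starting at AN.

```python
def advertised_window(rcv_buffer, an, first_byte_not_read_by_appl):
    """WIN as advertised by receiver B."""
    # bytes received in sequence and not yet taken by the application
    acked_range = an - first_byte_not_read_by_appl
    return rcv_buffer - acked_range


def sender_may_send(seq, length, an, win):
    """True if bytes [seq, seq + length) lie inside the range [AN, AN + WIN)."""
    return an <= seq and seq + length <= an + win


# Illustrative values (assumptions): 4096-byte buffer, AN = 1500,
# application has read everything before byte 1000.
win = advertised_window(4096, 1500, 1000)
```

Here AckedRange is 500 bytes, so win is 3596: A may send bytes 1500 through 5095 and nothing beyond.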
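[Figure: node B, receive process. The receive buffer holds ACKed data (received in order, not yet taken by the application) up to AN; WIN spans the spare room starting at AN; non-ACKed data that arrived out of order sits inside the spare room and is ignored when computing WIN; data from IP (sent by TCP at A) enters the buffer and data taken by the application leaves it.]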
3-15
TCP flow control: Example 1
3-16
TCP flow control: Example 2
17
TCP: setting timeouts
18
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
  longer than RTT
    note: RTT will vary
  too short: premature timeout
    unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions and cumulatively ACKed segments
  SampleRTT will vary; we want the estimated RTT to be "smoother"
    use several recent measurements, not just the current SampleRTT
19
High-level Idea
Set timeout = average + safe margin
20
Estimating Round Trip Time
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
Exponential weighted moving average
  the influence of a past sample decreases exponentially fast
  typical value: α = 0.125
SampleRTT: measured time from segment transmission until ACK receipt
  SampleRTT will vary; we want a "smoother" estimated RTT
  use several recent measurements, not just the current SampleRTT
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds, 100 to 350); two curves: SampleRTT and EstimatedRTT.]
21
Setting Timeout
Problem: using the average of SampleRTT will generate many timeouts due to network variations
Solution: EstimatedRTT plus a "safety margin"
  large variation in EstimatedRTT -> larger safety margin
First estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT
[Figure: distribution of RTT values; x-axis: RTT; y-axis: frequency.]
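The two moving averages and the timeout formula fit in a small class. This is a sketch, not a reference implementation: the class name is hypothetical, the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) are an assumption because the slides do not give them, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, in the same unit as the samples."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial value

    def update(self, sample_rtt):
        # deviation uses the estimate from before this sample (assumed order)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # average plus a safety margin of four deviations
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms and feeding samples of 100 ms and then 200 ms, the estimate moves only to 112.5 ms while the timeout interval grows to 325 ms, which shows how a single outlier widens the safety margin more than it shifts the average.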
22
An Example TCP Session
24
TCP: Congestion Control
25
TCP Congestion Control
Closed-loop, end-to-end, window-based congestion control
Designed by Van Jacobson in the late 1980s, based on the AIMD algorithm of Dah-Ming Chiu and Raj Jain
Works well so far: the bandwidth of the Internet has increased by more than 200,000 times
Many versions
  TCP/Tahoe: a less optimized version
  TCP/Reno: many OSs today implement Reno-type congestion control
  TCP/Vegas: not currently used
For more details: see TCP/IP Illustrated, or read http://lxr.linux.no/source/net/ipv4/tcp_input.c for the Linux implementation
26
TCP & AIMD: congestion
Dynamic window size [Van Jacobson]
  Initialization: MI (multiplicative increase)
    Slow start
  Steady state: AIMD
    Congestion Avoidance
Congestion = timeout
  TCP Tahoe
Congestion = timeout || 3 duplicate ACKs
  TCP Reno & TCP New Reno
Congestion = higher latency
  TCP Vegas
27
Visualization of the Two Phases
[Figure: Congwin over time; a slow start phase up to the threshold, followed by a congestion avoidance phase.]
28
TCP Slowstart: MI
exponential increase (per RTT) in window size (not so slow!)
In case of timeout: Threshold = CongWin/2
Slowstart algorithm:
  initialize: Congwin = 1
  for (each segment ACKed)
    Congwin++
  until (congestion event OR CongWin > threshold)
[Figure: Host A sends one segment to Host B; one RTT later it sends two segments; one RTT later, four segments.]
29
TCP Tahoe Congestion Avoidance
Congestion avoidance (TCP Tahoe):
  /* slowstart is over */
  /* Congwin > threshold */
  Until (timeout) {   /* loss event */
    every ACK:
      Congwin += 1/Congwin
  }
  threshold = Congwin/2
  Congwin = 1
  perform slowstart
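To see the shape this produces, the Tahoe rules can be approximated one RTT at a time: doubling per round in slow start, plus one segment per round in congestion avoidance. The function below is a rough sketch under those assumptions. Its name and the timeout_rounds parameter are hypothetical, and it switches phases when Congwin reaches the threshold rather than strictly exceeding it.

```python
def tahoe_window_trace(rounds, threshold, timeout_rounds=()):
    """Return Congwin (in segments) at the start of each round."""
    congwin = 1.0
    trace = []
    for r in range(rounds):
        trace.append(congwin)
        if r in timeout_rounds:
            # loss event: remember half the window, restart slow start
            threshold = congwin / 2
            congwin = 1.0
        elif congwin < threshold:
            # slow start: Congwin++ per ACK doubles the window per RTT
            congwin *= 2
        else:
            # congestion avoidance: += 1/Congwin per ACK is about +1 per RTT
            congwin += 1
    return trace


trace = tahoe_window_trace(8, threshold=8, timeout_rounds={5})
```

With these inputs the window climbs 1, 2, 4, 8, then 9, 10, and falls back to 1 after the timeout in round 5.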
30
TCP Reno
Fast retransmit:
  try to avoid waiting for a timeout
Fast recovery:
  try to avoid slowstart
  used only on a triple duplicate ACK event
  single packet drop: not too bad
[Figure: congestion window (y-axis, 0 to 70) versus time (x-axis, 0 to 60), marking the threshold, the congestion window, timeouts, slow start periods, additive increase, and fast retransmission.]
31
TCP Reno cwnd Trace
Transport Layer 3-32
TCP congestion control: bandwidth probing
"probing for bandwidth": increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in the network)
Q: how fast to increase/decrease?
  details to follow
[Figure: TCP's "sawtooth" behavior; sending rate versus time, rising while ACKs are being received and dropping at each loss (X).]
Transport Layer 3-33
TCP Congestion Control: details
sender limits rate by limiting the number of unACKed bytes "in pipeline":
  LastByteSent - LastByteAcked ≤ cwnd
  cwnd: differs from rwnd (how, why?)
  sender limited by min(cwnd, rwnd)
roughly,
  rate = cwnd / RTT bytes/sec
cwnd is dynamic, a function of perceived network congestion
[Figure: the sender transmits cwnd bytes, then waits one RTT for the ACK(s).]
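Both relations can be checked numerically. In this sketch the function names are hypothetical; the first computes how many more bytes the sender may put in the pipeline given min(cwnd, rwnd), and the second applies the rough rate formula.

```python
def usable_window(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes the sender may still transmit right now."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed bytes in the pipeline
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate_bytes_per_sec(cwnd_bytes, rtt_seconds):
    """Rough sending rate: one window of data per round-trip time."""
    return cwnd_bytes / rtt_seconds
```

For example, with 2000 bytes in flight, cwnd = 4000 and rwnd = 8000, the congestion window is the binding limit and 2000 more bytes may be sent; if rwnd were 1500 instead, flow control would be the binding limit and nothing more could be sent.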
Transport Layer 3-34
TCP Congestion Control: more details
segment loss event: reducing cwnd
  timeout: no response from receiver
    cut cwnd to 1 MSS
  3 duplicate ACKs: at least some segments getting through (recall fast retransmit)
    cut cwnd in half, less aggressively than on timeout
ACK received: increase cwnd
  slowstart phase:
    start low (cwnd = MSS)
    increase cwnd exponentially fast (despite the name)
    used: at connection start, or following a timeout
  congestion avoidance:
    increase cwnd linearly
Transport Layer 3-35
TCP Slow Start
when connection begins, cwnd = 1 MSS
  example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to a respectable rate
increase rate exponentially until first loss event or when threshold reached
  double cwnd every RTT
  done by incrementing cwnd by 1 MSS for every ACK received
[Figure: Host A sends one segment to Host B; one RTT later it sends two segments; one RTT later, four segments.]
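The 20 kbps figure follows from sending one MSS per RTT: 500 bytes is 4000 bits, and 4000 bits per 0.2 s is 20,000 bits/s. The short sketch below (hypothetical function names) reproduces that arithmetic and the per-RTT doubling, assuming no loss and no threshold.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate in bits per second when cwnd is a single MSS."""
    return mss_bytes * 8 / rtt_seconds


def slow_start_cwnd(rounds, mss_bytes):
    """cwnd in bytes at the start of each RTT while it doubles every round."""
    return [mss_bytes * 2 ** r for r in range(rounds)]


rate = initial_rate_bps(500, 0.2)        # about 20,000 bits/s, i.e. 20 kbps
windows = slow_start_cwnd(4, 500)        # 500, 1000, 2000, 4000 bytes
```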
Transport Layer 3-36
Transitioning into/out of slowstart
ssthresh: cwnd threshold maintained by TCP
on loss event: set ssthresh to cwnd/2; go to slowstart
  remember (half of) the TCP rate when congestion last occurred
when cwnd >= ssthresh: transition from slowstart to congestion avoidance phase
[Figure: FSM fragment around the slow start state.
  initial state: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0
  new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: dupACKcount++; if dupACKcount = 3, set cwnd = 1 MSS
  timeout (in slow start): ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  cwnd > ssthresh: move to congestion avoidance
  timeout (in congestion avoidance): ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; return to slow start]
Transport Layer 3-37
TCP: congestion avoidance
when cwnd > ssthresh, grow cwnd linearly
  increase cwnd by 1 MSS per RTT
  approach possible congestion slower than in slowstart
  implementation: cwnd = cwnd + MSS^2/cwnd for each ACK received
AIMD: Additive Increase, Multiplicative Decrease
  ACKs: increase cwnd by 1 MSS per RTT: additive increase
  loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
    true in the macro picture; may require Slow Start first to grow up to this
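The per-ACK rule adds up to roughly one MSS per RTT because about cwnd/MSS ACKs arrive in each round. The sketch below illustrates this under an assumption the slides do not state: exactly one ACK arrives per full-sized segment in the window. The function name is hypothetical.

```python
def congestion_avoidance_round(cwnd, mss):
    """Apply cwnd += MSS^2/cwnd once per ACK for one round of ACKs."""
    acks = int(cwnd // mss)            # one ACK per segment in the window (assumed)
    for _ in range(acks):
        cwnd += mss * mss / cwnd
    return cwnd


start = 10_000.0                       # 10 segments of 1000 bytes (illustrative)
growth = congestion_avoidance_round(start, 1000) - start
```

The growth over the round comes out slightly below one MSS, because cwnd is already a little larger by the time the later ACKs arrive.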
Transport Layer 3-38
TCP congestion control FSM: overview
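[Figure: FSM with three states: slow start, congestion avoidance, fast recovery.
  slow start -> congestion avoidance: cwnd > ssthresh
  slow start -> slow start: loss: timeout
  congestion avoidance -> slow start: loss: timeout
  fast recovery -> slow start: loss: timeout
  slow start -> fast recovery: loss: 3dupACK
  congestion avoidance -> fast recovery: loss: 3dupACK
  fast recovery -> congestion avoidance: new ACK]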
Transport Layer 3-39
TCP congestion control FSM: details
[Figure: FSM with three states and the following transitions.
  initial state (enter slow start): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0
  slow start:
    new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
    duplicate ACK: dupACKcount++
    dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
    cwnd > ssthresh: go to congestion avoidance
  congestion avoidance:
    new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
    duplicate ACK: dupACKcount++
    dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
  fast recovery:
    duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
    new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start]
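The state machine can be written as a small class with one method per event. This is a sketch under stated assumptions: the class and method names are hypothetical, retransmission and transmission of new segments are left out, cwnd and ssthresh are in bytes, and the "+ 3" on entering fast recovery is read as 3 MSS.

```python
class RenoSender:
    """Tracks cwnd, ssthresh and the congestion-control state only."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = float(mss)          # cwnd = 1 MSS
        self.ssthresh = 64 * 1024.0     # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            # congestion avoidance: about 1 MSS per RTT
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss       # window inflation per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow start"
```

Feeding it four new ACKs, three duplicate ACKs, and then a new ACK walks it from slow start through fast recovery into congestion avoidance, with cwnd ending at the halved value stored in ssthresh.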
Transport Layer 3-40
Popular “flavors” of TCP
[Figure: cwnd window size (in segments) versus transmission round for TCP Tahoe and TCP Reno, with the ssthresh levels marked.]
Transport Layer 3-41
Summary: TCP Congestion Control
when cwnd < ssthresh, sender in slow-start phase, window grows exponentially.
when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly.
when triple duplicate ACK occurs, ssthresh set to cwnd/2, cwnd set to ~ ssthresh
when timeout occurs, ssthresh set to cwnd/2, cwnd set to 1 MSS.
Transport Layer 3-42
TCP throughput
Q: what's the average throughput of TCP as a function of window size and RTT?
  ignoring slow start
let W be the window size when loss occurs
  when the window is W, throughput is W/RTT
  just after loss, the window drops to W/2 and throughput to W/(2*RTT)
  average throughput: 0.75 W/RTT
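The 0.75 factor can be derived in one line, assuming the window grows linearly from W/2 back to W between losses (the sawtooth of congestion avoidance), so that the time average is the midpoint of the two extremes:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
```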
Transport Layer 3-43
TCP Fairness
fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Transport Layer 3-44
Why is TCP fair?
Two competing sessions:
  additive increase gives a slope of 1 as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput (y-axis, up to R) versus Connection 1 throughput (x-axis, up to R), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal share line.]
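The convergence argument can be checked with a toy simulation. The sketch below makes strong simplifying assumptions that are not in the slides: both sessions have the same RTT, both see a loss in the same step whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Evolve two AIMD flows that share one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # loss: both decrease their window by a factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both increase additively
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = aimd_two_flows(80.0, 10.0, capacity=100.0, steps=200)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the initial gap of 70 shrinks to well under 1 after 200 steps.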
Transport Layer 3-45
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
Fairness and parallel TCP connections
  nothing prevents an app from opening parallel connections between 2 hosts
  web browsers do this
  example: link of rate R already supporting 9 connections
    new app asks for 1 TCP connection, gets rate R/10
    new app asks for 11 TCP connections, gets about R/2 (11R/20)!
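The arithmetic behind the example assumes that the link is split equally per connection, so an application's share is its number of connections divided by the total. A minimal sketch (hypothetical function name):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by the new application."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


one_conn = app_share(9, 1)       # Fraction(1, 10), i.e. R/10
eleven_conn = app_share(9, 11)   # Fraction(11, 20), slightly more than R/2
```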