Computer Networking
Lent Term, M/W/F 11-midday, LT1 in Gates Building
Slide Set 5
Andrew W. Moore, andrew.moore@cl.cam.ac.uk
January 2013

Topic 5 – Transport
Our goals:
• understand principles behind transport layer services:
  – multiplexing / demultiplexing
  – reliable data transfer
  – flow control
  – congestion control
• learn about transport layer protocols in the Internet:
  – UDP: connectionless transport
  – TCP: connection-oriented transport
  – TCP congestion control

Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  – send side: breaks app messages into segments, passes to network layer
  – rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  – Internet: TCP and UDP
[Figure: two end-system stacks (application, transport, network, data link, physical) joined by a logical end-end transport path]

Transport vs. network layer
• network layer: logical communication between hosts
• transport layer: logical communication between processes
  – relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

Internet transport-layer protocols
• reliable, in-order delivery (TCP)
  – congestion control
  – flow control
  – connection setup
• unreliable, unordered delivery: UDP
  – no-frills extension of "best-effort" IP
• services not available:
  – delay guarantees
  – bandwidth guarantees
[Figure: the two end hosts run all five layers; the intermediate nodes run only network, data link and physical; transport is a logical end-end path]

Multiplexing/demultiplexing (Transport-layer style)
• Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
• Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: host 1, host 2 and host 3, each with a full stack (application, transport, network, link, physical); processes P1 to P4 attach to the transport layer through sockets]

How transport-layer demultiplexing works
• host receives IP datagrams
  – each datagram has source IP address, destination IP address
  – each datagram carries 1 transport-layer segment
  – each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)

Connectionless demultiplexing
• Create sockets with port numbers:
    DatagramSocket mySocket1 = new DatagramSocket(12534);
    DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
  – checks destination port number in segment
  – directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers are directed to same socket
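
Connectionless demux (cont.)
    DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: clients at IP A and IP B both send to the server at IP C, port 6428]
• one client sends SP 9157, DP 6428; the reply carries SP 6428, DP 9157
• the other client sends SP 5775, DP 6428; the reply carries SP 6428, DP 5775
SP provides "return address"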

Connection-oriented demux
• TCP socket identified by 4-tuple:
  – source IP address
  – source port number
  – dest IP address
  – dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  – each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  – non-persistent HTTP will have different socket for each request
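
The two lookup rules can be put side by side in a few lines. The sketch below is illustrative only: the class `DemuxSketch`, its method names and the string-based key are assumptions made for this example, not how any real stack stores its sockets. It shows that a UDP-style table is keyed by destination port alone, while a TCP-style table is keyed by all four values.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demultiplexer (hypothetical): maps addressing fields of a segment to a socket label. */
final class DemuxSketch {
    // UDP-style table: keyed by destination port only (the dest IP is this host).
    private final Map<Integer, String> udpSockets = new HashMap<>();
    // TCP-style table: keyed by the full 4-tuple, encoded here as a string.
    private final Map<String, String> tcpSockets = new HashMap<>();

    private static String key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort;
    }

    void bindUdp(int dstPort, String socketName) {
        udpSockets.put(dstPort, socketName);
    }

    void addTcp(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(key(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    /** Source IP and source port play no part in choosing a UDP socket. */
    String demuxUdp(String srcIp, int srcPort, int dstPort) {
        return udpSockets.get(dstPort);
    }

    /** All four values select the TCP socket; null means no matching connection. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(key(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        DemuxSketch server = new DemuxSketch();
        server.bindUdp(6428, "serverSocket");
        System.out.println(server.demuxUdp("A", 9157, 6428));
        System.out.println(server.demuxUdp("B", 5775, 6428));
        server.addTcp("A", 9157, "C", 80, "socket-for-A");
        server.addTcp("B", 9157, "C", 80, "socket-for-B");
        System.out.println(server.demuxTcp("B", 9157, "C", 80));
    }
}
```

Two clients that happen to pick the same source port (9157 here) still reach different TCP sockets because their source IP addresses differ, whereas both UDP senders land on the single socket bound to port 6428.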
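
Connection-oriented demux (cont.)
[Figure: clients at IP A and IP B connect to the server at IP C; processes P1 to P6, with a separate server process and socket for each connection]
Segments arriving at the server, all with D-IP C and DP 80:
• S-IP A, SP 9157
• S-IP B, SP 9157
• S-IP B, SP 5775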
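
Connection-oriented demux: Threaded Web Server
[Figure: clients at IP A and IP B connect to the server at IP C; processes P1 to P4, with one server process holding a separate socket for each connection]
Segments arriving at the server, all with D-IP C and DP 80:
• S-IP A, SP 9157
• S-IP B, SP 9157
• S-IP B, SP 5775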

UDP: User Datagram Protocol [RFC 768]
• "no frills", "bare bones" Internet transport protocol
• "best effort" service; UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

UDP: more
• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery!

UDP segment format (32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)
Length is in bytes of the UDP segment, including header.

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO: error detected
  – YES: no error detected. But maybe errors nonetheless? More later…

Internet Checksum (time travel warning – we covered this earlier)
• Note
  – When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
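
The wraparound rule is easy to get wrong by hand, so here is a small sketch of it. The class and method names (`InternetChecksum`, `checksum`, `verify`) are invented for this example; the words are held in an `int[]` purely for convenience. The input in `main` is the pair of 16-bit words from the example, 0xE666 and 0xD555.

```java
/** One's complement checksum over 16-bit words (illustrative sketch). */
final class InternetChecksum {
    /** Returns the 16-bit complement of the one's complement sum of the words. */
    static int checksum(int[] words16) {
        int sum = 0;
        for (int w : words16) {
            sum += w & 0xFFFF;
            // Wraparound: add any carry out of bit 15 back into the low 16 bits.
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return ~sum & 0xFFFF;
    }

    /** Receiver check: summing the data together with the checksum field must give all ones. */
    static boolean verify(int[] words16, int checksumField) {
        int[] all = java.util.Arrays.copyOf(words16, words16.length + 1);
        all[words16.length] = checksumField;
        return checksum(all) == 0;
    }

    public static void main(String[] args) {
        int[] data = {0xE666, 0xD555};
        int c = checksum(data);
        System.out.printf("checksum = 0x%04X%n", c);                  // prints "checksum = 0x4443"
        System.out.println(verify(data, c));                          // prints "true"
        System.out.println(verify(new int[] {0xE667, 0xD555}, c));    // prints "false"
    }
}
```

0x4443 is 0100 0100 0100 0011 in binary. A single flipped bit changes the sum and is caught, but two errors that cancel in the sum would pass, which is the "maybe errors nonetheless" caveat.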

Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Reliable data transfer: getting started
Send side:
• rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
Receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

FSM notation: an arrow from state 1 to state 2 is labelled with the event causing the state transition and the actions taken on the state transition (event / actions).
state: when in this "state", next state is uniquely determined by next event

KR state machines – a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.

Notation used here: each arrow between named states is labelled with the relevant event causing the state transition and the relevant action taken on the state transition (event / actions).
state: when in this "state", next state is uniquely determined by next event

Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel

FSMs (event -> action):
  sender, IDLE:    rdt_send(data) -> udt_send(packet)
  receiver, IDLE:  udt_rcv(packet) -> rdt_rcv(data)

Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors?
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver->sender

rdt2.0: FSM specification (event -> action; Λ means no action)

sender:
  IDLE:               rdt_send(data) -> udt_send(packet); go to Waiting for reply
  Waiting for reply:  udt_rcv(reply) && isNAK(reply) -> udt_send(packet)
  Waiting for reply:  udt_rcv(reply) && isACK(reply) -> Λ; go to IDLE

receiver:
  IDLE:  udt_rcv(packet) && notcorrupt(packet) -> rdt_rcv(data), udt_send(ACK)
  IDLE:  udt_rcv(packet) && corrupt(packet) -> udt_send(NAK)

Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.

rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the specification slide to trace the case where the packet arrives uncorrupted and the receiver replies with ACK]

rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the specification slide to trace the case where the packet arrives corrupted, the receiver replies with NAK and the sender retransmits]

rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate

Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1: sender, handles garbled ACK/NAKs (event -> action; Λ means no action)

  IDLE (next seq 0):          rdt_send(data) -> sequence=0, udt_send(packet)
  Waiting for reply (seq 0):  udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) -> udt_send(packet)
  Waiting for reply (seq 0):  udt_rcv(reply) && notcorrupt(reply) && isACK(reply) -> Λ
  IDLE (next seq 1):          rdt_send(data) -> sequence=1, udt_send(packet)
  Waiting for reply (seq 1):  udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) -> udt_send(packet)
  Waiting for reply (seq 1):  udt_rcv(reply) && notcorrupt(reply) && isACK(reply) -> Λ

rdt2.1: receiver, handles garbled ACK/NAKs (event -> action)

  Wait for 0 from below:  udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) -> rdt_rcv(data), udt_send(ACK); go to Wait for 1 from below
  Wait for 0 from below:  udt_rcv(packet) && corrupt(packet) -> udt_send(NAK)
  Wait for 0 from below:  udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) -> udt_send(ACK)
  Wait for 1 from below:  udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) -> rdt_rcv(data), udt_send(ACK); go to Wait for 0 from below
  Wait for 1 from below:  udt_rcv(packet) && corrupt(packet) -> udt_send(NAK)
  Wait for 1 from below:  udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) -> udt_send(ACK)

rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number

Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK was received OK at sender

rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments (event -> action; Λ means no action)

sender FSM fragment:
  Wait for call 0 from above:  rdt_send(data) -> sequence=0, udt_send(packet); go to Wait for ACK 0
  Wait for ACK 0:  udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) -> udt_send(packet)
  Wait for ACK 0:  udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) -> Λ

receiver FSM fragment:
  Wait for 0 from below:  udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) -> udt_send(ACK1)
  transition into Wait for 0 from below:  udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) -> rdt_rcv(data), udt_send(ACK1)

rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
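
A stop-and-wait sender with a one-bit sequence number and a timeout can be sketched as a two-state object. Everything named below (`StopAndWaitSender`, `rdtSend`, `onReply`, `onTimeout`) is hypothetical scaffolding for illustration; the real timer and channel are replaced by a transmission counter so that the transitions can be checked deterministically.

```java
/** Alternating-bit (stop-and-wait) sender, in the spirit of rdt3.0; a sketch only. */
final class StopAndWaitSender {
    enum State { IDLE, WAIT_FOR_ACK }

    private State state = State.IDLE;
    private int seq = 0;            // sequence number of the next or outstanding packet: 0 or 1
    private String outstanding;     // copy of the packet kept until it is acknowledged
    private int transmissions = 0;

    /** rdt_send(data): accepted only when idle. Returns true if the packet was sent. */
    boolean rdtSend(String data) {
        if (state != State.IDLE) return false;
        outstanding = data;
        transmit();
        state = State.WAIT_FOR_ACK;
        return true;
    }

    /** udt_rcv(reply): a corrupt reply, or an ACK for the other sequence number, is ignored. */
    void onReply(int ackSeq, boolean corrupt) {
        if (state != State.WAIT_FOR_ACK || corrupt || ackSeq != seq) return;
        outstanding = null;         // delivery confirmed: drop the copy (and stop the timer)
        seq = 1 - seq;
        state = State.IDLE;
    }

    /** timeout: resend the outstanding packet (and restart the timer). */
    void onTimeout() {
        if (state == State.WAIT_FOR_ACK) transmit();
    }

    // Stand-in for udt_send(packet) plus starting the countdown timer.
    private void transmit() { transmissions++; }

    State state() { return state; }
    int seq() { return seq; }
    int transmissions() { return transmissions; }

    public static void main(String[] args) {
        StopAndWaitSender s = new StopAndWaitSender();
        s.rdtSend("hello");
        s.onTimeout();              // packet or ACK lost: retransmit
        s.onReply(0, false);        // ACK 0 finally arrives
        System.out.println(s.state() + " seq=" + s.seq() + " tx=" + s.transmissions());
    }
}
```

Ignoring bad replies and letting the timer drive every retransmission is the design choice that lets one mechanism cover both corruption and loss.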
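
rdt3.0 sender (event -> action; Λ means no action)

  IDLE state 0:   rdt_send(data) -> sequence=0, udt_send(packet); go to Wait for ACK0
  IDLE state 0:   udt_rcv(reply) -> Λ
  Wait for ACK0:  udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) -> Λ
  Wait for ACK0:  timeout -> udt_send(packet)
  Wait for ACK0:  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) -> Λ; go to IDLE state 1
  IDLE state 1:   rdt_send(data) -> sequence=1, udt_send(packet); go to Wait for ACK1
  IDLE state 1:   udt_rcv(reply) -> Λ
  Wait for ACK1:  udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) -> Λ
  Wait for ACK1:  timeout -> udt_send(packet)
  Wait for ACK1:  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) -> Λ; go to IDLE state 0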

rdt3.0 in action
[Figures: sender/receiver message timelines, not reproduced]

Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:

  $$ d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s} $$

  – U_sender: utilization, the fraction of time sender is busy sending
  – 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
  – network protocol limits use of physical resources!
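
To put a number on "performance stinks", the sender's utilization can be computed as the time spent transmitting divided by the time between successive sends, U = (L/R) / (RTT + L/R). The sketch below assumes RTT = 30 ms (twice the 15 ms propagation delay) and adds a hypothetical `window` parameter so the same formula also covers a sender that keeps several packets in flight per round trip.

```java
/** Sender utilization for stop-and-wait and for a small pipeline (illustrative sketch). */
final class Utilization {
    /** Fraction of time the sender is busy when `window` packets are sent per round trip. */
    static double senderUtilization(double packetBits, double rateBps, double rttSeconds, int window) {
        double dTrans = packetBits / rateBps;                 // L / R
        double u = window * dTrans / (rttSeconds + dTrans);
        return Math.min(u, 1.0);                              // cannot be busy more than all the time
    }

    public static void main(String[] args) {
        double L = 8000;      // packet size in bits
        double R = 1e9;       // link rate in bits per second
        double rtt = 0.030;   // assumed: 2 x 15 ms propagation delay
        System.out.println(senderUtilization(L, R, rtt, 1));
        System.out.println(senderUtilization(L, R, rtt, 3));
    }
}
```

With one packet per round trip the result is about 0.00027, so the sender is idle more than 99.9% of the time; three packets per round trip triples that figure.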

rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R

Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat

Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!

Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets

Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet

Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN: sender extended FSM (single state: Wait; Λ means no action)

  initially:
      base=1
      nextseqnum=1
  rdt_send(data):
      if (nextseqnum < base+N) {
          udt_send(packet[nextseqnum])
          nextseqnum++
      } else
          refuse_data(data)    Block?
  timeout:
      udt_send(packet[base])
      udt_send(packet[base+1])
      …
      udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply):
      base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply):
      Λ

GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #

FSM (single state: Wait):
  initially:
      expectedseqnum=1
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet,expectedseqnum):
      rdt_rcv(data)
      udt_send(ACK)
      expectedseqnum++
  any other event:
      udt_send(reply)
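
The two Go-Back-N machines fit in a few lines once the channel and timer are left out. This is a sketch under assumptions: the class names, the `-1` return value for a refused send, and the guard that stops a stale ACK from moving `base` backwards are choices made for this example rather than part of the FSMs.

```java
import java.util.ArrayList;
import java.util.List;

/** Go-Back-N sender and receiver bookkeeping (illustrative sketch, no real channel or timer). */
final class GoBackN {
    static final class Sender {
        private final int n;            // window size N
        private int base = 1;           // oldest unacknowledged sequence number
        private int nextSeqNum = 1;     // next sequence number to use

        Sender(int n) { this.n = n; }

        /** rdt_send: returns the sequence number used, or -1 if the window is full (refuse_data). */
        int send() {
            if (nextSeqNum < base + n) return nextSeqNum++;
            return -1;
        }

        /** Cumulative ACK: everything up to and including ackNum is acknowledged. */
        void onAck(int ackNum) {
            if (ackNum + 1 > base) base = ackNum + 1;   // assumption: ignore stale ACKs
        }

        /** timeout: every packet in [base, nextSeqNum-1] is sent again. */
        List<Integer> onTimeout() {
            List<Integer> resend = new ArrayList<>();
            for (int s = base; s < nextSeqNum; s++) resend.add(s);
            return resend;
        }

        int base() { return base; }
    }

    static final class Receiver {
        private int expectedSeqNum = 1;

        /** Returns the ACK number to send: the highest in-order sequence number so far (0 = none). */
        int onPacket(int seq) {
            if (seq == expectedSeqNum) expectedSeqNum++;   // in order: deliver and advance
            return expectedSeqNum - 1;                     // out of order: discard, re-ACK
        }
    }

    public static void main(String[] args) {
        Sender s = new Sender(3);
        Receiver r = new Receiver();
        s.send(); s.send(); s.send();            // packets 1, 2, 3 in flight
        s.onAck(r.onPacket(1));                  // packet 1 arrives
        System.out.println(r.onPacket(3));       // packet 2 lost, 3 discarded: prints 1
        System.out.println(s.onTimeout());       // prints [2, 3]
    }
}
```

The receiver keeps a single integer of state, which is the point of the design: all the buffering cost sits at the sender, at the price of resending packets that may have arrived intact.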

GBN in action
[Figure: sender/receiver message timeline, not reproduced]

Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver windows over the sequence number space, not reproduced]

Selective repeat
sender:
• data from above:
  – if next available seq # in window, send pkt
• timeout(n):
  – resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  – mark pkt n as received
  – if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  – send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  – ACK(n)
• otherwise:
  – ignore

Selective repeat in action
[Figure: sender/receiver message timeline, not reproduced]

Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
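
The bound can be checked mechanically. The sketch below (names invented for this example) models the worst case: the receiver has accepted a full window and moved on, every ACK was lost, and the sender is still retransmitting from its old window. If the two windows share any sequence number, an old packet is indistinguishable from a new one.

```java
import java.util.HashSet;
import java.util.Set;

/** Checks whether a selective-repeat window size is safe for a given sequence space (sketch). */
final class SelectiveRepeatWindow {
    /** Sequence numbers (mod seqSpace) covered by a window of `size` starting at `base`. */
    static Set<Integer> window(int base, int size, int seqSpace) {
        Set<Integer> w = new HashSet<>();
        for (int i = 0; i < size; i++) w.add((base + i) % seqSpace);
        return w;
    }

    /** True if a retransmission from the sender's old window can fall in the receiver's new window. */
    static boolean ambiguous(int windowSize, int seqSpace) {
        Set<Integer> senderOld = window(0, windowSize, seqSpace);
        Set<Integer> receiverNew = window(windowSize, windowSize, seqSpace);
        receiverNew.retainAll(senderOld);        // keep only the shared sequence numbers
        return !receiverNew.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(ambiguous(3, 4));     // seq #'s 0..3, window 3: prints "true"
        System.out.println(ambiguous(2, 4));     // window 2 = half the space: prints "false"
    }
}
```

With four sequence numbers and a window of 3, the old window {0, 1, 2} and the new window {3, 0, 1} overlap in 0 and 1; with a window of 2 they do not overlap.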

Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product

Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet… it's an ARQ protocol

TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]

TCP segment structure (32 bits wide)
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)

Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)

TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say; up to implementor
  – This has led to a world of hurt…

Simple telnet scenario (Host A and Host B, in time order):
1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80

TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system, then send a new segment replacing bytes 4-13 with "password-file"
  (a dumb example)

Neither of these attacks would work on a modern system.

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

  $$ \text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT} $$

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:

  $$ \text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}| $$

  (typically, $\beta = 0.25$)

Then set timeout interval:

  $$ \text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT} $$
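
The three update rules (smoothed estimate, deviation, timeout) can be tried out with a short sketch. Two details below are assumptions the slides do not settle: the estimator is seeded from the first sample with DevRTT set to half of it, and on each new sample the deviation is updated before the estimate, so it is measured against the old EstimatedRTT. The weights are the typical values 0.125 and 0.25.

```java
/** EWMA round-trip-time estimator and timeout calculation (illustrative sketch, times in ms). */
final class RttEstimator {
    static final double ALPHA = 0.125;   // weight of the new sample in EstimatedRTT
    static final double BETA = 0.25;     // weight of the new deviation in DevRTT

    private double estimatedRtt;
    private double devRtt;

    /** Assumption of this sketch: start from the first sample, with DevRTT = sample / 2. */
    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;
        devRtt = firstSampleMs / 2;
    }

    /** Feed one SampleRTT; samples from retransmitted segments should not be passed in. */
    void addSample(double sampleMs) {
        // Deviation is taken against the estimate held before this sample is folded in.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleMs;
    }

    double estimatedRtt() { return estimatedRtt; }
    double devRtt() { return devRtt; }
    double timeoutInterval() { return estimatedRtt + 4 * devRtt; }

    public static void main(String[] args) {
        RttEstimator e = new RttEstimator(100);
        e.addSample(200);
        System.out.println(e.estimatedRtt());     // prints 112.5
        System.out.println(e.devRtt());           // prints 62.5
        System.out.println(e.timeoutInterval());  // prints 362.5
    }
}
```

A single 200 ms outlier moves the estimate only from 100 to 112.5 ms but raises the timeout sharply through the deviation term, which is the intended "safety margin" behaviour.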

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments

TCP sender (simplified)

  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum

  loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
  } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
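
TCP: retransmission scenarios

Lost ACK scenario (Host A sends to Host B):
1. A sends Seq=92, 8 bytes data
2. B replies ACK=100; the ACK is lost (X)
3. Seq=92 timeout: A resends Seq=92, 8 bytes data
4. B replies ACK=100; SendBase = 100

Premature timeout scenario (Host A sends to Host B):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, then ACK=120
3. Seq=92 timeout fires before the ACKs arrive: A resends Seq=92, 8 bytes data
4. the ACKs arrive: SendBase = 100, then SendBase = 120
5. B replies ACK=120 to the retransmission; SendBase = 120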
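
TCP retransmission scenarios (more)

Cumulative ACK scenario (Host A sends to Host B):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, which is lost (X), then ACK=120
3. ACK=120 arrives before the timeout; SendBase = 120

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too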

TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK. Host A/Host B timeline: a segment is lost (X) and Host A resends the 2nd segment before the timeout expires]

Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
      if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
              start timer
      }
      else {    /* a duplicate ACK for already ACKed segment */
          increment count of dup ACKs received for y
          if (count of dup ACKs received for y = 3) {
              resend segment with sequence number y    /* fast retransmit */
          }
      }
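
The ACK-handling event above translates almost line for line into code. In this sketch the retransmission timer is left out entirely, resent sequence numbers are simply recorded in a list, and the names (`FastRetransmitSender`, `send`, `onAck`, `resent`) are hypothetical; resetting the duplicate count when new data is acknowledged is also an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified TCP sender bookkeeping with fast retransmit (illustrative sketch, no timer). */
final class FastRetransmitSender {
    private long sendBase;               // oldest unacknowledged byte
    private long nextSeqNum;             // sequence number for the next new segment
    private long lastDupAck = -1;        // ACK value currently being counted
    private int dupAckCount = 0;
    private final List<Long> resent = new ArrayList<>();

    FastRetransmitSender(long initialSeqNum) {
        sendBase = initialSeqNum;
        nextSeqNum = initialSeqNum;
    }

    /** Data from the application: returns the segment's sequence number (its first byte). */
    long send(int lengthBytes) {
        long seq = nextSeqNum;
        nextSeqNum += lengthBytes;
        return seq;
    }

    /** ACK received with ACK field value y. */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;                // new data acknowledged
            dupAckCount = 0;
        } else {
            if (y != lastDupAck) {       // start counting duplicates for this value of y
                lastDupAck = y;
                dupAckCount = 0;
            }
            dupAckCount++;
            if (dupAckCount == 3) resent.add(y);   // fast retransmit of the segment starting at y
        }
    }

    long sendBase() { return sendBase; }
    List<Long> resent() { return resent; }

    public static void main(String[] args) {
        FastRetransmitSender s = new FastRetransmitSender(92);
        s.send(8);                       // Seq=92
        s.send(20);                      // Seq=100, lost in this scenario
        s.send(15);                      // Seq=120
        s.onAck(100);                    // first segment acknowledged
        s.onAck(100); s.onAck(100); s.onAck(100);   // three duplicate ACKs
        System.out.println(s.resent());  // prints [100]
    }
}
```

The first ACK=100 is new data; only the three that follow count as duplicates, so the retransmission fires on the fourth ACK carrying the same value.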

Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will.
This means small MSS values may persist indefinitely.

Solution: Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: Use a timer; when the timer expires – send the (unfilled) segment.

Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded

Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols aka Ethernet (more in the Advanced Topic on Datacenters)

Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
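
The window arithmetic is worth a quick worked check. The sketch below computes the advertised window from the receiver's counters and then the amount the sender may still put in flight. `LastByteSent` and `LastByteAcked` are names assumed here for the sender's counters; the slide only says that unACKed data is limited to RcvWindow.

```java
/** Receive-window arithmetic for TCP flow control (illustrative sketch, values in bytes). */
final class ReceiveWindow {
    /** Spare room in the receive buffer: RcvBuffer - (LastByteRcvd - LastByteRead). */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /** Bytes the sender may still send: advertised window minus data already in flight. */
    static long usableWindow(long rcvWindow, long lastByteSent, long lastByteAcked) {
        return Math.max(0, rcvWindow - (lastByteSent - lastByteAcked));
    }

    public static void main(String[] args) {
        long window = rcvWindow(4096, 3000, 1000);           // 2000 bytes still unread
        System.out.println(window);                          // prints 2096
        System.out.println(usableWindow(window, 5000, 4000)); // prints 1096
    }
}
```

When the application stops reading, LastByteRead stalls, the advertised window shrinks towards zero and the sender is forced to pause.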

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close sends FIN, server replies ACK; server close sends FIN, client replies ACK; client goes through timed wait to closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline: client closing sends FIN, server replies ACK; server closing sends FIN, client replies ACK; client goes through timed wait to closed, server closed]

TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, not reproduced]

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; λ_out is the rate delivered]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send into a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: rate delivered]

Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against offered load, with axes running to R/2; further marks at R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: rate delivered]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Congestion Collapse example: Cocktail party effect
[Figure: Host A, Host B and λ_out, not reproduced]

Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with marks at 8, 16 and 24 Kbytes; saw tooth behavior: probing for bandwidth]
SLOW START IS NOT SHOWN
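
The saw tooth can be reproduced with a very small simulation. The sketch below is a toy model under stated assumptions: the window is measured in MSS, time advances in whole RTTs, the loss pattern is supplied by the caller, and a floor of 1 MSS is imposed. `perAckIncrease` shows how adding 1/W on each of roughly W ACKs accumulates to about 1 MSS per RTT. Slow start is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Additive-increase, multiplicative-decrease window trace (toy model, window in MSS). */
final class Aimd {
    /**
     * Returns the window at the start of each RTT.
     * lossRtts[i] == true means a loss is detected during RTT i.
     */
    static List<Double> simulate(double initialWindow, boolean[] lossRtts) {
        List<Double> trace = new ArrayList<>();
        double w = initialWindow;
        for (boolean loss : lossRtts) {
            trace.add(w);
            if (loss) {
                w = Math.max(1.0, w / 2);   // multiplicative decrease (floor of 1 MSS assumed)
            } else {
                w = w + 1;                  // additive increase: 1 MSS per RTT
            }
        }
        return trace;
    }

    /** Applies W <- W + 1/W once per ACK, for the floor(W) ACKs expected in one RTT. */
    static double perAckIncrease(double w) {
        int acks = (int) Math.floor(w);
        for (int i = 0; i < acks; i++) w += 1.0 / w;
        return w;
    }

    public static void main(String[] args) {
        boolean[] losses = {false, false, false, true, false, false};
        System.out.println(simulate(8, losses));   // prints [8.0, 9.0, 10.0, 11.0, 5.5, 6.5]
        System.out.println(perAckIncrease(10));    // slightly under 11
    }
}
```

The per-ACK form lands slightly below W + 1 because each step divides by a window that has already grown a little.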
3
Transport services and protocolsbull provide logical communication
between app processes running on different hosts
bull transport protocols run in end systems ndash send side breaks app
messages into segments passes to network layer
ndash rcv side reassembles segments into messages passes to app layer
bull more than one transport protocol available to appsndash Internet TCP and UDP
applicationtransportnetworkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
4
Transport vs network layerbull network layer logical communication between hostsbull transport layer logical communication between processes
ndash relies on enhances network layer services
Household analogy12 kids sending letters to 12
kidsbull processes = kidsbull app messages = letters in
envelopesbull hosts = housesbull transport protocol = Ann
and Billbull network-layer protocol =
postal service
5
Internet transport-layer protocols
• reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - connection setup
• unreliable, unordered delivery: UDP
  - no-frills extension of "best-effort" IP
• services not available:
  - delay guarantees
  - bandwidth guarantees
[Figure: logical end-end transport between two end systems (application / transport / network / data link / physical) across a path of routers (network / data link / physical)]
6
Multiplexing/demultiplexing (Transport-layer style)
• Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
• Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: host 1, host 2, host 3, each with an application / transport / network / link / physical stack; processes P1-P4 attached to sockets]
7
How transport-layer demultiplexing works
• host receives IP datagrams
  - each datagram has source IP address, destination IP address
  - each datagram carries 1 transport-layer segment
  - each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message)]
8
Connectionless demultiplexing
• Create sockets with port numbers:
    DatagramSocket mySocket1 = new DatagramSocket(12534);
    DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket
9
Connectionless demux (cont)
    DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: client IP A, client IP B and server IP C. Segments with SP 9157 / DP 6428 and SP 5775 / DP 6428 arrive at the server; the replies carry SP 6428 with DP 9157 and DP 5775]
SP provides "return address"
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  - non-persistent HTTP will have different socket for each request
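The difference between the UDP two-tuple and the TCP 4-tuple can be sketched as two lookup tables. This is a toy model under stated assumptions: sockets are represented by plain names, the class and method names are invented for illustration, and TCP listening sockets (which accept new connections) are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demultiplexer: UDP keys on the destination only, TCP on the full 4-tuple. */
final class DemuxSketch {
    private final Map<String, String> udpSockets = new HashMap<>();
    private final Map<String, String> tcpSockets = new HashMap<>();

    void bindUdp(String dstIp, int dstPort, String socketName) {
        udpSockets.put(dstIp + ":" + dstPort, socketName);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort, socketName);
    }

    /** Source address and port are carried in the segment but not used for the lookup. */
    String deliverUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpSockets.get(dstIp + ":" + dstPort);
    }

    /** All four values select the socket; returns null if no such connection exists. */
    String deliverTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort);
    }

    public static void main(String[] args) {
        DemuxSketch demux = new DemuxSketch();
        demux.bindUdp("C", 6428, "udp-socket");
        demux.addTcpConnection("A", 9157, "C", 80, "tcp-socket-A");
        demux.addTcpConnection("B", 9157, "C", 80, "tcp-socket-B");
        System.out.println(demux.deliverUdp("A", 9157, "C", 6428)); // same UDP socket for any source
        System.out.println(demux.deliverTcp("B", 9157, "C", 80));   // per-connection TCP socket
    }
}
```

Two clients using the same source port still reach different TCP sockets because their source IP addresses differ, whereas every UDP datagram to port 6428 lands in the one socket bound there.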
11
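Connection-oriented demux (cont)
[Figure: client IP A, client IP B and server IP C, with processes P1-P6 spread across the three hosts and one server socket per connection. Segments shown: SP 9157 / DP 80 with S-IP A, D-IP C; SP 9157 / DP 80 with S-IP B, D-IP C; SP 5775 / DP 80 with S-IP B, D-IP C]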
12
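Connection-oriented demux: Threaded Web Server
[Figure: client IP A, client IP B and server IP C, where a single threaded server process holds one socket per connection. Segments shown: SP 9157 / DP 80 with S-IP A, D-IP C; SP 9157 / DP 80 with S-IP B, D-IP C; SP 5775 / DP 80 with S-IP B, D-IP C]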
13
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery
[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (message). Length is in bytes of UDP segment, including header]
15
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO - error detected
  - YES - no error detected. But maybe errors nonetheless? More later ...
16
Internet Checksum (time travel warning - we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
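The worked example above can be checked mechanically. This is a small sketch of the 16-bit one's complement sum with wraparound; the class and method names are illustrative, and real UDP also covers a pseudo-header, which is omitted here.

```java
import java.util.Arrays;

/** 16-bit one's complement checksum over an array of 16-bit words (illustrative sketch). */
final class InternetChecksum {
    /** Adds the words, folding any carry out of bit 15 back into the low bit. */
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0) {
                sum = (sum & 0xFFFF) + 1;   // wraparound
            }
        }
        return sum;
    }

    /** The checksum is the bitwise complement of the sum. */
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver side: data plus checksum must sum to all ones. */
    static boolean verify(int[] words, int checksum) {
        int[] all = Arrays.copyOf(words, words.length + 1);
        all[words.length] = checksum;
        return onesComplementSum(all) == 0xFFFF;
    }

    public static void main(String[] args) {
        int[] words = {0xE666, 0xD555};   // the two 16-bit integers from the slide
        System.out.println(Integer.toBinaryString(checksum(words)));
    }
}
```

A `verify` result of true only means no error was detected; some combinations of bit flips leave the sum unchanged.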
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow on both directions
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, transition labelled event / actions: event causing state transition, actions taken on state transition]
state: when in this "state", next state uniquely determined by next event
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: two states ("State name") joined by a transition labelled event / actions: relevant event causing state transition, relevant action taken on state transition]
state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
FSM transitions, written as Event / Action:
• sender, IDLE: rdt_send(data) / udt_send(packet)
• receiver, IDLE: udt_rcv(packet) / rdt_rcv(data)
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
Transitions, written as event / action (Λ: no action):
sender:
• IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver:
• IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> IDLE
• IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK) -> IDLE
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the same rdt2.0 sender and receiver FSMs as on slide 25]
27
rdt2.0: error scenario
[Figure: the same rdt2.0 sender and receiver FSMs as on slide 25]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
rdt2.1: sender, handles garbled ACK/NAKs
Transitions, written as event / action (Λ: no action):
• IDLE (next seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)
• Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next seq 1)
• IDLE (next seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)
• Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next seq 0)
30
rdt2.1: receiver, handles garbled ACK/NAKs
Transitions, written as event / action:
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
Transitions, written as event / action (Λ: no action):
sender FSM fragment:
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet)
• Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment:
• Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1)
• into Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK1), rdt_rcv(data)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
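35
rdt3.0 sender
Transitions, written as event / action (Λ: no action):
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0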
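To see the rdt3.0 idea end to end, here is a minimal stop-and-wait (alternating-bit) simulation. It is a sketch under assumptions rather than the protocol from the slides: loss is scripted through two hypothetical boolean arrays instead of a real channel, a lost packet or lost ACK is treated as an immediate timeout, and corruption is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Alternating-bit sender and receiver over a scripted lossy channel (illustrative). */
final class Rdt3Sketch {
    final List<String> delivered = new ArrayList<>();  // what the receiver passed up
    int transmissions = 0;                              // data packets put on the wire

    /**
     * dropData[i] == true loses the i-th data transmission, dropAck[i] == true loses
     * the i-th ACK; transmissions beyond the arrays are never lost.
     */
    void run(List<String> messages, boolean[] dropData, boolean[] dropAck) {
        int expected = 0;   // receiver: sequence bit it is waiting for
        int acksSent = 0;
        for (int i = 0; i < messages.size(); i++) {
            int seq = i % 2;             // sender: alternating sequence bit
            boolean acked = false;
            while (!acked) {
                boolean dataLost = transmissions < dropData.length && dropData[transmissions];
                transmissions++;         // udt_send(packet)
                if (dataLost) {
                    continue;            // timeout: retransmit
                }
                if (seq == expected) {   // new packet: deliver and flip the expected bit
                    delivered.add(messages.get(i));
                    expected = 1 - expected;
                }                        // duplicate: not delivered again, but still ACKed
                boolean ackLost = acksSent < dropAck.length && dropAck[acksSent];
                acksSent++;
                acked = !ackLost;        // a lost ACK also ends in a timeout
            }
        }
    }

    public static void main(String[] args) {
        Rdt3Sketch sim = new Rdt3Sketch();
        sim.run(Arrays.asList("a", "b", "c"), new boolean[] {false, true}, new boolean[] {true});
        System.out.println(sim.delivered + " in " + sim.transmissions + " transmissions");
    }
}
```

In the scripted run, the first ACK and the following retransmission are lost, yet each message is delivered exactly once because the receiver recognises the repeated sequence bit.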
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
  - U sender: utilization - fraction of time sender busy sending
  - 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
  - network protocol limits use of physical resources
39
rdt3.0: stop-and-wait operation
[Figure: sender / receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender / receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
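The utilization figures behind these two timelines follow from U = (w · L/R) / (RTT + L/R), where w is the number of packets sent per round trip. The sketch below evaluates that expression for the numbers on the slides; taking RTT as 30 ms (twice the 15 ms propagation delay) and the method name are assumptions made for illustration.

```java
/** Sender utilization for stop-and-wait (window = 1) and simple pipelining (illustrative). */
final class UtilizationSketch {
    /**
     * @param packetBits packet length L in bits
     * @param rateBps    link rate R in bits per second
     * @param rttSeconds round trip time in seconds
     * @param window     packets sent back-to-back per round trip
     */
    static double utilization(double packetBits, double rateBps, double rttSeconds, int window) {
        double dTrans = packetBits / rateBps;                     // L / R
        double u = window * dTrans / (rttSeconds + dTrans);
        return Math.min(1.0, u);                                  // a link cannot be more than 100% busy
    }

    public static void main(String[] args) {
        System.out.println(utilization(8000, 1e9, 0.030, 1));    // stop-and-wait
        System.out.println(utilization(8000, 1e9, 0.030, 3));    // three packets in flight
    }
}
```

With one packet per round trip the sender is busy for roughly 0.027% of the time; three packets in flight triple that, which is still tiny, so much larger windows are needed to fill such a link.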
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state: Wait. Initially: base=1, nextseqnum=1
Transitions, written as event / action (Λ: no action):
• rdt_send(data) /
    if (nextseqnum < base+N) {
        udt_send(packet[nextseqnum])
        nextseqnum++
    }
    else
        refuse_data(data)   (Block?)
• timeout /
    udt_send(packet[base])
    udt_send(packet[base+1])
    ...
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply) / Λ
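The sender FSM above maps almost directly onto a small class. This is a sketch, not a full implementation: packets are represented only by their sequence numbers, "sending" means appending to a list named `wire`, timers are driven by calling `timeout()` by hand, and the guard against stale ACKs moving `base` backwards is an addition that the slide does not show.

```java
import java.util.ArrayList;
import java.util.List;

/** Go-Back-N sender window logic (illustrative; no real timers or packets). */
final class GbnSenderSketch {
    private final int n;            // window size N
    private int base = 1;           // oldest unacknowledged sequence number
    private int nextSeqNum = 1;     // next sequence number to use
    final List<Integer> wire = new ArrayList<>();  // sequence numbers passed to udt_send, in order

    GbnSenderSketch(int windowSize) {
        this.n = windowSize;
    }

    /** rdt_send(data): returns false (refuse_data) when the window is full. */
    boolean rdtSend() {
        if (nextSeqNum < base + n) {
            wire.add(nextSeqNum);
            nextSeqNum++;
            return true;
        }
        return false;
    }

    /** timeout: resend every packet from base to nextSeqNum-1. */
    void timeout() {
        for (int seq = base; seq < nextSeqNum; seq++) {
            wire.add(seq);
        }
    }

    /** Uncorrupted cumulative ACK: slide the window (never backwards; that guard is an assumption). */
    void ack(int ackNum) {
        base = Math.max(base, ackNum + 1);
    }

    public static void main(String[] args) {
        GbnSenderSketch sender = new GbnSenderSketch(3);
        for (int i = 0; i < 4; i++) {
            System.out.println("accepted: " + sender.rdtSend());
        }
        sender.ack(1);
        sender.rdtSend();
        sender.timeout();
        System.out.println(sender.wire);
    }
}
```

The trace in `main` shows the cost of going back N: after one timeout, packets 2, 3 and 4 are all sent again even if only packet 2 was lost.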
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
Single state: Wait. Initially: expectedseqnum=1
Transitions, written as event / action:
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• any other arrival / udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
• data from above: if next available seq # in window, send pkt
• timeout(n): resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
• otherwise: ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
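The bound can be checked by brute force for small cases. The sketch below considers the worst case from the example: the receiver has accepted a full window but every ACK was lost, so the sender may still retransmit the old window while the receiver already expects the next one. The class and method names are illustrative.

```java
/** Checks whether old and new selective-repeat windows can overlap (illustrative). */
final class SrWindowSketch {
    /**
     * Returns true if a retransmission of an old packet could be accepted as new data.
     * Old window: seq 0..window-1. Receiver's next window: window..2*window-1, modulo seqSpace.
     */
    static boolean ambiguous(int seqSpace, int window) {
        for (int k = window; k < 2 * window; k++) {
            if (k % seqSpace < window) {
                return true;   // a "new" sequence number collides with an old one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(ambiguous(4, 3));  // the slide's example: seq #'s 0..3, window 3
        System.out.println(ambiguous(4, 2));  // window of half the sequence space
    }
}
```

The collision disappears exactly when twice the window size fits within the sequence number space, which is the rule stated on the slide.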
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP segment, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F, Receive window (# bytes rcvr willing to accept)
• checksum (Internet checksum, as in UDP), Urg data pointer
• Options (variable length)
• application data (variable length)
Flag bits:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
  - This has led to a world of hurt...
simple telnet scenario (Host A, Host B):
• User types 'C': Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C': Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
  - Filling buffers
• Critical buffers...
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system, then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1 - \beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
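The three formulas fit in a few lines of code. The sketch below is one way to apply them, with assumptions worth flagging: the initial EstimatedRTT is supplied by the caller, DevRTT starts at zero, and EstimatedRTT is updated before DevRTT (the slides do not fix an order, and real TCP stacks differ in these details).

```java
/** EWMA-based RTT estimation and timeout calculation (illustrative sketch). */
final class RttEstimator {
    static final double ALPHA = 0.125;  // weight of the newest sample in EstimatedRTT
    static final double BETA = 0.25;    // weight of the newest deviation in DevRTT

    private double estimatedRtt;
    private double devRtt = 0.0;        // starting at zero is an assumption

    RttEstimator(double initialRtt) {
        this.estimatedRtt = initialRtt;
    }

    /** Feed one SampleRTT (per Karn/Partridge, callers should skip retransmitted segments). */
    void sample(double sampleRtt) {
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
    }

    double estimatedRtt() {
        return estimatedRtt;
    }

    double devRtt() {
        return devRtt;
    }

    /** TimeoutInterval = EstimatedRTT + 4 * DevRTT. */
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator rtt = new RttEstimator(100.0);   // milliseconds
        rtt.sample(200.0);
        System.out.println(rtt.estimatedRtt() + " " + rtt.devRtt() + " " + rtt.timeoutInterval());
    }
}
```

A single outlier of 200 ms moves the estimate only an eighth of the way towards it, but widens the safety margin substantially.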
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running)
              start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
          retransmit not-yet-acknowledged segment with smallest sequence number
          start timer

      event: ACK received, with ACK field value of y
          if (y > SendBase) {
              SendBase = y
              if (there are currently not-yet-acknowledged segments)
                  start timer
          }
    } /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
TCP: retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); the Seq=92 timer times out and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timer expires before the ACK arrives, so A resends Seq=92, 8 bytes data; B replies ACK=120; Sendbase = 100, then SendBase = 120]
67
TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediate send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: the 2nd segment is lost (X); A resends the 2nd segment before the timeout expires]
71
Fast retransmit algorithm:
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y   /* fast retransmit */
        }
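The ACK-handling branch above can be exercised with a small class. This is a sketch under assumptions: timers are left out entirely, "resending" is recorded by appending the sequence number to a list, and the duplicate counter is reset whenever new data is acknowledged (implied by counting duplicates "for y", but not spelled out on the slide).

```java
import java.util.ArrayList;
import java.util.List;

/** ACK processing with fast retransmit on the third duplicate ACK (illustrative, no timers). */
final class FastRetransmitSketch {
    private long sendBase;
    private int dupAcks = 0;
    final List<Long> retransmitted = new ArrayList<>();  // sequence numbers resent early

    FastRetransmitSketch(long initialSeqNum) {
        this.sendBase = initialSeqNum;
    }

    /** Handles an ACK whose acknowledgement field is y. */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;        // new data acknowledged
            dupAcks = 0;
        } else {
            dupAcks++;           // duplicate ACK for already ACKed data
            if (dupAcks == 3) {
                retransmitted.add(y);   // fast retransmit of the segment starting at y
            }
        }
    }

    long sendBase() {
        return sendBase;
    }

    public static void main(String[] args) {
        FastRetransmitSketch sender = new FastRetransmitSketch(92);
        sender.onAck(100);                 // acknowledges bytes up to 99
        for (int i = 0; i < 3; i++) {
            sender.onAck(100);             // three duplicates
        }
        System.out.println(sender.retransmitted);
    }
}
```

Because the check is for exactly three duplicates, a fourth or fifth duplicate ACK for the same value does not trigger another early resend.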
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will
This means small MSS values may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
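The bookkeeping on both sides is simple arithmetic. In the sketch below, `rcvWindow` follows the formula on the slide; `sendable` expresses the sender's limit using `lastByteSent` and `lastByteAcked`, which are names assumed here for the sender-side counters and do not appear on the slide.

```java
/** Receive-window arithmetic for TCP flow control (illustrative sketch). */
final class FlowControlSketch {
    /** Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /** Bytes the sender may still transmit while keeping unACKed data within RcvWindow. */
    static long sendable(long rcvWindow, long lastByteSent, long lastByteAcked) {
        long inFlight = lastByteSent - lastByteAcked;
        return Math.max(0, rcvWindow - inFlight);
    }

    public static void main(String[] args) {
        long window = rcvWindow(4096, 3000, 1000);   // 2000 bytes are waiting to be read
        System.out.println(window);
        System.out.println(sendable(window, 5000, 4000));
    }
}
```

When the application stops reading, `rcvWindow` shrinks towards zero and `sendable` follows it, which is how a slow reader throttles the sender.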
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client / server timeline: client close, FIN; server ACK; server close, FIN; client ACK; client in timed wait, then closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client / server timeline: client closing, FIN; server ACK; server closing, FIN; client ACK; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B share a router with unlimited shared output link buffers; $\lambda_{in}$: original data, $\lambda_{out}$: throughput]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share a router with finite shared output link buffers; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data, $\lambda_{out}$: throughput]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
[Figures a, b, c: $\lambda_{out}$ plotted against offered load up to R/2; axis marks at R/2, R/3 and R/4]
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
85
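Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data, $\lambda_{out}$: throughput]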
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: $\lambda_{out}$ for Host A and Host B as offered load grows]
Congestion Collapse example: Cocktail party effect
Approaches towards congestion control
End-end congestion controlbull no explicit feedback from
networkbull congestion inferred from end-
system observed loss delaybull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad approaches towards congestion control
88
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for
each received ACK until loss detected (W W + 1W)
multiplicative decrease cut CongWin in half after loss
(W W2)
time
cong
estio
n w
indo
w si
zeSaw toothbehavior probing
for bandwidth
89SLOW START IS NOT SHOWN
4
Transport vs network layerbull network layer logical communication between hostsbull transport layer logical communication between processes
ndash relies on enhances network layer services
Household analogy12 kids sending letters to 12
kidsbull processes = kidsbull app messages = letters in
envelopesbull hosts = housesbull transport protocol = Ann
and Billbull network-layer protocol =
postal service
5
Internet transport-layer protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-
effortrdquo IPbull services not available
ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
6
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= socket
delivering received segmentsto correct socket
Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)
Multiplexing at send host
Multiplexingdemultiplexing(Transport-layer style)
7
How transport-layer demultiplexing worksbull host receives IP datagrams
ndash each datagram has source IP address destination IP address
ndash each datagram carries 1 transport-layer segment
ndash each segment has source destination port number
bull host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
8
Connectionless demultiplexing
• Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(12534);
  DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket
9
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: clients at IP A and IP B exchange UDP segments with the server at IP C on port 6428. Requests carry SP 9157, DP 6428 and SP 5775, DP 6428; the replies carry SP 6428, DP 9157 and SP 6428, DP 5775]
SP provides “return address”
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  - non-persistent HTTP will have different socket for each request
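The two demultiplexing rules can be contrasted in a few lines of Java. This is only an illustrative sketch, not how an operating system implements demultiplexing: the class `DemuxTable` and its methods are hypothetical, addresses are plain strings, and a "socket" is just a label.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: lookup keys for UDP (2-tuple) and TCP (4-tuple) demultiplexing. */
final class DemuxTable {
    private final Map<String, String> udpSockets = new HashMap<>();
    private final Map<String, String> tcpSockets = new HashMap<>();

    // UDP: only the destination IP address and port identify the socket.
    static String udpKey(String dstIp, int dstPort) {
        return dstIp + ":" + dstPort;
    }

    // TCP: all four values identify the socket.
    static String tcpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    void bindUdp(String dstIp, int dstPort, String socket) {
        udpSockets.put(udpKey(dstIp, dstPort), socket);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socket) {
        tcpSockets.put(tcpKey(srcIp, srcPort, dstIp, dstPort), socket);
    }

    /** Returns the socket for a UDP segment, or null if none is bound. Source fields are ignored. */
    String demuxUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpSockets.get(udpKey(dstIp, dstPort));
    }

    /** Returns the socket for a TCP segment, or null if no connection matches all four values. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(tcpKey(srcIp, srcPort, dstIp, dstPort));
    }
}
```

With this sketch, UDP segments from different sources to port 6428 reach the same socket, while two TCP clients using the same source port 9157 still map to different sockets because their source IP addresses differ.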
11
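Connection-oriented demux (cont)
[Figure: clients at IP A and IP B connect to port 80 on the server at IP C. Segments shown: SP 9157, DP 80 from S-IP A; SP 9157, DP 80 and SP 5775, DP 80 from S-IP B; all with D-IP C. Each connection is delivered to its own server process]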
12
Connection-oriented demux: Threaded Web Server
[Figure: the same three connections to port 80 on the server at IP C (SP 9157 from S-IP A; SP 9157 and SP 5775 from S-IP B), now handled by threads within a single server process, each with its own socket]
13
UDP: User Datagram Protocol [RFC 768]
• “no frills”, “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery
[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (message). Length is in bytes of UDP segment, including header]
15
UDP checksum
Goal: detect “errors” (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later…
16
Internet Checksum (time travel warning: we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
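The wraparound rule is easy to get wrong, so here is a minimal Java sketch of the 16-bit one's complement checksum. The class and method names are hypothetical, and 16-bit words are held in `int` values. The receiver check used here (data plus checksum must sum to all ones) is an equivalent formulation of "recompute and compare", chosen for brevity.

```java
import java.util.Arrays;

/** Sketch of the 16-bit one's complement (Internet) checksum. */
final class InternetChecksum {
    /** One's complement sum of 16-bit words, with end-around carry. */
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            // Wraparound: add any carry out of bit 15 back into the low 16 bits.
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return sum & 0xFFFF;
    }

    /** The checksum is the bitwise complement of the one's complement sum. */
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    /** Receiver check: the data words plus the checksum must sum to 0xFFFF. */
    static boolean verify(int[] words, int checksum) {
        int[] all = Arrays.copyOf(words, words.length + 1);
        all[words.length] = checksum;
        return onesComplementSum(all) == 0xFFFF;
    }
}
```

For the two integers in the example, 0xE666 and 0xD555, the sum after wraparound is 0xBBBC and the checksum is 0x4443, matching the bit patterns on the slide.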
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 and state 2 joined by a transition labelled: event causing state transition / actions taken on state transition]
state: when in this “state”, next state uniquely determined by next event
22
KR state machines: a note
Beware:
• Kurose and Ross has a confusing/confused attitude to state-machines.
• I’ve attempted to normalise the representation.
• UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
• In KR, “actions taken” appear wide-ranging; my interpretation is more specific/relevant.
[Figure: two states, each labelled “State name”, joined by a transition labelled: Relevant event causing state transition / Relevant action taken on state transition]
state: when in this “state”, next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
FSM transitions (Event / Action):
• sender: IDLE --[rdt_send(data) / udt_send(packet)]--> IDLE
• receiver: IDLE --[udt_rcv(packet) / rdt_rcv(data)]--> IDLE
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
sender (Event / Action):
• IDLE --[rdt_send(data) / udt_send(packet)]--> Waiting for reply
• Waiting for reply --[udt_rcv(reply) && isNAK(reply) / udt_send(packet)]--> Waiting for reply
• Waiting for reply --[udt_rcv(reply) && isACK(reply) / Λ (no action)]--> IDLE
receiver (Event / Action):
• IDLE --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> IDLE
• IDLE --[udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data); udt_send(ACK)]--> IDLE
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, with the error-free path highlighted: rdt_send(data) / udt_send(packet); udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK); udt_rcv(reply) && isACK(reply) / Λ]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, with the error path highlighted: udt_rcv(packet) && corrupt(packet) / udt_send(NAK); udt_rcv(reply) && isNAK(reply) / udt_send(packet)]
28
rdt2.0 has a fatal flaw
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver
• can’t just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn’t deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
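rdt2.1: sender, handles garbled ACK/NAKs
sender (Event / Action):
• IDLE (next seq 0) --[rdt_send(data) / sequence=0; udt_send(packet)]--> Waiting for reply (0)
• Waiting for reply (0) --[udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)]--> Waiting for reply (0)
• Waiting for reply (0) --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ]--> IDLE (next seq 1)
• IDLE (next seq 1) --[rdt_send(data) / sequence=1; udt_send(packet)]--> Waiting for reply (1)
• Waiting for reply (1) --[udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)]--> Waiting for reply (1)
• Waiting for reply (1) --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ]--> IDLE (next seq 0)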
30
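rdt2.1: receiver, handles garbled ACK/NAKs
receiver (Event / Action):
• Wait for 0 from below --[udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / rdt_rcv(data); udt_send(ACK)]--> Wait for 1 from below
• Wait for 0 from below --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> Wait for 0 from below
• Wait for 0 from below --[udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK)]--> Wait for 0 from below
• Wait for 1 from below --[udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data); udt_send(ACK)]--> Wait for 0 from below
• Wait for 1 from below --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> Wait for 1 from below
• Wait for 1 from below --[udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK)]--> Wait for 1 from below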
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must “remember” whether “current” pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK received OK at sender
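The receiver side of rdt2.1 can be sketched as a two-state machine in Java. This is an illustration under stated assumptions rather than the slides' own code: the class `Rdt21Receiver` is hypothetical, corruption is passed in as a flag instead of being computed from a checksum, and replies are returned as the strings "ACK" and "NAK".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the rdt2.1 receiver: one state per expected sequence bit. */
final class Rdt21Receiver {
    private int expectedSeq = 0;                          // the state: waiting for 0 or for 1
    final List<String> delivered = new ArrayList<>();     // stands in for rdt_rcv(data)

    /** Handles one arriving packet and returns the reply to send back. */
    String onPacket(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK";                                 // ask the sender to retransmit
        }
        if (seq == expectedSeq) {
            delivered.add(data);                          // new data: deliver to the upper layer
            expectedSeq = 1 - expectedSeq;                // flip the expected bit
        }
        // An uncorrupted packet with the other bit is a duplicate: it is not
        // delivered again, but it is still ACKed so the sender can move on.
        return "ACK";
    }
}
```

A duplicate arises when the receiver's ACK was garbled and the sender retransmitted; the single sequence bit is enough to tell that retransmission apart from the next new packet.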
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (Event / Action):
• Wait for call 0 from above --[rdt_send(data) / sequence=0; udt_send(packet)]--> Wait for ACK 0
• Wait for ACK 0 --[udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)]--> Wait for ACK 0
• Wait for ACK 0 --[udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ]--> (next state)
receiver FSM fragment (Event / Action):
• (previous state) --[udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data); udt_send(ACK1)]--> Wait for 0 from below
• Wait for 0 from below --[udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)]--> Wait for 0 from below
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #’s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
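35
rdt3.0 sender
sender (Event / Action):
• IDLE state 0 --[rdt_send(data) / sequence=0; udt_send(packet)]--> Wait for ACK0
• IDLE state 0 --[udt_rcv(reply) / Λ]--> IDLE state 0
• Wait for ACK0 --[udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ]--> Wait for ACK0
• Wait for ACK0 --[timeout / udt_send(packet)]--> Wait for ACK0
• Wait for ACK0 --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ]--> IDLE state 1
• IDLE state 1 --[rdt_send(data) / sequence=1; udt_send(packet)]--> Wait for ACK1
• IDLE state 1 --[udt_rcv(reply) / Λ]--> IDLE state 1
• Wait for ACK1 --[udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ]--> Wait for ACK1
• Wait for ACK1 --[timeout / udt_send(packet)]--> Wait for ACK1
• Wait for ACK1 --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ]--> IDLE state 0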
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U sender: utilization, the fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources
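The utilization itself follows from the numbers on this slide. As a worked sketch, taking the RTT to be twice the 15 ms propagation delay (an assumption, but one consistent with the 30 msec figure quoted above) and the 8 microsecond transmission time:

```latex
\begin{align*}
U_{sender} &= \frac{L/R}{RTT + L/R}
            = \frac{0.008\ \text{ms}}{30\ \text{ms} + 0.008\ \text{ms}}
            \approx 0.00027
\end{align*}
```

So a stop-and-wait sender on this link is busy for roughly 0.027% of the time, which is where the figure of about 33 kB/sec on a 1 Gbps link comes from.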
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3
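The factor of 3 can be checked numerically. The sketch below is a hedged illustration: `SenderUtilization` is a hypothetical helper, and it assumes the sender transmits `window` packets back-to-back per cycle of length RTT + L/R, which is the situation drawn in the timeline.

```java
/** Hypothetical helper: sender utilization with a fixed number of packets in flight. */
final class SenderUtilization {
    /**
     * Fraction of time the sender is busy when it may have `window`
     * unacknowledged packets outstanding per RTT + L/R cycle.
     */
    static double utilization(double packetBits, double linkBps, double rttSeconds, int window) {
        double dTrans = packetBits / linkBps;             // L / R
        double u = window * dTrans / (rttSeconds + dTrans);
        return Math.min(1.0, u);                          // the link cannot be more than fully busy
    }
}
```

For the 1 Gbps link, 8000 bit packets and a 30 ms RTT, a window of 1 gives about 0.00027 and a window of 3 gives three times that, still far from filling the link.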
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn’t ack packet if there’s a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n: “cumulative ACK”
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Initially: base=1; nextseqnum=1
Single state: Wait (Event / Action):
• rdt_send(data):
    if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
    else refuse_data(data) (Block)
• timeout:
    udt_send(packet[base]); udt_send(packet[base+1]); … udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply):
    Λ
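The Go-Back-N sender events can be sketched in Java as follows. This is an assumption-laden illustration, not the slides' implementation: `GbnSender` is a hypothetical class, the channel is replaced by a list recording which sequence numbers were transmitted, timers are driven by calling `onTimeout()` by hand, and the guard against stale ACKs is an addition that the FSM does not show.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Go-Back-N sender; `sent` stands in for udt_send(). */
final class GbnSender {
    private final int windowSize;                  // N
    private int base = 1;                          // oldest unacknowledged sequence number
    private int nextSeqNum = 1;                    // next sequence number to use
    final List<Integer> sent = new ArrayList<>();  // log of transmitted sequence numbers

    GbnSender(int windowSize) {
        this.windowSize = windowSize;
    }

    /** rdt_send(data): returns false (refuse_data) when the window is full. */
    boolean send() {
        if (nextSeqNum < base + windowSize) {
            sent.add(nextSeqNum);
            nextSeqNum++;
            return true;
        }
        return false;
    }

    /** timeout: retransmit every packet from base to nextSeqNum-1. */
    void onTimeout() {
        for (int seq = base; seq < nextSeqNum; seq++) {
            sent.add(seq);
        }
    }

    /** Uncorrupted cumulative ACK for ackNum: slide the window forward. */
    void onAck(int ackNum) {
        if (ackNum + 1 > base) {                   // assumption: ignore stale ACKs
            base = ackNum + 1;
        }
    }

    int base() {
        return base;
    }
}
```

With N = 3, a fourth call to `send()` is refused until an ACK advances `base`, and a timeout resends everything still outstanding.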
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don’t buffer) -> no receiver buffering
  - Re-ACK packet with highest in-order seq #
Initially: expectedseqnum=1
Single state: Wait (Event / Action):
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum):
    rdt_rcv(data); udt_send(ACK); expectedseqnum++
• any other event:
    udt_send(reply)
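The matching receiver needs only one variable. In this hedged Java sketch, `GbnReceiver` is a hypothetical class, corruption is supplied as a flag, and the method returns the sequence number carried in the ACK (0 before anything has been received, which is a convention of the sketch).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Go-Back-N receiver: no buffering, cumulative ACKs only. */
final class GbnReceiver {
    private int expectedSeqNum = 1;
    final List<String> delivered = new ArrayList<>();   // stands in for rdt_rcv(data)

    /** Returns the sequence number to ACK: the highest in-order one received so far. */
    int onPacket(int seq, String data, boolean corrupt) {
        if (!corrupt && seq == expectedSeqNum) {
            delivered.add(data);                        // in order: deliver and advance
            expectedSeqNum++;
        }
        // Corrupt or out-of-order packets are discarded; the previous ACK is repeated.
        return expectedSeqNum - 1;
    }
}
```

An out-of-order arrival therefore produces a duplicate ACK rather than being buffered.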
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #’s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
• data from above:
  - if next available seq # in window, send pkt
• timeout(n):
  - resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
• otherwise:
  - ignore
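The receiver rules can be sketched in Java as below. This is a simplified illustration: `SrReceiver` is a hypothetical class, sequence numbers are unbounded integers starting at 0 (no wraparound, so the sequence-number dilemma is deliberately out of scope), and the return value only says whether ACK(n) would be sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of the selective repeat receiver, without sequence number wraparound. */
final class SrReceiver {
    private final int windowSize;                                    // N
    private int rcvBase = 0;                                         // lowest not-yet-received seq #
    private final TreeMap<Integer, String> buffer = new TreeMap<>(); // out-of-order packets
    final List<String> delivered = new ArrayList<>();                // data passed to the upper layer

    SrReceiver(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Returns true if ACK(n) should be sent, false if the packet is ignored. */
    boolean onPacket(int n, String data) {
        if (n >= rcvBase && n <= rcvBase + windowSize - 1) {
            buffer.putIfAbsent(n, data);
            // Deliver any run of in-order packets starting at rcvBase.
            while (buffer.containsKey(rcvBase)) {
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;
            }
            return true;
        }
        if (n >= rcvBase - windowSize && n <= rcvBase - 1) {
            return true;   // already delivered: re-ACK, since the earlier ACK may have been lost
        }
        return false;      // outside both ranges: ignore
    }
}
```

Packet 1 arriving before packet 0 is buffered and ACKed; once packet 0 arrives, both are delivered in order and the window advances.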
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no “message boundaries”
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: TCP segment, 32 bits wide, rows from top to bottom]
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flag bits U A P R S F, Receive window
• checksum, Urg data pnter
• Options (variable length)
• application data (variable length)
Field notes:
• sequence and acknowledgement numbers: counting by bytes of data (not segments)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
56
TCP seq. #’s and ACKs
Seq. #’s:
  - byte stream “number” of first byte in segment’s data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn’t say; up to implementor
Simple telnet scenario (in time order):
1. User types ‘C’. Host A -> Host B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. Host B -> Host A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. Host A -> Host B: Seq=43, ACK=80
This has led to a world of hurt…
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  - this gets through an intrusion-detection system
  - then send a new segment replacing bytes 4-13 with “password-file”
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm: Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
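The two moving averages and the timeout can be combined into a small estimator. The following Java sketch makes several assumptions the slides leave open: `RttEstimator` is a hypothetical class, the starting values are supplied by the caller with an initial deviation of zero, and the deviation is updated using the EstimatedRTT from before the current sample is folded in (the order used in RFC 6298).

```java
/** Hypothetical sketch of TCP-style RTT estimation with alpha = 0.125 and beta = 0.25. */
final class RttEstimator {
    private static final double ALPHA = 0.125;
    private static final double BETA = 0.25;
    private double estimatedRtt;
    private double devRtt;

    /** Assumption: the caller provides the first estimate; deviation starts at zero. */
    RttEstimator(double initialRtt) {
        this.estimatedRtt = initialRtt;
        this.devRtt = 0.0;
    }

    /** Feed one SampleRTT; samples from retransmitted segments should not be passed in. */
    void addSample(double sampleRtt) {
        // Deviation first, against the estimate held before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double estimatedRtt() {
        return estimatedRtt;
    }

    double devRtt() {
        return devRtt;
    }

    /** TimeoutInterval = EstimatedRTT + 4 * DevRTT. */
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }
}
```

Starting from 100 ms, a single 200 ms sample moves the estimate only to 112.5 ms but raises the timeout to 212.5 ms, which shows how the safety margin reacts faster than the average.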
63
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
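TCP retransmission scenarios
lost ACK scenario (Host A, Host B):
1. A sends Seq=92, 8 bytes data
2. B replies ACK=100, which is lost (X)
3. Seq=92 timeout: A resends Seq=92, 8 bytes data
4. B replies ACK=100; SendBase = 100
premature timeout (Host A, Host B):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, then ACK=120
3. Seq=92 timeout fires before the ACKs arrive: A resends Seq=92, 8 bytes data
4. The ACKs arrive: SendBase = 100, then SendBase = 120
5. B replies ACK=120 to the retransmission; SendBase = 120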
67
TCP retransmission scenarios (more)
Implicit ACK (e.g. not Go-Back-N) (Host A, Host B):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, which is lost (X)
3. B replies ACK=120, which arrives before the timeout; SendBase = 120
ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
   -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
2. Arrival of in-order segment with expected seq #. One other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
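[Figure: Host A / Host B timeline. The 2nd segment is lost (X); after the duplicate ACKs arrive, Host A resends the 2nd segment before the timeout expires]
Figure 3.37: Resending a segment after triple duplicate ACK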
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {    /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y    /* fast retransmit */
    }
  }
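The ACK-handling event can be sketched in Java as follows. This is an illustration under assumptions: `FastRetransmitSender` is a hypothetical class, it keeps a single duplicate counter for the current SendBase rather than one per value of y, it does not model the timer, and instead of sending anything it records which sequence number would be resent.

```java
/** Hypothetical sketch of duplicate-ACK counting for fast retransmit. */
final class FastRetransmitSender {
    private long sendBase;
    private int dupAckCount = 0;
    int fastRetransmits = 0;            // how many fast retransmits were triggered
    long lastRetransmittedSeq = -1;     // sequence number of the most recent fast retransmit

    FastRetransmitSender(long initialSeq) {
        this.sendBase = initialSeq;
    }

    /** Handles an ACK whose ACK field value is y. */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;               // new data acknowledged
            dupAckCount = 0;
        } else if (y == sendBase) {
            dupAckCount++;              // duplicate ACK for already ACKed data
            if (dupAckCount == 3) {
                lastRetransmittedSeq = y;   // resend the segment starting at y
                fastRetransmits++;
            }
        }
        // ACKs below sendBase are stale and are ignored in this sketch.
    }

    long sendBase() {
        return sendBase;
    }
}
```

Only the third duplicate triggers the resend; further duplicates for the same value do not trigger it again.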
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will.
This means small MSS values may persist indefinitely.
Solution: Wait to fill each segment, but don’t wait indefinitely.
NAGLE’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: Use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn’t overflow
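Both sides of the rule reduce to simple arithmetic, sketched here in Java. `ReceiveWindow` is a hypothetical helper; the sender-side names `lastByteSent` and `lastByteAcked` are assumptions introduced for the sketch and do not appear on the slide.

```java
/** Hypothetical helper for the flow-control arithmetic. */
final class ReceiveWindow {
    /** Receiver side: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /**
     * Sender side: bytes that may still be sent while keeping
     * unACKed data (lastByteSent - lastByteAcked) within RcvWindow.
     */
    static long sendable(long rcvWindow, long lastByteSent, long lastByteAcked) {
        return Math.max(0, rcvWindow - (lastByteSent - lastByteAcked));
    }
}
```

For example, a 4096 byte buffer holding 2000 unread bytes advertises 2096 bytes; a sender with 1000 bytes already in flight may then send 1096 more.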
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. client (close) sends FIN; server replies ACK, then (close) sends FIN; client replies ACK and enters timed wait; closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and enters timed wait; both sides closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; delivered rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against input rate, axes running to R/2, with levels R/3 and R/4 marked]
85
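Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered rate]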
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet dropped, any “upstream” transmission capacity used for that packet was wasted
[Figure: Host A, Host B and $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT for each received ACK until loss detected ($W \leftarrow W + 1/W$)
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, with levels at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
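The saw tooth follows directly from the two update rules. In this hedged Java sketch, `AimdWindow` is a hypothetical class, the window is measured in units of MSS, and the floor of 1 MSS after a loss is an assumption added so the window never reaches zero. Slow start is not modelled.

```java
/** Hypothetical sketch of additive increase, multiplicative decrease (window in MSS). */
final class AimdWindow {
    private double congWin;

    AimdWindow(double initialWindow) {
        this.congWin = initialWindow;
    }

    /** Additive increase: W <- W + 1/W per ACK, roughly +1 MSS per RTT. */
    void onAck() {
        congWin += 1.0 / congWin;
    }

    /** Multiplicative decrease: W <- W/2 on loss (floored at 1 MSS, an assumption). */
    void onLoss() {
        congWin = Math.max(1.0, congWin / 2.0);
    }

    double congWin() {
        return congWin;
    }
}
```

Since about W ACKs arrive per RTT when the window is W segments, adding 1/W per ACK grows the window by roughly one MSS per round trip.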
89
SLOW START IS NOT SHOWN
5
Internet transport-layer protocols
bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup
bull unreliable unordered delivery UDPndash no-frills extension of ldquobest-
effortrdquo IPbull services not available
ndash delay guaranteesndash bandwidth guarantees
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
applicationtransportnetworkdata linkphysical
logical end-end transport
6
application
transport
network
link
physical
P1 application
transport
network
link
physical
application
transport
network
link
physical
P2P3 P4P1
host 1 host 2 host 3
= process= socket
delivering received segmentsto correct socket
Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)
Multiplexing at send host
Multiplexingdemultiplexing(Transport-layer style)
7
How transport-layer demultiplexing worksbull host receives IP datagrams
ndash each datagram has source IP address destination IP address
ndash each datagram carries 1 transport-layer segment
ndash each segment has source destination port number
bull host uses IP addresses amp port numbers to direct segment to appropriate socket
source port dest port
32 bits
applicationdata
(message)
other header fields
TCPUDP segment format
8
Connectionless demultiplexingbull Create sockets with port
numbersDatagramSocket mySocket1 = new
DatagramSocket(12534)DatagramSocket mySocket2 = new
DatagramSocket(12535)
bull UDP socket identified by two-tuple
(dest IP address dest port number)
bull When host receives UDP segmentndash checks destination port
number in segmentndash directs UDP segment to socket
with that port numberbull IP datagrams with different
source IP addresses andor source port numbers directed to same socket
9
Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
client IP A
P1P1P3
serverIP C
SP 6428DP 9157
SP 9157DP 6428
SP 6428DP 5775
SP 5775DP 6428
SP provides ldquoreturn addressrdquo
10
Connection-oriented demuxbull TCP socket identified by 4-
tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number
bull recv host uses all four values to direct segment to appropriate socket
bull Server host may support many simultaneous TCP socketsndash each socket identified by its
own 4-tuplebull Web servers have different
sockets for each connecting clientndash non-persistent HTTP will have
different socket for each request
11
Connection-oriented demux (cont)
ClientIPB
P1
client IP A
P1P2P4
serverIP C
SP 9157DP 80
SP 9157DP 80
P5 P6 P3
D-IPCS-IP AD-IPC
S-IP B
SP 5775DP 80
D-IPCS-IP B
12
Connection-oriented demux Threaded Web Server
ClientIPB
P1
client IP A
P1P2
serverIP C
SP 9157DP 80
SP 9157DP 80
P4 P3
D-IPCS-IP AD-IPC
S-IP B
SP 5775DP 80
D-IPCS-IP B
13
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to
appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection establishment
(which can add delay)bull simple no connection state at
sender receiverbull small segment headerbull no congestion control UDP can
blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery
UDP segment format (each row is 32 bits wide):
  source port | dest port
  length      | checksum
  Application data (message)
length: length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later ...
16
Internet Checksum (time travel warning: we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
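The wraparound rule can be checked with a few lines of code. This is a minimal sketch rather than a full UDP checksum (no pseudo-header, no byte packing); the function names are made up for the example, and the two operands are my reading of the slide's example as 16-bit values.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the low end."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum(words):
    """The checksum is the bitwise inverse (1's complement) of the folded sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
checksum = internet_checksum([a, b])
```

A receiver that adds the data words and the checksum together gets all ones (0xFFFF) when no error is detected.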
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
[Figure: send side and receive side, joined by the unreliable channel]
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, arrow labelled "event causing state transition" over "actions taken on state transition" (event / actions)]
state: when in this "state", next state uniquely determined by next event
22
KR state machines: a note
Beware:
Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: State name -> State name, arrow labelled "Relevant event causing state transition" over "Relevant action taken on state transition" (event / actions)]
state: when in this "state", next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
FSMs (state: Event / Action):
  sender    IDLE: rdt_send(data) / udt_send(packet)
  receiver  IDLE: udt_rcv(packet) / rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
sender (state: event / action -> next state):
  IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
  Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet) -> Waiting for reply
  Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ (no action) -> IDLE
receiver (state: event / action -> next state):
  IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK) -> IDLE
  IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> IDLE
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 FSM again (sender states IDLE and Waiting for reply, receiver state IDLE), with the same transition labels as on the FSM specification slide]
27
rdt2.0: error scenario
[Figure: the rdt2.0 FSM again (sender states IDLE and Waiting for reply, receiver state IDLE), with the same transition labels as on the FSM specification slide]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
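rdt2.1: sender, handles garbled ACK/NAKs
(state: event / action -> next state)
  IDLE (seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (seq 0)
  Waiting for reply (seq 0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (seq 0)
  Waiting for reply (seq 0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ (no action) -> IDLE (seq 1)
  IDLE (seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (seq 1)
  Waiting for reply (seq 1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (seq 1)
  Waiting for reply (seq 1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ (no action) -> IDLE (seq 0)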
udt_rcv(packet) ampamp corrupt(packet)
30
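rdt2.1: receiver, handles garbled ACK/NAKs
(state: event / action -> next state)
  Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 1 from below
  Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 0 from below
  Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK) -> Wait for 0 from below
  Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 0 from below
  Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 1 from below
  Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK) -> Wait for 1 from below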
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (state: event / action -> next state):
  Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
  Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet) -> Wait for ACK 0
  Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ (no action) -> leaves the fragment
receiver FSM fragment:
  Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1) -> Wait for 0 from below
  transition entering Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
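35
rdt3.0 sender
(state: event / action -> next state)
  IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
  IDLE state 0: udt_rcv(reply) / Λ (no action) -> IDLE state 0
  Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ -> Wait for ACK0
  Wait for ACK0: timeout / udt_send(packet) -> Wait for ACK0
  Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
  IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
  IDLE state 1: udt_rcv(reply) / Λ (no action) -> IDLE state 1
  Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ -> Wait for ACK1
  Wait for ACK1: timeout / udt_send(packet) -> Wait for ACK1
  Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0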
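The alternating-bit behaviour of rdt3.0 can be tried out with a small simulation. This is a sketch under simplifying assumptions of my own: losses are chosen by index instead of at random, every loss is followed by a sender timeout, and corruption is not modelled. The function name and its parameters are hypothetical.

```python
def run_stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Deliver messages over a channel that loses chosen transmissions.

    drop_data / drop_ack hold the 0-based indices of the data and ACK
    transmissions that the channel loses.
    Returns (messages delivered to the upper layer, data transmissions made).
    """
    delivered = []        # what the receiver hands to its upper layer
    expected = 0          # receiver: sequence bit it is waiting for
    data_tx = ack_tx = 0  # transmission counters
    for i, msg in enumerate(messages):
        seq = i % 2       # alternating sequence bit
        while True:
            lost = data_tx in drop_data
            data_tx += 1
            if lost:
                continue  # no ACK arrives: timeout, retransmit
            if seq == expected:
                delivered.append(msg)  # new packet: deliver exactly once
                expected ^= 1
            # new or duplicate, the receiver ACKs the sequence bit it saw
            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            if ack_lost:
                continue  # timeout: retransmit, receiver sees a duplicate
            break         # ACK arrived: move on to the next message
    return delivered, data_tx


delivered, sent = run_stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={0})
```

With the first ACK and the first retransmission both lost, the receiver still delivers each message once, at the cost of two extra data transmissions.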
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \dfrac{L}{R} = \dfrac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U sender: utilization, the fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
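The numbers on this slide can be reproduced with a short calculation. The sketch assumes the usual stop-and-wait model, in which the sender is busy for L/R out of every RTT + L/R, and takes RTT as twice the 15 ms propagation delay. The function name and the window parameter are illustrative additions.

```python
def utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy, with `window` packets sent per RTT + L/R."""
    d_trans = packet_bits / rate_bps            # L / R
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L, R, RTT = 8000, 1e9, 0.030                    # bits, bits per second, seconds
u_stop_and_wait = utilization(L, R, RTT)
u_three_in_flight = utilization(L, R, RTT, window=3)
throughput_bytes_per_s = (L / 8) / (RTT + L / R)
```

Stop-and-wait keeps the sender busy for under 0.03% of the time; allowing three packets in flight triples that figure.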
39
rdt3.0: stop-and-wait operation
[Timing diagram, sender and receiver, one RTT apart:]
  first packet bit transmitted, t = 0
  last packet bit transmitted, t = L / R
  first packet bit arrives
  last packet bit arrives, send ACK
  ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram, sender and receiver, one RTT apart:]
  first packet bit transmitted, t = 0
  last bit transmitted, t = L / R
  first packet bit arrives
  last packet bit arrives, send ACK
  last bit of 2nd packet arrives, send ACK
  last bit of 3rd packet arrives, send ACK
  ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only the unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only the unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
initially: base=1, nextseqnum=1
single state "Wait" (event / action):
  rdt_send(data) /
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   Block?
  timeout /
    udt_send(packet[base])
    udt_send(packet[base+1])
    ...
    udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply) /
    base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply) / Λ (no action)
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
initially: expectedseqnum=1
single state "Wait" (event / action):
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum) /
    rdt_rcv(data), udt_send(ACK), expectedseqnum++
  any other arrival / udt_send(reply)
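The Go-Back-N rules can be condensed into three small functions. This is a sketch with names of my own: packets are represented only by their sequence numbers, corruption is not modelled, and the sender's ACK handler uses max() so that a stale ACK cannot move the window backwards (the slide's FSM simply sets base = getacknum(reply)+1).

```python
def gbn_receiver(arrivals, first_seq=1):
    """Return (delivered seq numbers, ACK numbers sent) for the given arrival order."""
    expected = first_seq
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:      # in order: deliver and advance
            delivered.append(seq)
            expected += 1
        # otherwise discard; either way ACK the highest in-order packet so far
        acks.append(expected - 1)
    return delivered, acks


def gbn_sender_on_ack(base, acknum):
    """Cumulative ACK: slide the window base past everything acknowledged."""
    return max(base, acknum + 1)


def gbn_sender_on_timeout(base, nextseqnum):
    """Go back N: resend every packet that was sent but is not yet acknowledged."""
    return list(range(base, nextseqnum))


delivered, acks = gbn_receiver([1, 2, 4, 5, 3])
```

When packet 3 is overtaken by 4 and 5, the receiver discards both and repeats ACK 2, which is where the duplicate ACKs come from.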
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
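The receiver rules above map onto a small class. This is a sketch with assumptions of my own: sequence numbers are unbounded integers (no wraparound), and the class and method names are invented for the example.

```python
class SRReceiver:
    """Selective-repeat receiver with a window of N sequence numbers."""

    def __init__(self, window, base=0):
        self.N = window
        self.base = base      # rcvbase: lowest sequence number not yet delivered
        self.buffer = {}      # out-of-order packets waiting for the gap to fill
        self.delivered = []   # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.base <= n <= self.base + self.N - 1:
            self.buffer[n] = data
            while self.base in self.buffer:   # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return n
        if self.base - self.N <= n <= self.base - 1:
            return n                          # already delivered: re-ACK only
        return None


r = SRReceiver(window=4)
acks = [r.on_packet(n, f"p{n}") for n in (0, 2, 1, 1, 9)]
```

Packet 2 is buffered until packet 1 arrives, the repeated packet 1 is acknowledged again without being redelivered, and packet 9 lies outside both ranges and is ignored.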
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
(each row is 32 bits wide)
  source port | dest port
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
simple telnet scenario (Host A and Host B, in time order):
  1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
This has led to a world of hurt...
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
  - Filling buffers
• Critical buffers...
• Send a legitimate request: GET index.html
  - this gets through an intrusion-detection system
  - then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm: Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Plot: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
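The two moving averages and the timeout rule fit into a small estimator. This is a sketch: the class name is invented, and both the seeding with the first sample and the choice to update DevRTT before EstimatedRTT are assumptions of mine, since the slides do not fix either detail. The weights 0.125 and 0.25 are the typical values quoted on the slides.

```python
class RttEstimator:
    """EWMA of the round-trip time plus a deviation-based safety margin."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated = first_sample  # assumption: seed with the first sample
        self.dev = 0.0                 # assumption: no deviation known yet

    def update(self, sample):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample is folded in
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.estimated)
        self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.dev


est = RttEstimator(first_sample=100.0)  # milliseconds
timeout_ms = est.update(200.0)          # one unusually slow sample
```

A single slow sample moves the estimate only an eighth of the way towards it, but widens the safety margin considerably.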
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
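The three events of the simplified sender translate fairly directly into code. This is a sketch only: the class and attribute names are hypothetical, the timer is reduced to a boolean flag, and "passing a segment to IP" just records it in a list so that the behaviour can be inspected.

```python
class SimpleTcpSender:
    """Simplified TCP sender: no duplicate-ACK handling, flow or congestion control."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # NextSeqNum
        self.send_base = initial_seq  # SendBase
        self.unacked = {}             # first byte number -> payload
        self.timer_running = False
        self.wire = []                # (seq, payload) pairs handed to IP

    def on_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)       # oldest not-yet-acknowledged segment
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: drop every segment that ends before byte y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)


s = SimpleTcpSender(initial_seq=92)
s.on_data(b"x" * 8)    # Seq=92, 8 bytes
s.on_data(b"y" * 20)   # Seq=100, 20 bytes
s.on_timeout()         # only the oldest segment is retransmitted
s.on_ack(120)          # one cumulative ACK covers both segments
```

The timeout resends only the segment starting at byte 92, and the single ACK=120 clears both outstanding segments and stops the timer.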
66
TCP: retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); after the Seq=92 timeout Host A resends Seq=92, 8 bytes data and Host B replies ACK=100. Label: SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout fires first, so Host A resends Seq=92, 8 bytes data and Host B replies ACK=120. Labels: Sendbase = 100, SendBase = 120, SendBase = 120]
67
TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X) but ACK=120 arrives before the timeout. Label: SendBase = 120]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A and Host B on a time axis; one segment is lost (X) and Host A resends the 2nd segment before its timeout]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
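The duplicate-ACK counting can be exercised on a list of ACK values. This is a minimal sketch: the function name is made up, timers are left out, and the return value only records which sequence numbers would be resent.

```python
from collections import Counter


def fast_retransmit(acks, send_base):
    """Return the sequence numbers resent by fast retransmit for a stream of ACK values."""
    dup_acks = Counter()   # count of duplicate ACKs seen per ACK value
    resent = []
    for y in acks:
        if y > send_base:
            send_base = y              # new data acknowledged
        else:
            dup_acks[y] += 1           # duplicate ACK for already ACKed data
            if dup_acks[y] == 3:
                resent.append(y)       # resend the segment starting at byte y
    return resent


resent = fast_retransmit([100, 100, 100, 100, 120], send_base=92)
```

The first ACK=100 acknowledges new data; the next three are duplicates, and the third of those triggers the resend of the segment starting at byte 100.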
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will.
This means small MSS values may persist indefinitely.
Solution:
Wait to fill each segment, but don't wait indefinitely.
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
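The window arithmetic is short enough to write out. In this sketch the receiver-side names follow the slide; the sender-side parameters last_byte_sent and last_byte_acked, and both function names, are assumptions introduced for the example.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window, nbytes):
    """True if sending nbytes more keeps the unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

With 2000 bytes received but not yet read, a 4096-byte buffer leaves 2096 bytes of spare room, and the sender must keep its unacknowledged data within that limit.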
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client and server timelines. Segments exchanged: FIN, ACK, FIN, ACK. Labels: close, close, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines. Segments exchanged: FIN, ACK, FIN, ACK. Labels: closing, closing, timed wait, closed, closed]
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle state diagrams]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; delivered rate λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through a router with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Plots a, b, c: λ_out against λ_in, axes marked up to R/2; the marks R/3 and R/4 also appear]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B and a plot of λ_out]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase CongWin by 1 MSS every RTT for each received ACK until loss detected (W <- W + 1/W)
  - multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Plot: congestion window size against time, y-axis marked at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
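The saw tooth can be reproduced with a per-RTT simulation. This is a sketch under stated assumptions: the window is counted in MSS, the per-ACK increase of 1/W is summed up to +1 MSS per round trip, losses occur in rounds chosen by the caller, and slow start is not modelled. The function name and parameters are illustrative.

```python
def aimd(rounds, loss_rounds, cwnd=1.0):
    """Return the congestion window (in MSS) at the start of each RTT round."""
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease: W <- W/2
        else:
            cwnd += 1.0                # additive increase: +1 MSS per RTT
    return history


trace = aimd(rounds=8, loss_rounds={4})
```

The window climbs by one MSS per round, is halved by the loss in round 4, and then starts climbing again.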
89SLOW START IS NOT SHOWN
9
Connectionless demux (cont)DatagramSocket serverSocket = new DatagramSocket(6428)
ClientIPB
P2
client IP A
P1P1P3
serverIP C
SP 6428DP 9157
SP 9157DP 6428
SP 6428DP 5775
SP 5775DP 6428
SP provides ldquoreturn addressrdquo
10
Connection-oriented demuxbull TCP socket identified by 4-
tuple ndash source IP addressndash source port numberndash dest IP addressndash dest port number
bull recv host uses all four values to direct segment to appropriate socket
bull Server host may support many simultaneous TCP socketsndash each socket identified by its
own 4-tuplebull Web servers have different
sockets for each connecting clientndash non-persistent HTTP will have
different socket for each request
11
Connection-oriented demux (cont)
ClientIPB
P1
client IP A
P1P2P4
serverIP C
SP 9157DP 80
SP 9157DP 80
P5 P6 P3
D-IPCS-IP AD-IPC
S-IP B
SP 5775DP 80
D-IPCS-IP B
12
Connection-oriented demux Threaded Web Server
ClientIPB
P1
client IP A
P1P2
serverIP C
SP 9157DP 80
SP 9157DP 80
P4 P3
D-IPCS-IP AD-IPC
S-IP B
SP 5775DP 80
D-IPCS-IP B
13
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to
appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection establishment
(which can add delay)bull simple no connection state at
sender receiverbull small segment headerbull no congestion control UDP can
blast away as fast as desired
14
UDP more
bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific error
recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
15
UDP checksum
Senderbull treat segment contents as
sequence of 16-bit integersbull checksum addition (1rsquos
complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of received
segmentbull check if computed checksum
equals checksum field valuendash NO - error detectedndash YES - no error detected But
maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
16
Internet Checksum(time travel warning ndash we covered this earlier)
bull Notendash When adding numbers a carryout from the
most significant bit needs to be added to the result
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sumchecksum
17
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
18
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
19
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
rdt_rcv()
udt_rcv()
20
Reliable data transfer getting started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over
unreliable channel to receiver
rdt_rcv() called by rdt to deliver data to upper
rdt_rcv()
udt_rcv()
udt_rcv() called when packet arrives on rcv-side of channel
21
Reliable data transfer getting started
Wersquollbull incrementally develop sender receiver sides of
reliable data transfer protocol (rdt)bull consider only unidirectional data transfer
ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender
receiver
state1
state2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely
determined by next event
eventactions
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.

FSM notation: state name --[event / actions]--> state name
• event: relevant event causing state transition
• actions: relevant action taken on state transition
• state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver
  - sender sends data into underlying channel
  - receiver reads data from underlying channel

sender:   IDLE --[Event: rdt_send(data) / Action: udt_send(packet)]--> IDLE
receiver: IDLE --[Event: udt_rcv(packet) / Action: rdt_rcv(data)]--> IDLE
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors?
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
sender:
  IDLE --[rdt_send(data) / udt_send(packet)]--> Waiting for reply
  Waiting for reply --[udt_rcv(reply) && isNAK(reply) / udt_send(packet)]--> Waiting for reply
  Waiting for reply --[udt_rcv(reply) && isACK(reply) / Λ]--> IDLE
receiver:
  IDLE --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> IDLE
  IDLE --[udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK)]--> IDLE
(Λ = no action)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, with the transitions taken in the no-error case highlighted]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, with the transitions taken in the error case highlighted]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate

Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet

stop and wait: Sender sends one packet, then waits for receiver response
29
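rdt2.1: sender, handles garbled ACK/NAKs
  IDLE (seq 0) --[rdt_send(data) / sequence=0, udt_send(packet)]--> Waiting for reply (seq 0)
  Waiting for reply (seq 0) --[udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)]--> Waiting for reply (seq 0)
  Waiting for reply (seq 0) --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ]--> IDLE (seq 1)
  IDLE (seq 1) --[rdt_send(data) / sequence=1, udt_send(packet)]--> Waiting for reply (seq 1)
  Waiting for reply (seq 1) --[udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)]--> Waiting for reply (seq 1)
  Waiting for reply (seq 1) --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ]--> IDLE (seq 0)
(Λ = no action)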
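30
rdt2.1: receiver, handles garbled ACK/NAKs
  Wait for 0 from below --[udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data)]--> Wait for 1 from below
  Wait for 0 from below --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> Wait for 0 from below
  Wait for 0 from below --[udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK)]--> Wait for 0 from below
  Wait for 1 from below --[udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data)]--> Wait for 0 from below
  Wait for 1 from below --[udt_rcv(packet) && corrupt(packet) / udt_send(NAK)]--> Wait for 1 from below
  Wait for 1 from below --[udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK)]--> Wait for 1 from below
(a packet with the unexpected sequence number is a duplicate: it is ACKed again but not delivered)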
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment:
  Wait for call 0 from above --[rdt_send(data) / sequence=0, udt_send(packet)]--> Wait for ACK 0
  Wait for ACK 0 --[udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet)]--> Wait for ACK 0
  Wait for ACK 0 --[udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ]--> (next state)
receiver FSM fragment:
  (previous state) --[udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK1), rdt_rcv(data)]--> Wait for 0 from below
  Wait for 0 from below --[udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1)]--> Wait for 0 from below
(Λ = no action)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
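The timeout-and-retransmit approach can be sketched as a small simulation. This is only an illustration under stated assumptions, not the protocol from the slides: the function name `stop_and_wait` and the `drop_data` / `drop_ack` parameters are hypothetical, the channel only loses packets (corruption is not modelled), and a "timeout" is modelled simply as the sender noticing that no ACK came back.

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate an alternating-bit (rdt3.0-style) transfer over a lossy channel.

    drop_data / drop_ack hold the 0-based indices of the data packets and
    ACKs (counted in transmission order) that the channel loses.
    Returns (delivered messages, number of data transmissions).
    """
    delivered = []
    expected = 0      # receiver: sequence bit it is waiting for
    seq = 0           # sender: sequence bit of the current packet
    data_sent = 0
    acks_sent = 0
    for msg in messages:
        while True:
            # sender: udt_send(packet), first send or retransmission
            lost = data_sent in drop_data
            data_sent += 1
            if lost:
                continue              # timeout, no ACK: retransmit
            # receiver: udt_rcv(packet)
            if seq == expected:
                delivered.append(msg)  # rdt_rcv(data)
                expected ^= 1
            # receiver ACKs the sequence bit it saw (a duplicate is re-ACKed)
            ack_lost = acks_sent in drop_ack
            acks_sent += 1
            if ack_lost:
                continue              # timeout at sender: retransmit
            break                     # ACK for the current packet arrived
        seq ^= 1                      # flip the sequence bit for the next packet
    return delivered, data_sent


delivered, transmissions = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={0})
print(delivered, transmissions)
```

In this run the first ACK and the first retransmission are both lost, so packet "a" is sent three times, yet the one-bit sequence number lets the receiver deliver it only once.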
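35
rdt3.0 sender
  IDLE state 0 --[rdt_send(data) / sequence=0, udt_send(packet)]--> Wait for ACK0
  IDLE state 0 --[udt_rcv(reply) / Λ]--> IDLE state 0
  Wait for ACK0 --[udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ]--> Wait for ACK0
  Wait for ACK0 --[timeout / udt_send(packet)]--> Wait for ACK0
  Wait for ACK0 --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ]--> IDLE state 1
  IDLE state 1 --[rdt_send(data) / sequence=1, udt_send(packet)]--> Wait for ACK1
  IDLE state 1 --[udt_rcv(reply) / Λ]--> IDLE state 1
  Wait for ACK1 --[udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ]--> Wait for ACK1
  Wait for ACK1 --[timeout / udt_send(packet)]--> Wait for ACK1
  Wait for ACK1 --[udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ]--> IDLE state 0
(Λ = no action)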
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U_sender: utilization - fraction of time sender busy sending:
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
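The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes RTT = 30 ms (twice the 15 ms propagation delay) and ignores ACK transmission time; the function name `sender_utilization` and its `window` parameter are illustrative, not taken from the slides.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets are sent per RTT."""
    d_trans = packet_bits / rate_bps              # L/R, time to push one packet onto the link
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L_BITS, R_BPS, RTT_S = 8000, 1e9, 0.030

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_pipelined_3 = sender_utilization(L_BITS, R_BPS, RTT_S, window=3)
throughput_bytes_per_s = L_BITS / 8 / (RTT_S + L_BITS / R_BPS)

print(f"d_trans          = {L_BITS / R_BPS * 1e6:.0f} microseconds")
print(f"stop-and-wait U  = {u_stop_and_wait:.5f}")
print(f"3-packet window U = {u_pipelined_3:.5f}")
print(f"throughput       = {throughput_bytes_per_s / 1000:.1f} kB/s")
```

With a window of three packets the utilization triples, which is the "factor of 3" that the pipelining slides refer to.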
39
rdt3.0: stop-and-wait operation
[Timing diagram: sender | receiver, with RTT marked]
• t = 0: first packet bit transmitted
• t = L/R: last packet bit transmitted
• first packet bit arrives; last packet bit arrives, send ACK
• t = RTT + L/R: ACK arrives, send next packet
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram: sender | receiver, with RTT marked]
• t = 0: first packet bit transmitted
• t = L/R: last bit transmitted
• first packet bit arrives; last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• t = RTT + L/R: ACK arrives, send next packet
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets

Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state: Wait
  (initially) --[Λ / base=1, nextseqnum=1]--> Wait
  Wait --[rdt_send(data) /
            if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
            else refuse_data(data)   (block)
         ]--> Wait
  Wait --[timeout / udt_send(packet[base]), udt_send(packet[base+1]), ..., udt_send(packet[nextseqnum-1])]--> Wait
  Wait --[udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1]--> Wait
  Wait --[udt_rcv(reply) && corrupt(reply) / Λ]--> Wait
(Λ = no event or no action)
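The sender FSM above translates fairly directly into code. The following is a minimal sketch, not the slides' implementation: the class name `GBNSender` and the `send` callback (standing in for udt_send) are hypothetical, timers and corruption checks are left out, and the guard that ignores stale ACKs is an added assumption.

```python
class GBNSender:
    """Go-Back-N sender state: window of N, cumulative ACKs, resend all on timeout."""

    def __init__(self, window, send):
        self.N = window
        self.send = send            # hypothetical stand-in for udt_send(seq, data)
        self.base = 1               # oldest unacknowledged sequence number
        self.nextseqnum = 1         # next sequence number to use
        self.packets = {}           # seq -> data, kept until acknowledged

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse_data(data)
        self.packets[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        # Cumulative ACK: everything up to and including acknum is acknowledged.
        # Assumption: ACKs outside the current window are ignored.
        if self.base <= acknum < self.nextseqnum:
            for seq in range(self.base, acknum + 1):
                del self.packets[seq]
            self.base = acknum + 1

    def on_timeout(self):
        # Go back N: resend every packet that is still unacknowledged.
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.packets[seq])


sent = []
sender = GBNSender(window=3, send=lambda seq, data: sent.append(seq))
accepted = [sender.rdt_send(d) for d in "abcd"]   # 4th call hits the full window
sender.on_ack(1)                                  # window slides to [2, 4]
accepted.append(sender.rdt_send("d"))
sender.on_timeout()                               # resends 2, 3, 4
print(accepted, sent)
```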
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #

Single state: Wait
  (initially) --[Λ / expectedseqnum=1]--> Wait
  Wait --[udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++]--> Wait
  Wait --[otherwise / udt_send(reply)]--> Wait
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
A: window size ≤ (½ of seq # size)
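The ½ rule can be checked mechanically. The sketch below is an illustration under a simplifying assumption: the sender's window still covers the first `window` sequence numbers (all its ACKs were lost) while the receiver has already advanced by a full window. The function name `sr_ambiguous` is hypothetical.

```python
def sr_ambiguous(seq_space, window):
    """True if a retransmitted old packet can land inside the receiver's new window."""
    # Sequence numbers the sender may still retransmit (its ACKs were all lost).
    old = {i % seq_space for i in range(0, window)}
    # Sequence numbers the receiver now accepts as new data.
    new = {i % seq_space for i in range(window, 2 * window)}
    return bool(old & new)


print(sr_ambiguous(seq_space=4, window=3))   # the slide's example
print(sr_ambiguous(seq_space=4, window=2))   # window = half the sequence space
```

The two sets are disjoint exactly when 2 × window ≤ seq_space, which is the relationship stated on the slide.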
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product

Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
    This has led to a world of hurt...

simple telnet scenario (Host A | Host B, time runs downward):
• User types 'C': A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C': B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C': A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
  - Filling buffers
• Critical buffers...

• Send a legitimate request: GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"

A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: RTT (milliseconds, axis 100 to 350) against time (seconds, axis 1 to 106); two curves: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
• Then set timeout interval:
  $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
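The EstimatedRTT, DevRTT and TimeoutInterval updates can be tried out with a small estimator. This is a sketch under assumptions, not TCP's actual code: the class name `RttEstimator` is hypothetical, the initial values (EstimatedRTT set to the first sample, DevRTT set to half of it) follow the style of RFC 6298 rather than anything on the slides, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate.

```python
class RttEstimator:
    """Exponentially weighted moving averages for RTT and its deviation."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: RFC 6298-style initial value

    def update(self, sample_rtt):
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(140.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

Samples taken from retransmitted segments should not be fed to `update`, in line with the Karn/Partridge rule on the earlier slide.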
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
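TCP retransmission scenarios
lost ACK scenario (Host A | Host B, time runs downward):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• Seq=92 timeout at A
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100
• SendBase = 100

premature timeout (Host A | Host B, time runs downward):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120
• Seq=92 timeout at A before the ACKs arrive
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=120
• SendBase = 100, then SendBase = 120, SendBase = 120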
67
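TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A | Host B, time runs downward):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arrives before the timeout
• SendBase = 120

Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too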
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediate send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A | Host B timeline; the 2nd segment is lost (X); A resends the 2nd segment; the timeout interval is marked]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
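The duplicate-ACK counting in the pseudocode above can be exercised with a short sketch. The class name `AckTracker` and the string return values are hypothetical, chosen only to make the behaviour visible; timers and the actual retransmission are left out, and the choice to ignore ACK values below SendBase is an added assumption.

```python
class AckTracker:
    """Tracks SendBase and counts duplicate ACKs, as in the fast retransmit pseudocode."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the action taken for an ACK whose ACK field value is y."""
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0
            return "advance"
        if y < self.send_base:
            return "ignore"           # assumption: stale ACKs are not counted
        self.dup_acks += 1            # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return "fast retransmit"  # resend segment with sequence number y
        return "duplicate"


tracker = AckTracker(send_base=100)
actions = [tracker.on_ack(y) for y in (120, 120, 120, 120, 120)]
print(actions)
```

The first ACK=120 advances SendBase; the third duplicate after it triggers the retransmission, and later duplicates do not trigger it again.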
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will.
This means small MSS values may persist - indefinitely.
Solution: Wait to fill each segment, but don't wait indefinitely.

NAGLE's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: Use a timer; when the timer expires - send the (unfilled) segment.
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
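The receive-window bookkeeping amounts to two small calculations. The sketch below is illustrative only: the function names are hypothetical, and the sender-side variables `last_byte_sent` and `last_byte_acked` are an assumption (they follow Kurose and Ross naming and do not appear on the slide).

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: keep unACKed data (including the new bytes) within RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window))
print(can_send(1000, last_byte_sent=5100, last_byte_acked=4000, advertised_window=window))
```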
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client | server timeline: client close, FIN ->; <- ACK; server close, <- FIN; ACK ->; client in timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client | server timeline: client closing, FIN ->; <- ACK; server closing, <- FIN; ACK ->; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends λ_in (original data); Host B; output λ_out; unlimited shared output link buffers]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data); Host B; output λ_out; finite shared output link buffers]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against λ_in, both axes running to R/2; the values R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data); Host B; output λ_out; finite shared output link buffers]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λ_out]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W ← W + 1/W
  - multiplicative decrease: cut CongWin in half after loss (W ← W/2)
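The sawtooth that results from these two rules can be reproduced with a toy model. This sketch works in units of MSS at the granularity of one RTT per step; the function name `aimd`, the `loss_rounds` parameter and the floor of 1 MSS are illustrative assumptions, and slow start is left out (as it is on the slide).

```python
def aimd(rounds, loss_rounds, start=1.0):
    """Congestion window (in MSS) after each RTT under additive increase, multiplicative decrease."""
    w = start
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            w = max(1.0, w / 2)   # multiplicative decrease: W <- W/2
        else:
            w += 1.0              # additive increase: +1 MSS per RTT
        trace.append(w)
    return trace


trace = aimd(rounds=6, loss_rounds={3}, start=8.0)
print(trace)
```

Adding 1/W for each of the W ACKs that arrive in one RTT gives roughly the same +1 MSS per RTT that the loop applies in a single step.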
[Figure: congestion window size (8, 16, 24 Kbytes) against time. Saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN!
7
How transport-layer demultiplexing works
• host receives IP datagrams
  - each datagram has source IP address, destination IP address
  - each datagram carries 1 transport-layer segment
  - each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)
8
Connectionless demultiplexing
• Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(12534);
  DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple:
  (dest IP address, dest port number)
• When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket
9
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: client IP: A and Client IP: B exchange UDP segments with server IP: C on port 6428. Segments to the server: SP 9157 / DP 6428 and SP 5775 / DP 6428. Replies: SP 6428 / DP 9157 and SP 6428 / DP 5775]
SP provides "return address"
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  - non-persistent HTTP will have different socket for each request
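The difference between the two lookup keys can be shown with a pair of dictionaries. This is a conceptual sketch, not how an operating system implements demultiplexing: the class `Demux`, its method names and the socket labels are hypothetical, and the host names "A", "B", "C" with ports 9157, 5775, 6428 and 80 are used only as example values.

```python
class Demux:
    """Toy demultiplexer: UDP keyed by (dest IP, dest port), TCP keyed by the 4-tuple."""

    def __init__(self):
        self.udp = {}   # (dst_ip, dst_port) -> socket label
        self.tcp = {}   # (src_ip, src_port, dst_ip, dst_port) -> socket label

    def bind_udp(self, dst_ip, dst_port, sock):
        self.udp[(dst_ip, dst_port)] = sock

    def add_tcp(self, src_ip, src_port, dst_ip, dst_port, sock):
        self.tcp[(src_ip, src_port, dst_ip, dst_port)] = sock

    def deliver_udp(self, src_ip, src_port, dst_ip, dst_port):
        # Source address and port are ignored for the lookup; they only
        # serve as the "return address" for a reply.
        return self.udp.get((dst_ip, dst_port))

    def deliver_tcp(self, src_ip, src_port, dst_ip, dst_port):
        return self.tcp.get((src_ip, src_port, dst_ip, dst_port))


demux = Demux()
demux.bind_udp("C", 6428, "serverSocket")
demux.add_tcp("A", 9157, "C", 80, "conn-1")
demux.add_tcp("B", 9157, "C", 80, "conn-2")
demux.add_tcp("B", 5775, "C", 80, "conn-3")

print(demux.deliver_udp("A", 9157, "C", 6428), demux.deliver_udp("B", 5775, "C", 6428))
print(demux.deliver_tcp("A", 9157, "C", 80), demux.deliver_tcp("B", 9157, "C", 80))
```

Both UDP senders reach the same socket, while the two TCP segments with the same source port but different source IP addresses reach different sockets.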
11
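Connection-oriented demux (cont)
[Figure: client IP: A and Client IP: B connect to server IP: C, port 80; each connection is served by its own server process (processes P1 to P6 are shown). Segments: (S-IP: A, SP: 9157, D-IP: C, DP: 80); (S-IP: B, SP: 9157, D-IP: C, DP: 80); (S-IP: B, SP: 5775, D-IP: C, DP: 80)]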
12
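Connection-oriented demux: Threaded Web Server
[Figure: client IP: A and Client IP: B connect to server IP: C, port 80; a single threaded server process handles all connections. Segments: (S-IP: A, SP: 9157, D-IP: C, DP: 80); (S-IP: B, SP: 9157, D-IP: C, DP: 80); (S-IP: B, SP: 5775, D-IP: C, DP: 80)]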
13
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery!

UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  length | checksum
  Application data (message)
• length: Length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO - error detected
  - YES - no error detected. But maybe errors nonetheless? More later ...
16
Internet Checksum
(time travel warning - we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
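The wraparound addition in this example can be reproduced in a few lines. This is a minimal sketch over a list of 16-bit words, not a full UDP checksum (which also covers a pseudo-header and pads odd-length data); the function names are illustrative.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of the top bit back into the low end."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound of the carry
    return total


def internet_checksum(words):
    """Checksum = one's complement of the one's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


def verify(words, checksum):
    """Receiver check: the sum over data plus checksum must be all ones."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
total = ones_complement_sum16([a, b])
checksum = internet_checksum([a, b])
print(format(total, "016b"))
print(format(checksum, "016b"))
print(verify([a, b], checksum), verify([a ^ 1, b], checksum))
```

The two printed bit strings match the "sum" and "checksum" rows of the example, and flipping a single bit in the first word makes the receiver's check fail.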
17
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
18
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
19
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
rdt_rcv()
udt_rcv()
20
Reliable data transfer getting started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over
unreliable channel to receiver
rdt_rcv() called by rdt to deliver data to upper
rdt_rcv()
udt_rcv()
udt_rcv() called when packet arrives on rcv-side of channel
21
Reliable data transfer getting started
Wersquollbull incrementally develop sender receiver sides of
reliable data transfer protocol (rdt)bull consider only unidirectional data transfer
ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender
receiver
state1
state2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely
determined by next event
eventactions
22
KR state machines ndash a note
BewareKurose and Ross has a confusingconfused attitude to
state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the
KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my
interpretation is more specificrelevant
Statename
Statename
Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo
next state uniquely determined by next
event eventactions
23
Rdt10 reliable transfer over a reliable channel
bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
IDLE udt_send(packet)rdt_send(data)
rdt_rcv(data)IDLEudt_rcv(packet)
sender receiver
Event
Action
24
Rdt20 channel with bit errors
bull underlying channel may flip bits in packetndash checksum to detect bit errors
bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that
packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender
that packet had errorsndash sender retransmits packet on receipt of NAK
bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt2.1: sender, handles garbled ACK/NAKs
• IDLE (seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (seq 0)
• Waiting for reply (seq 0): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (seq 0)
• Waiting for reply (seq 0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 1)
• IDLE (seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (seq 1)
• Waiting for reply (seq 1): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (seq 1)
• Waiting for reply (seq 1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 0)
30
rdt2.1: receiver, handles garbled ACK/NAKs
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
– state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
– state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK received OK at sender
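The receiver-side duplicate check can be sketched in a few lines. This is only an illustration under assumptions: the class name `AltBitReceiver` and its `on_packet` method are hypothetical, and corruption is passed in as a flag rather than detected with a checksum.

```python
class AltBitReceiver:
    """Receiver of a stop-and-wait protocol with a 1-bit sequence number.

    Hypothetical helper for illustration; not taken from the slides.
    """

    def __init__(self):
        self.expected = 0      # sequence bit of the next new packet
        self.delivered = []    # data handed to the upper layer

    def on_packet(self, seq, data, corrupt=False):
        """Process one arriving packet and return the reply to send."""
        if corrupt:
            return "NAK"
        if seq == self.expected:
            self.delivered.append(data)   # new packet: deliver it
            self.expected ^= 1            # flip the expected bit
        # a duplicate is ACKed again but not delivered a second time
        return "ACK"


rx = AltBitReceiver()
replies = [
    rx.on_packet(0, "a"),
    rx.on_packet(0, "a"),                # retransmission after a garbled ACK
    rx.on_packet(1, "b", corrupt=True),
    rx.on_packet(1, "b"),
]
```

One bit is enough here because stop-and-wait never has more than one packet outstanding, so the receiver only has to tell "the packet I am waiting for" from "the one I already have".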
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
– receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment
• (entering Wait for 0 from below): udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1)
• Wait for 0 from below: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
– checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
– retransmission will be duplicate, but use of seq #'s already handles this
– receiver must specify seq # of pkt being ACKed
• requires countdown timer
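A small simulation can make the timeout-and-retransmit loop concrete. It is a sketch under assumptions: the function `stop_and_wait` and its `drop_data` / `drop_ack` parameters are hypothetical, losses are scripted rather than random, and a timeout is modelled simply as "no ACK came back for this attempt".

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate an rdt3.0-style sender and receiver over a lossy channel.

    drop_data / drop_ack list the transmission attempts (0-based, counted
    over the whole run) whose data packet or ACK is lost.
    Returns (delivered, number_of_transmissions).
    """
    delivered = []
    expected = 0          # receiver: next expected sequence bit
    seq = 0               # sender: sequence bit of the current packet
    attempt = 0           # counts every data transmission
    for data in messages:
        while True:
            this_try = attempt
            attempt += 1
            if this_try in drop_data:
                continue              # data lost: timer expires, resend
            if seq == expected:       # receiver sees a new packet
                delivered.append(data)
                expected ^= 1
            # the receiver ACKs seq whether the packet was new or a duplicate
            if this_try in drop_ack:
                continue              # ACK lost: timer expires, resend
            break                     # ACK arrived, move to next message
        seq ^= 1
    return delivered, attempt


delivered, transmissions = stop_and_wait(
    ["a", "b", "c"], drop_data={1}, drop_ack={3}
)
```

In this run the second message needs two transmissions because its data packet is lost, and the third needs two because its ACK is lost; the sequence bit stops the receiver from delivering the resent copy twice.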
35
rdt3.0 sender
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U sender: utilization – fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
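The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes RTT is twice the 15 ms propagation delay and ignores ACK transmission time; the function name `utilization` and its `window` parameter are hypothetical.

```python
def utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets
    may be in flight per round trip (window=1 is stop-and-wait)."""
    d_trans = packet_bits / link_bps          # time to push one packet out
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L = 8000          # packet size in bits
R = 1e9           # link rate in bits per second
RTT = 0.030       # 2 x 15 ms propagation delay

u_stop_and_wait = utilization(L, R, RTT)
u_three_in_flight = utilization(L, R, RTT, window=3)
throughput_bytes_per_s = (L / 8) / (RTT + L / R)
```

With one packet per round trip the sender is busy for well under a tenth of a percent of the time, which is the motivation for pipelining: three packets in flight triple the utilization.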
39
rdt3.0: stop-and-wait operation
[Timing diagram between sender and receiver, RTT marked]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram between sender and receiver, RTT marked]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
– Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
– If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only unack packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
– may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
• initially: base=1, nextseqnum=1
• Wait: rdt_send(data) /
    if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
    else refuse_data(data) (Block)
• Wait: timeout / udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
• Wait: udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
• Wait: udt_rcv(reply) && corrupt(reply) / Λ
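The sender FSM above maps fairly directly onto a small class. This is a sketch rather than the slide's code: `GoBackNSender` and its method names are hypothetical, timers are left to the caller, and unlike the FSM it ignores a stale ACK instead of moving `base` backwards.

```python
class GoBackNSender:
    """Window bookkeeping for a Go-Back-N sender (illustrative only)."""

    def __init__(self, window):
        self.N = window
        self.base = 1           # oldest unacknowledged sequence number
        self.nextseqnum = 1     # next sequence number to use
        self.sent = []          # log of every transmission

    def rdt_send(self):
        """Send one new packet; return False if the window is full."""
        if self.nextseqnum < self.base + self.N:
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False            # caller must block or retry later

    def on_ack(self, acknum):
        """Cumulative ACK: everything up to acknum is acknowledged."""
        self.base = max(self.base, acknum + 1)

    def on_timeout(self):
        """Resend every packet that is sent but not yet acknowledged."""
        resent = list(range(self.base, self.nextseqnum))
        self.sent.extend(resent)
        return resent


s = GoBackNSender(window=4)
accepted = [s.rdt_send() for _ in range(5)]   # fifth call hits a full window
s.on_ack(2)                                   # packets 1 and 2 acknowledged
resent = s.on_timeout()                       # go back and resend 3 and 4
```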
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
– may generate duplicate ACKs
– need only remember expectedseqnum
• out-of-order packet:
– discard (don't buffer) -> no receiver buffering!
– Re-ACK packet with highest in-order seq #
FSM:
• initially: expectedseqnum=1
• Wait: udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• Wait: any other arrival / udt_send(reply)
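The receiver's behaviour is easy to trace on a concrete arrival order. A minimal sketch, assuming uncorrupted packets and a hypothetical function `gbn_receive` that takes the arriving sequence numbers:

```python
def gbn_receive(arrivals, first_seq=1):
    """Return (delivered, acks) for a Go-Back-N receiver.

    Out-of-order packets are discarded and the highest in-order
    sequence number is acknowledged again.
    """
    expected = first_seq
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)     # in order: hand to the upper layer
            expected += 1
        # cumulative ACK for the last in-order packet (None if there is none)
        acks.append(expected - 1 if expected > first_seq else None)
    return delivered, acks


# packet 3 is delayed: 4 and 5 arrive first and are thrown away
delivered, acks = gbn_receive([1, 2, 4, 5, 3])
```

Packets 4 and 5 each trigger a duplicate ACK for 2 and must be sent again later, which is the price of keeping no receive buffer.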
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #'s
– again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
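The receiver rules can be traced with a short function. This is a sketch under assumptions: `sr_receive` is a hypothetical name, packets are never corrupted, and sequence numbers are unbounded integers (no wraparound), so it does not show the dilemma discussed two slides later.

```python
def sr_receive(arrivals, window, rcvbase=0):
    """Return (delivered, acks) for a selective-repeat receiver."""
    buffered = set()
    delivered, acks = [], []
    for seq in arrivals:
        if rcvbase <= seq < rcvbase + window:
            acks.append(seq)               # ACK each packet individually
            buffered.add(seq)
            while rcvbase in buffered:     # deliver any in-order run
                buffered.remove(rcvbase)
                delivered.append(rcvbase)
                rcvbase += 1
        elif rcvbase - window <= seq < rcvbase:
            acks.append(seq)               # already delivered: ACK again
        # otherwise: ignore the packet
    return delivered, acks


# packet 1 is delayed, then a retransmitted copy of it shows up
delivered, acks = sr_receive([0, 2, 3, 1, 1], window=4)
```

Unlike Go-Back-N, packets 2 and 3 are kept while the receiver waits for 1, so only the missing packet has to be resent.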
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product
Now lets move from the generic to the specific…
TCP arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
– one sender, one receiver
• reliable, in-order byte stream:
– no "message boundaries"
• pipelined:
– TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
– bi-directional data flow in same connection
– MSS: maximum segment size
• connection-oriented:
– handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
– sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
(each row is 32 bits wide)
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pnter
• Options (variable length)
• application data (variable length)
Notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
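To make the layout concrete, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch: the function name `parse_tcp_header` and the dictionary keys are hypothetical, options are skipped rather than decoded, and the sample segment is built by hand rather than captured from a network.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upwards


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields and return them with the payload."""
    (src, dst, seq, ack, off_flags, window, checksum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20]       # network byte order, 20 bytes
    )
    header_len = (off_flags >> 12) * 4  # head len is in 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES) if off_flags & (1 << bit)}
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urg,
        "payload": segment[header_len:],
    }


# hand-built segment: ports 9157 -> 80, Seq=42, ACK=79, PSH+ACK, data 'C'
sample = struct.pack("!HHIIHHHH", 9157, 80, 42, 79, (5 << 12) | 0x18, 65535, 0, 0) + b"C"
hdr = parse_tcp_header(sample)
```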
56
TCP seq. #'s and ACKs
Seq. #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how receiver handles out-of-order segments
– A: TCP spec doesn't say - up to implementor
This has led to a world of hurt…
simple telnet scenario (Host A, Host B, time running downwards):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
– Filling buffers.
• Critical buffers…
• Send a legitimate request: GET index.html
– this gets through an intrusion-detection system
– then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
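The moving average is a one-line update per sample. A minimal sketch, assuming the estimate is seeded with the first sample (the slide does not say how it is initialised); `estimated_rtt` is a hypothetical name.

```python
def estimated_rtt(samples, alpha=0.125):
    """Return the EstimatedRTT after each SampleRTT in `samples`."""
    est = samples[0]                 # assumption: seed with the first sample
    history = [est]
    for sample in samples[1:]:
        est = (1 - alpha) * est + alpha * sample
        history.append(est)
    return history


# a single 200 ms spike in an otherwise steady 100 ms path
history = estimated_rtt([100, 200, 100])
```

With alpha = 0.125 the spike moves the estimate by only one eighth of its size, and its influence shrinks by a factor of 0.875 with every later sample.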
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
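Putting the two averages together gives the timeout after each sample. The sketch makes assumptions the slide leaves open: EstimatedRTT starts at the first sample, DevRTT starts at 0, and DevRTT is updated first, using the EstimatedRTT from before the current sample. The function name `timeout_intervals` is hypothetical.

```python
def timeout_intervals(samples, alpha=0.125, beta=0.25):
    """Return the TimeoutInterval after each SampleRTT in `samples`."""
    est = samples[0]     # assumption: seed EstimatedRTT with the first sample
    dev = 0.0            # assumption: no deviation known yet
    timeouts = [est + 4 * dev]
    for sample in samples[1:]:
        dev = (1 - beta) * dev + beta * abs(sample - est)
        est = (1 - alpha) * est + alpha * sample
        timeouts.append(est + 4 * dev)
    return timeouts


timeouts = timeout_intervals([100, 120, 100])
```

A single 20 ms jump raises the timeout by more than the jump itself, because the safety margin of 4 x DevRTT reacts faster than EstimatedRTT.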
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
– update what is known to be acked
– start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
TCP: retransmission scenarios
lost ACK scenario (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• Seq=92 timeout: A resends Seq=92, 8 bytes data
• B replies ACK=100; SendBase = 100
premature timeout (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout: A resends Seq=92, 8 bytes data
• SendBase = 100, then SendBase = 120
• B replies ACK=120; SendBase = 120
67
TCP retransmission scenarios (more)
Implicit ACK (e.g. not Go-Back-N) (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120, which arrives before the timeout
• SendBase = 120
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq #. Gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediate send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
– fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A, Host B, time running downwards; a segment is lost (X) and the 2nd segment is resent before its timeout]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
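The duplicate-ACK counting can be tried out on a stream of ACK values. This is a simplified sketch: `process_acks` is a hypothetical name, it keeps a single counter that is reset whenever new data is acknowledged, and it leaves timers out entirely.

```python
def process_acks(acks, send_base):
    """Apply the fast-retransmit rule to a sequence of ACK field values.

    Returns (send_base, retransmitted), where retransmitted lists the
    sequence numbers resent because of a third duplicate ACK.
    """
    dup_count = 0
    retransmitted = []
    for y in acks:
        if y > send_base:
            send_base = y            # new data acknowledged
            dup_count = 0
        else:
            dup_count += 1           # duplicate ACK for already ACKed data
            if dup_count == 3:
                retransmitted.append(y)   # fast retransmit
    return send_base, retransmitted


# the segment starting at byte 100 is lost; later segments trigger dup ACKs
send_base, retransmitted = process_acks([100, 100, 100, 100, 120], send_base=92)
```

The first ACK=100 is new; the next three are duplicates, so the segment starting at byte 100 is resent without waiting for the timer.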
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will.
This means small MSS values may persist - indefinitely.
Solution:
Wait to fill each segment, but don't wait indefinitely.
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
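The bookkeeping on both sides fits in two small functions. This is a sketch: `rcv_window` and `can_send` are hypothetical helper names, and the sender-side counters `last_byte_sent` and `last_byte_acked` are assumptions that do not appear on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """True if sending nbytes more keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window


# receiver: 4096-byte buffer, 3000 bytes received, app has read 1000 of them
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)

# sender: 1000 bytes already in flight, wants to send 1000 more
ok = can_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window)
too_much = can_send(1000, last_byte_sent=5100, last_byte_acked=4000, window=window)
```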
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client and server timelines: client close, FIN to server; ACK to client; server close, FIN to client; ACK to server; client in timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
– Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines: client closing, FIN to server; ACK to client; server closing, FIN to client; ACK to server; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send through a router with unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes marked R/2, with R/3 and R/4 also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send across routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
[Figure: congestion window against time; y-axis marked 8 Kbytes, 16 Kbytes, 24 Kbytes]
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
– additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W <- W + 1/W)
– multiplicative decrease: cut CongWin in half after loss (W <- W/2)
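The saw-tooth shape falls out of a very small loop. This sketch works in whole round trips rather than per ACK, measures the window in MSS units, and takes the rounds in which a loss is detected as an input; the function name `aimd` and the floor of 1 MSS are assumptions. Slow start is not modelled.

```python
def aimd(rounds, loss_rounds, cwnd=1.0):
    """Return the congestion window (in MSS) after each round trip."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease
        else:
            cwnd += 1.0                 # additive increase: +1 MSS per RTT
        trace.append(cwnd)
    return trace


# start at 4 MSS, lose a packet in round 3
trace = aimd(6, loss_rounds={3}, cwnd=4.0)
```

The window climbs by one MSS per round trip, halves at the loss, and starts climbing again, which is the probing behaviour the figure shows.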
[Figure: congestion window size against time. Saw tooth behavior: probing for bandwidth. SLOW START IS NOT SHOWN]
89
8
Connectionless demultiplexing
• Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(12534);
  DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
– checks destination port number in segment
– directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket
9
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: clients at IP A and IP B exchange UDP segments with the server at IP C; segment labels: (SP 9157, DP 6428) answered by (SP 6428, DP 9157), and (SP 5775, DP 6428) answered by (SP 6428, DP 5775)]
SP provides "return address"
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
– source IP address
– source port number
– dest IP address
– dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
– each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
– non-persistent HTTP will have different socket for each request
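The difference between the two lookup keys can be shown with plain dictionaries. This is a sketch of the idea only, not of any real socket API: `udp_key`, `tcp_key` and the address strings are hypothetical, and the port numbers are borrowed from the figures.

```python
def udp_key(src_ip, src_port, dst_ip, dst_port):
    """UDP demultiplexing looks only at the destination."""
    return (dst_ip, dst_port)


def tcp_key(src_ip, src_port, dst_ip, dst_port):
    """TCP demultiplexing uses the full 4-tuple."""
    return (src_ip, src_port, dst_ip, dst_port)


# two clients, A and B, send to the same server port
from_a = ("A", 9157, "C", 80)
from_b = ("B", 5775, "C", 80)

udp_sockets = {udp_key(*from_a): "socket-1"}
tcp_sockets = {tcp_key(*from_a): "socket-for-A", tcp_key(*from_b): "socket-for-B"}

same_udp_socket = udp_sockets[udp_key(*from_a)] == udp_sockets[udp_key(*from_b)]
different_tcp_sockets = tcp_sockets[tcp_key(*from_a)] != tcp_sockets[tcp_key(*from_b)]
```

Under UDP both senders land on one socket and the application tells them apart by the source address of each datagram; under TCP each client gets its own socket.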
11
Connection-oriented demux (cont)
[Figure: clients at IP A and IP B send TCP segments to port 80 on the server at IP C; segment labels: (SP 9157, DP 80, S-IP A, D-IP C), (SP 9157, DP 80, S-IP B, D-IP C), (SP 5775, DP 80, S-IP B, D-IP C); processes P1 to P6]
12
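Connection-oriented demux: Threaded Web Server
[Figure: clients at IP A and IP B send TCP segments to port 80 on the server at IP C; segment labels: (SP 9157, DP 80, S-IP A, D-IP C), (SP 9157, DP 80, S-IP B, D-IP C), (SP 5775, DP 80, S-IP B, D-IP C); processes P1 to P4]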
13
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
– lost
– delivered out of order to app
• connectionless:
– no handshaking between UDP sender, receiver
– each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
– loss tolerant
– rate sensitive
• other UDP uses
– DNS
– SNMP
• reliable transfer over UDP: add reliability at application layer
– application-specific error recovery!
UDP segment format (each row is 32 bits wide):
• source port #, dest port #
• length, checksum
• Application data (message)
length: Length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later …
16
Internet Checksum (time travel warning – we covered this earlier)
• Note
– When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
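The wraparound rule is easy to get wrong by hand, so here is the same example in code. A minimal sketch: the function names are hypothetical, the input is a list of 16-bit words rather than raw bytes, and the two words are the ones from the example as I read it.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back in."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total


def internet_checksum(words):
    """The checksum is the bitwise complement of the folded sum."""
    return ~ones_complement_sum(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101

folded_sum = ones_complement_sum([a, b])
checksum = internet_checksum([a, b])

# receiver check: summing the data and the checksum gives all ones
receiver_sum = ones_complement_sum([a, b, checksum])
```

The receiver-side check is why the complement is sent: if no bit was flipped, every bit of the final sum is 1.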
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• rdt_rcv(): called by rdt to deliver data to upper layer
• udt_rcv(): called when packet arrives on rcv-side of channel
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
– but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 and state 2 joined by a transition labelled event / actions]
• event causing state transition
• actions taken on state transition
• state: when in this "state" next state uniquely determined by next event
22
KR state machines – a note.
Beware
Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
in KR "actions taken" appear wide-ranging, my interpretation is more specific/relevant.
[Figure: two states (State name) joined by a transition labelled event / actions]
• Relevant event causing state transition
• Relevant action taken on state transition
• state: when in this "state" next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
– no bit errors
– no loss of packets
• separate FSMs for sender, receiver:
– sender sends data into underlying channel
– receiver reads data from underlying channel
(transitions written as: state: Event / Action)
sender
• IDLE: rdt_send(data) / udt_send(packet)
receiver
• IDLE: udt_rcv(packet) / rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
– checksum to detect bit errors
• the question: how to recover from errors:
– acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
– negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
– sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
– error detection
– receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
first packet bit transmitted, t = 0
sender receiver
RTT
last bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
ACK arrives, send next packet, t = RTT + L / R
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
- Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
- If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
- When timer expires, retransmit only the unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
- When timer expires, retransmit only the unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
- may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Wait
udt_send(packet[base]); udt_send(packet[base+1]); ... udt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ } else refuse_data(data) (Block)
base = getacknum(reply)+1
udt_rcv(reply) && notcorrupt(reply)
base=1; nextseqnum=1
udt_rcv(reply) && corrupt(reply)
Λ
Λ
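The sender FSM can be sketched in a few lines. This is an illustrative sketch only: the class name, the `channel_send` callback (standing in for udt_send) and the `timer_running` flag are assumptions, and the guard against stale ACKs is an addition the FSM does not spell out.

```python
class GBNSender:
    def __init__(self, window_size, channel_send):
        self.N = window_size
        self.base = 1                  # oldest unACKed seq #
        self.nextseqnum = 1            # next seq # to use
        self.packets = {}              # seq # -> data, kept until ACKed
        self.send = channel_send       # hypothetical stand-in for udt_send
        self.timer_running = False

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False               # window full: refuse_data
        self.packets[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        self.timer_running = True      # a packet is now outstanding
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum < self.base:
            return                     # stale/duplicate cumulative ACK: ignore
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        self.base = acknum + 1         # cumulative: everything up to acknum is done
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):
        # Go back N: resend every packet that is still unACKed.
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.packets[seq])
        self.timer_running = self.base != self.nextseqnum


sent = []
sender = GBNSender(3, lambda seq, data: sent.append(seq))
for item in "abc":
    sender.rdt_send(item)
sender.on_timeout()
print(sent)
```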
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
- may generate duplicate ACKs
- need only remember expectedseqnum
• out-of-order packet:
- discard (don't buffer) -> no receiver buffering!
- Re-ACK packet with highest in-order seq #
Wait
udt_send(reply)
Λ
udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum)
rdt_rcv(data); udt_send(ACK); expectedseqnum++
expectedseqnum=1
Λ
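A matching receiver is even smaller, since it only remembers expectedseqnum. A minimal sketch; the class name and the convention of returning the ACK number (0 before anything has arrived in order) are assumptions.

```python
class GBNReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []            # data handed to the upper layer, in order

    def udt_rcv(self, seq, data, corrupt=False):
        """Process one packet and return the cumulative ACK number to send."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Out-of-order or corrupt packets are discarded, not buffered;
        # either way, (re-)ACK the highest in-order seq # seen so far.
        return self.expectedseqnum - 1


receiver = GBNReceiver()
acks = [receiver.udt_rcv(seq, data) for seq, data in [(1, "a"), (3, "c"), (2, "b"), (3, "c")]]
print(acks, receiver.delivered)
```

The second ACK in the output is a duplicate: packet 3 arrived before packet 2 and was thrown away.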
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
- buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
- sender timer for each unACKed pkt
• sender window
- N consecutive seq #'s
- again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
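The three receiver cases can be sketched as follows. This is a simplified illustration: the class name is hypothetical, and sequence numbers are plain unbounded integers, so wraparound is deliberately left out.

```python
class SRReceiver:
    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}               # out-of-order packets waiting for a gap to fill
        self.delivered = []

    def on_packet(self, n, data):
        """Return the seq # to ACK, or None when the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver everything that is now in order and slide the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                   # already delivered: re-ACK so the sender can move on
        return None                    # otherwise: ignore


receiver = SRReceiver(4)
print(receiver.on_packet(1, "b"), receiver.delivered)   # buffered, nothing delivered yet
print(receiver.on_packet(0, "a"), receiver.delivered)   # gap filled, both delivered
```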
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
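The dilemma can be checked mechanically. In this sketch (the function name is illustrative) the "old" set holds the sequence numbers the sender may still retransmit if every ACK was lost, and the "new" set holds the numbers the receiver now accepts as fresh data; any overlap means a retransmission can be mistaken for new data.

```python
def windows_overlap(seq_space, window):
    """True if a retransmitted old packet can look like new data to the receiver."""
    old = {i % seq_space for i in range(0, window)}            # sender may resend these
    new = {i % seq_space for i in range(window, 2 * window)}   # receiver expects these
    return bool(old & new)


print(windows_overlap(4, 3))   # the example above: seq #'s 0..3, window size 3
print(windows_overlap(4, 2))   # window is half the sequence space
```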
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high bandwidth-delay product
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• full duplex data:
- bi-directional data flow in same connection
- MSS: maximum segment size
• connection-oriented:
- handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
- sender will not overwhelm receiver
• point-to-point:
- one sender, one receiver
• reliable, in-order byte stream:
- no "message boundaries"
• pipelined:
- TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length)
application data (variable length)
Annotations:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- Receive window: # bytes rcvr willing to accept
- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
- byte stream "number" of first byte in segment's data
ACKs:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say - up to implementor
simple telnet scenario (Host A, Host B; time runs downward):
- User types 'C': A sends Seq=42, ACK=79, data = 'C'
- host ACKs receipt of 'C', echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
- host ACKs receipt of echoed 'C': A sends Seq=43, ACK=80
This has led to a world of hurt...
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
- Filling buffers
• Critical buffers...
• Send a legitimate request
GET index.html
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
- but RTT varies
• too short: premature timeout
- unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
- average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$$
(typically, $\beta = 0.25$)
Then set timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
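The estimator fits in a few lines. A sketch under stated assumptions: the class name is hypothetical, the weights are 0.125 for the RTT average and 0.25 for the deviation, the deviation is updated before the average (using the previous EstimatedRTT), and the initial deviation is taken as half the first sample. The slides do not fix the update order or the initial values.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2          # assumed starting deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not from a retransmission) and return the new timeout."""
        self.dev_rtt = (1 - self.beta) * self.dev_rtt \
            + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt \
            + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(100.0)                  # milliseconds
for sample in [120.0, 90.0, 300.0, 110.0]:
    print(round(estimator.update(sample), 1))
```

A single outlier (the 300 ms sample) widens the safety margin far more than it moves the average, which is the intended behaviour.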
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
- timeout events
- duplicate acks
• Initially consider simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
- update what is known to be acked
- start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
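TCP: retransmission scenarios
[Figure: lost ACK scenario (Host A, Host B; time runs downward)]
- A sends Seq=92, 8 bytes data
- B replies ACK=100, which is lost (X)
- Seq=92 timeout: A resends Seq=92, 8 bytes data
- B replies ACK=100; SendBase = 100
[Figure: premature timeout (Host A, Host B; time runs downward)]
- A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
- B replies ACK=100, then ACK=120
- Seq=92 timeout fires before the ACKs arrive: A resends Seq=92, 8 bytes data
- SendBase = 100, then SendBase = 120
- B replies ACK=120; SendBase = 120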
67
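TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario (Host A, Host B; time runs downward)]
- A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
- B's ACK=100 is lost (X); B's ACK=120 arrives before the timeout
- SendBase = 120
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK's 100 too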
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment, higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
- long delay before resending lost packet
• Detect lost segments via duplicate ACKs
- Sender often sends many segments back-to-back
- If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
- fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK (Host A, Host B; the 2nd segment is lost (X) and is resent before its timeout)]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
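The duplicate-ACK counting can be sketched as below. The class name and the `resend` callback are hypothetical; as a simplification, one counter is kept and reset whenever new data is ACKed, and only ACKs equal to SendBase are counted as duplicates.

```python
class FastRetransmitSender:
    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.dup_acks = 0
        self.resend = resend           # hypothetical callback: resend(seq)

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y         # new data ACKed
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1         # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resend(y)         # fast retransmit, before the timer expires


resent = []
sender = FastRetransmitSender(100, resent.append)
for ack in [100, 100, 100, 120]:
    sender.on_ack(ack)
print(resent, sender.send_base)
```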
72
Silly Window Syndrome
The advertised receive window states the amount a receiver can accept
If a transmitter has something to send - it will
This means small window values (and so small segments) may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
- guarantees receive buffer doesn't overflow
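Both sides of the rule are one-liners. In this sketch the receiver-side names follow the slide; `last_byte_sent` and `last_byte_acked` on the sender side are assumed names for the sender's own bookkeeping and do not appear in the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender-side check: unACKed data must stay within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window))
```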
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline with the labels close, FIN, ACK, close, FIN, ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
- Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline with the labels closing, FIN, ACK, closing, FIN, ACK, timed wait, closed]
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
- lost packets (buffer overflow at routers)
- long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; delivered rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2; further marks at R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: $\lambda_{out}$ for Host A and Host B]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W ← W + 1/W
- multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, y-axis marked at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
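The saw tooth falls straight out of the two rules. A minimal sketch in units of MSS per RTT; the function name, the explicit set of loss rounds and the floor of one MSS are assumptions made for illustration, and slow start is left out.

```python
def aimd(rounds, loss_rounds, mss=1, start=1):
    """Congestion window at the start of each RTT round, in units of MSS."""
    cwnd = start
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease: W <- W/2
        else:
            cwnd += mss                 # additive increase: one MSS per RTT
    return trace


print(aimd(12, loss_rounds={5, 9}))
```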
89
SLOW START IS NOT SHOWN
9
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: hosts Client IP: B, client IP: A and server IP: C, with processes P1, P2, P3; segments labelled SP: 6428 / DP: 9157, SP: 9157 / DP: 6428, SP: 6428 / DP: 5775 and SP: 5775 / DP: 6428]
SP provides "return address"
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
- source IP address
- source port number
- dest IP address
- dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
- each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
- non-persistent HTTP will have different socket for each request
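The 4-tuple lookup amounts to a table keyed on all four values. A sketch with illustrative names; the hosts A, B, C and the port numbers are the ones used in the demux figures, while `register`, `demux` and the socket labels are hypothetical.

```python
# Socket table keyed by (source IP, source port, dest IP, dest port).
sockets = {}


def register(src_ip, src_port, dst_ip, dst_port, socket_name):
    sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name


def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket for an arriving segment, or None if no connection matches."""
    return sockets.get((src_ip, src_port, dst_ip, dst_port))


register("A", 9157, "C", 80, "socket-1")
register("B", 9157, "C", 80, "socket-2")   # same ports as A, different source IP
register("B", 5775, "C", 80, "socket-3")
print(demux("B", 9157, "C", 80))
```

Two clients using the same source port still reach different sockets, because the source IP address is part of the key.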
11
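Connection-oriented demux (cont)
[Figure: hosts Client IP: B, client IP: A and server IP: C, with processes P1 through P6; segments labelled SP: 9157 / DP: 80 / S-IP: A / D-IP: C, SP: 9157 / DP: 80 / S-IP: B / D-IP: C and SP: 5775 / DP: 80 / S-IP: B / D-IP: C]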
12
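Connection-oriented demux: Threaded Web Server
[Figure: hosts Client IP: B, client IP: A and server IP: C, with processes P1 through P4; segments labelled SP: 9157 / DP: 80 / S-IP: A / D-IP: C, SP: 9157 / DP: 80 / S-IP: B / D-IP: C and SP: 5775 / DP: 80 / S-IP: B / D-IP: C]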
13
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
- lost
- delivered out of order to app
• connectionless:
- no handshaking between UDP sender, receiver
- each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
- loss tolerant
- rate sensitive
• other UDP uses
- DNS
- SNMP
• reliable transfer over UDP: add reliability at application layer
- application-specific error recovery!
UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
Application data (message)
length: length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
- NO - error detected
- YES - no error detected. But maybe errors nonetheless? More later ...
16
Internet Checksum (time travel warning - we covered this earlier)
• Note
- When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
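The wraparound rule is easy to check in code. A minimal sketch over a list of 16-bit words; the function names are illustrative, and the two operands are the ones from the example, written in hexadecimal (1110011001100110 = 0xE666, 1101010101010101 = 0xD555).

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total


def internet_checksum(words):
    """The checksum is the bitwise complement of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


words = [0xE666, 0xD555]
print(f"{ones_complement_sum16(words):016b}")
print(f"{internet_checksum(words):016b}")
```

A receiver can verify a segment by summing the data words together with the checksum field: if nothing was corrupted, the result is all ones (0xFFFF).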
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
rdt_rcv()
udt_rcv()
20
Reliable data transfer: getting started
send side / receive side
rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called by rdt to deliver data to upper layer
udt_rcv(): called when packet arrives on rcv-side of channel
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
- but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, arrow labelled "event causing state transition / actions taken on state transition"]
state: when in this "state" next state uniquely determined by next event
22
KR state machines - a note.
Beware
Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
in KR "actions taken" appear wide-ranging, my interpretation is more specific/relevant.
[Figure: State name -> State name, arrow labelled "Relevant event causing state transition / Relevant action taken on state transition"]
state: when in this "state" next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
- no bit errors
- no loss of packets
• separate FSMs for sender, receiver:
- sender sends data into underlying channel
- receiver reads data from underlying channel
sender: IDLE; Event: rdt_send(data); Action: udt_send(packet)
receiver: IDLE; Event: udt_rcv(packet); Action: rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
- checksum to detect bit errors
• the question: how to recover from errors:
- acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
- negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
- sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
- error detection
- receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
sender
IDLE
rdt_send(data)
udt_send(packet)
Waiting for reply
udt_rcv(reply) && isNAK(reply)
udt_send(packet)
udt_rcv(reply) && isACK(reply)
Λ
receiver
IDLE
udt_rcv(packet) && corrupt(packet)
udt_send(NAK)
udt_rcv(packet) && notcorrupt(packet)
rdt_rcv(data); udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the FSM specification slide]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the FSM specification slide]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
rdt2.1: sender, handles garbled ACK/NAKs
IDLE
sequence=0; udt_send(packet)
rdt_send(data)
Waiting for reply
udt_send(packet)
udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) )
sequence=1; udt_send(packet)
rdt_send(data)
udt_rcv(reply) && notcorrupt(reply) && isACK(reply)
udt_send(packet)
udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) )
udt_rcv(reply) && notcorrupt(reply) && isACK(reply)
IDLE
Waiting for reply
Λ Λ
udt_rcv(packet) && corrupt(packet)
30
rdt2.1: receiver, handles garbled ACK/NAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) && not corrupt(packet) && has_seq0(packet)
udt_rcv(packet) && not corrupt(packet) && has_seq1(packet)
udt_send(ACK); rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) && not corrupt(packet) && has_seq0(packet)
udt_send(ACK); rdt_rcv(data)
udt_send(ACK)
receive(packet) && not corrupt(packet) && has_seq1(packet)
receive(packet) && corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
- state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
- state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
- receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
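The receiver side of a NAK-free protocol can be sketched as follows. The class name and the return convention (the method returns the sequence number carried in the ACK) are assumptions; the behaviour follows the bullets above, with an ACK for the last packet received OK standing in for a NAK.

```python
class Rdt22Receiver:
    def __init__(self):
        self.expected = 0              # alternating-bit seq # we are waiting for
        self.delivered = []

    def udt_rcv(self, seq, data, corrupt=False):
        """Process one packet; return the seq # to put in the ACK."""
        if not corrupt and seq == self.expected:
            self.delivered.append(data)
            self.expected ^= 1
        # Corrupt or duplicate packet: ACK the last packet received OK,
        # which the sender sees as a duplicate ACK and treats like a NAK.
        return 1 - self.expected


receiver = Rdt22Receiver()
print(receiver.udt_rcv(0, "a"))                  # ACK 0
print(receiver.udt_rcv(1, "b", corrupt=True))    # ACK 0 again: sender retransmits
print(receiver.udt_rcv(1, "b"))                  # ACK 1
```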
33
rdt2.2: sender, receiver fragments
Wait for call 0 from above
sequence=0; udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) && ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) && not corrupt(reply) && isACK0(reply)
Wait for ACK 0
sender FSM fragment
Wait for 0 from below
receive(packet) && not corrupt(packet) && has_seq1(packet)
send(ACK1); rdt_rcv(data)
udt_rcv(packet) && (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)
receiver FSM fragment
Λ
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
• full duplex data
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented
  – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
• flow controlled
  – sender will not overwhelm receiver
• point-to-point
  – one sender, one receiver
• reliable, in-order byte stream
  – no "message boundaries"
• pipelined
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through the socket door]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | urg data pointer
options (variable length)
application data (variable length)
Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn't say; up to the implementor
This has led to a world of hurt…
[Figure: simple telnet scenario, Host A and Host B, time running downward]
1. User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
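A minimal sketch of the exponential weighted moving average, with the typical weight of 0.125 as the default. The function name `update_estimated_rtt` is hypothetical; times are in milliseconds here, but any unit works.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One EWMA step: blend the newest SampleRTT into EstimatedRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def smooth(samples, initial_estimate, alpha=0.125):
    """Run the EWMA over a sequence of samples and return every estimate."""
    estimates = []
    estimate = initial_estimate
    for sample in samples:
        estimate = update_estimated_rtt(estimate, sample, alpha)
        estimates.append(estimate)
    return estimates
```

A single 200 ms sample moves a 100 ms estimate only to 112.5 ms, which is why the estimated curve looks smoother than the raw samples.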
60
Some RTT estimates are never good
Associating the ACK with (a) the original transmission versus (b) the retransmission is ambiguous
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase the timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106) for gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(typically, β = 0.25)
Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
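The two updates and the timeout can be combined in one step. This is a sketch under one stated assumption: the slide does not say which estimate is updated first, so the deviation is updated here against the previous EstimatedRTT (the order RFC 6298 uses). The function name `update_timeout` is hypothetical.

```python
def update_timeout(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new sample.

    Assumption: DevRTT is computed against the previous EstimatedRTT,
    then EstimatedRTT itself is updated.
    """
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval
```

For example, starting from EstimatedRTT = 100 ms and DevRTT = 10 ms, a 140 ms sample gives EstimatedRTT = 105 ms, DevRTT = 17.5 ms and a timeout of 175 ms.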
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• if it acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
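The simplified sender can be tried out as a small Python class. This is an illustrative sketch only: the class name, the `wire` list standing in for "pass segment to IP", and the boolean `timer_running` standing in for a real timer are all assumptions, not part of the slides.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified TCP sender (no dup ACKs, no flow control)."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq      # NextSeqNum
        self.send_base = initial_seq     # SendBase
        self.unacked = []                # (seq, data) in send order
        self.timer_running = False       # stand-in for a real timer
        self.wire = []                   # segments handed to IP, for inspection

    def on_app_data(self, data):
        segment = (self.next_seq, data)
        self.unacked.append(segment)
        if not self.timer_running:
            self.timer_running = True    # start timer
        self.wire.append(segment)        # pass segment to IP
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            self.wire.append(self.unacked[0])   # smallest unacked seq #
            self.timer_running = True           # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Keep only segments that still contain unacknowledged bytes.
            self.unacked = [(s, d) for s, d in self.unacked if s + len(d) > y]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100, then receiving ACK=100, leaves only the second segment outstanding; a timeout then resends Seq=100.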
66
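TCP: retransmission scenarios
[Figure: two timelines between Host A and Host B, time running downward]
Lost ACK scenario:
1. A sends Seq=92, 8 bytes data
2. B replies ACK=100, which is lost (X)
3. The Seq=92 timer times out; A resends Seq=92, 8 bytes data
4. B replies ACK=100; SendBase = 100
Premature timeout scenario:
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, then ACK=120
3. The Seq=92 timer times out before the ACKs arrive; A resends Seq=92, 8 bytes data
4. The ACKs arrive: SendBase = 100, then SendBase = 120
5. B answers the duplicate segment with ACK=120; SendBase = 120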
67
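TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario between Host A and Host B, time running downward]
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B's ACK=100 is lost (X); B's ACK=120 arrives before the timeout
3. SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too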
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends several segments to Host B; the 2nd segment is lost (X); A resends the 2nd segment before its timeout]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
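The duplicate-ACK counting can be sketched as a function over a stream of ACK values. The name `fast_retransmit_events` and the idea of returning the list of retransmitted sequence numbers are assumptions for illustration; resetting the counters when new data is acknowledged is also an assumption, since the slide does not spell it out.

```python
def fast_retransmit_events(acks, send_base=0, threshold=3):
    """Replay ACK field values; return (SendBase, seq #s fast-retransmitted)."""
    dup_counts = {}
    retransmitted = []
    for y in acks:
        if y > send_base:
            send_base = y            # new data acknowledged
            dup_counts.clear()       # assumption: counters restart here
        else:
            dup_counts[y] = dup_counts.get(y, 0) + 1
            if dup_counts[y] == threshold:
                retransmitted.append(y)   # resend segment with seq # y
    return send_base, retransmitted
```

One ACK for 100 followed by three more ACKs for 100 triggers a resend of the segment starting at byte 100, without waiting for the timer.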
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution: wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
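The window arithmetic is simple enough to check by hand or in code. In this sketch `rcv_window` follows the slide's formula directly; `sender_may_send` and its parameters `last_byte_sent` and `last_byte_acked` are hypothetical names for the sender-side bookkeeping, which the slide only describes in words.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window):
    """Bytes the sender may still put in flight without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the advertised window is 2096 bytes; a sender with 1000 bytes already in flight may send 1096 more.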
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client and server timelines. Client close sends FIN; server replies ACK, then its own close sends FIN; client replies ACK and enters timed wait; both sides end closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines. Client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK, stays in timed wait, then closed; server closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; delivered rate λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share finite output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered data]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against λ_in, axes marked up to R/2; further marks at R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered data]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λ_out]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT, for each received ACK, until loss detected (W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, axis marks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
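The saw tooth can be reproduced with a toy simulation at RTT granularity. This is a sketch under stated assumptions: the window is counted in MSS units, one whole MSS is added per loss-free RTT (the net effect of adding 1/W per ACK), losses are supplied as a set of RTT indices, and the floor of 1 MSS is an assumption. The function name `aimd` is hypothetical, and slow start is not modelled.

```python
def aimd(rounds, loss_rounds, cong_win=1.0):
    """Return the congestion window (in MSS) after each RTT.

    rounds: number of RTTs to simulate.
    loss_rounds: set of RTT indices in which a loss is detected.
    """
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cong_win = max(1.0, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += 1.0                     # additive increase: +1 MSS per RTT
        history.append(cong_win)
    return history
```

Starting from 4 MSS with a loss in the fourth RTT, the window climbs 5, 6, 7, halves to 3.5, then climbs again.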
89
SLOW START IS NOT SHOWN!
10
Connection-oriented demux
• TCP socket identified by 4-tuple:
  – source IP address
  – source port number
  – dest IP address
  – dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  – each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  – non-persistent HTTP will have different socket for each request
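Demultiplexing on the 4-tuple amounts to a table lookup. The sketch below uses a plain dictionary as a stand-in for the host's socket table; the host labels A, B, C and the port numbers are illustrative values, and the `sockets` table and `demux` function are hypothetical names.

```python
# Hypothetical socket table keyed by (source IP, source port, dest IP, dest port).
sockets = {
    ("A", 9157, "C", 80): "socket for client A",
    ("B", 9157, "C", 80): "socket for client B, first connection",
    ("B", 5775, "C", 80): "socket for client B, second connection",
}


def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket for a segment, or None if no connection matches."""
    return sockets.get((src_ip, src_port, dst_ip, dst_port))
```

Two clients using the same source port 9157 still reach different sockets, because the source IP address is part of the key.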
11
Connection-oriented demux (cont)
[Figure: client IP A, client IP B and server IP C, with processes P1 to P6. Segments arriving at the server: S-IP A, SP 9157 -> D-IP C, DP 80; S-IP B, SP 9157 -> D-IP C, DP 80; S-IP B, SP 5775 -> D-IP C, DP 80]
12
Connection-oriented demux: Threaded Web Server
[Figure: client IP A, client IP B and server IP C, with processes P1 to P4. Segments arriving at the server: S-IP A, SP 9157 -> D-IP C, DP 80; S-IP B, SP 9157 -> D-IP C, DP 80; S-IP B, SP 5775 -> D-IP C, DP 80]
13
UDP: User Datagram Protocol [RFC 768]
• "no frills", "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery!
[Figure: UDP segment format, 32 bits wide]
source port # | dest port #
length | checksum
application data (message)
Length: in bytes of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO: error detected
  – YES: no error detected. But maybe errors nonetheless? More later…
16
Internet Checksum (time travel warning – we covered this earlier)
• Note
  – When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
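The wraparound rule is easy to get wrong by hand, so here is a small sketch of the 16-bit one's complement sum and the resulting checksum. The function names are hypothetical, and the input is assumed to be already split into 16-bit words.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound of the carry
    return total


def internet_checksum(words):
    """Checksum field value: the bitwise complement of the folded sum."""
    return ~ones_complement_sum16(words) & 0xFFFF
```

For the words 0xE666 and 0xD555 the folded sum is 0xBBBC and the checksum is 0x4443. A receiver that adds all the words plus the checksum gets 0xFFFF when no error is detected.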
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, transition labelled event / actions]
• state: when in this "state", next state uniquely determined by next event
• event: event causing state transition
• actions: actions taken on state transition
22
KR state machines – a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: State name -> State name, transition labelled event / actions]
• state: when in this "state", next state uniquely determined by next event
• event: relevant event causing state transition
• actions: relevant action taken on state transition
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
sender (state IDLE):
  Event: rdt_send(data)
  Action: udt_send(packet)
receiver (state IDLE):
  Event: udt_rcv(packet)
  Action: rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors?
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
(transitions written as: event / action -> next state; Λ means no action)
sender (states: IDLE, Waiting for reply)
  IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
  Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
  Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver (state: IDLE)
  IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
  IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 FSM. Path taken: rdt_send(data) / udt_send(packet); udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK); udt_rcv(reply) && isACK(reply) / Λ]
27
rdt2.0: error scenario
[Figure: the rdt2.0 FSM. Path taken: rdt_send(data) / udt_send(packet); udt_rcv(packet) && corrupt(packet) / udt_send(NAK); udt_rcv(reply) && isNAK(reply) / udt_send(packet)]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: sender sends one packet, then waits for receiver response
29
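rdt2.1: sender, handles garbled ACK/NAKs
(transitions written as: event / action -> next state; Λ means no action)
IDLE (next seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (0)
Waiting for reply (0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)
Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next seq 1)
IDLE (next seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (1)
Waiting for reply (1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet)
Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next seq 0)
udt_rcv(packet) && corrupt(packet)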
30
rdt2.1: receiver, handles garbled ACK/NAKs
(transitions written as: event / action -> next state)
Wait for 0 from below:
  udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
  udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
  udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK)
Wait for 1 from below:
  udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
  udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
  udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
(transitions written as: event / action -> next state; Λ means no action)
sender FSM fragment:
  Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
  Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet)
  Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment:
  Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1)
  transition into Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK1), rdt_rcv(data)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
(transitions written as: event / action -> next state; Λ means no action)
IDLE state 0:
  rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
  udt_rcv(reply) / Λ
Wait for ACK0:
  udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ
  timeout / udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
IDLE state 1:
  rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
  udt_rcv(reply) / Λ
Wait for ACK1:
  udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ
  timeout / udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
• U_sender: utilization, the fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
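The utilization figure can be worked out numerically. This sketch assumes the usual stop-and-wait accounting: the sender is busy for L/R out of every RTT + L/R, with RTT taken as twice the 15 ms propagation delay. The function name `utilization` and the `window` parameter (packets sent per round trip) are hypothetical additions for illustration.

```python
def utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending.

    window=1 is stop-and-wait; a larger window models that many packets
    sent back-to-back per round trip (capped at full utilization).
    """
    d_trans = packet_bits / rate_bps            # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


stop_and_wait = utilization(8000, 1e9, 0.030)          # roughly 0.00027
three_in_flight = utilization(8000, 1e9, 0.030, 3)     # three times as much
```

With these numbers the sender is busy well under a tenth of a percent of the time, which is the sense in which the protocol, not the link, limits throughput.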
39
rdt3.0: stop-and-wait operation
[Figure: sender and receiver timelines]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender and receiver timelines with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only the unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
State: Wait
Initial transition: Λ / base=1, nextseqnum=1
Event: rdt_send(data)
Action:
  if (nextseqnum < base+N) {
    udt_send(packet[nextseqnum])
    nextseqnum++
  }
  else
    refuse_data(data)   /* Block? */
Event: timeout
Action: udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
Event: udt_rcv(reply) && notcorrupt(reply)
Action: base = getacknum(reply)+1
Event: udt_rcv(reply) && corrupt(reply)
Action: Λ
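The sender FSM maps onto a small Python class. This is an illustrative sketch: the class name `GbnSender`, the `sent` log standing in for udt_send, and the explicit `corrupt` flag are assumptions. It also ignores ACKs below `base`, a guard the slide's FSM does not show, so that a stale ACK cannot move the window backwards.

```python
class GbnSender:
    """Go-Back-N sender with a window of N packets; seq #s start at 1."""

    def __init__(self, window):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}    # seq # -> data, kept until acknowledged
        self.sent = []       # log of seq #s passed to udt_send

    def rdt_send(self, data):
        """Accept data if the window has room; return False to refuse it."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False

    def timeout(self):
        """Resend every packet from base up to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)

    def udt_rcv(self, acknum, corrupt=False):
        """Cumulative ACK: slide base past everything up to acknum."""
        if not corrupt and acknum >= self.base:   # guard is an assumption
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1
```

With N = 3, a fourth packet is refused until ACK 1 arrives; a later timeout resends packets 2, 3 and 4 together.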
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
          start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

  event: ACK received, with ACK field value of y
      if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
              start timer
      }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
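The three events of the simplified sender can be sketched as a small Python class. This is an illustrative model only: the class name, the `sent` list (standing in for "pass segment to IP") and the boolean timer flag are assumptions, and no real network or clock is involved.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified TCP sender (no flow or congestion control)."""

    def __init__(self, initial_seq):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq -> data for not-yet-acknowledged segments
        self.timer_running = False
        self.sent = []               # log of (seq, data) handed to the network layer

    def on_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True            # start timer
        self.sent.append((seq, data))            # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)              # smallest unacknowledged sequence number
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True            # restart timer

    def on_ack(self, y):
        if y > self.send_base:                   # ACK covers new data
            self.send_base = y
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]            # cumulative ACK clears everything below y
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data(b"12345678")      # Seq=92, 8 bytes
sender.on_data(b"x" * 20)        # Seq=100, 20 bytes
sender.on_ack(100)               # first segment acknowledged
sender.on_timeout()              # resends Seq=100 only
print(sender.send_base, sender.sent[-1][0])
```

The byte counts mirror the Seq=92 / Seq=100 scenarios used on the retransmission slides.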
66
TCP retransmission scenarios
[Figure: two Host A / Host B timelines.
 Lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); A's timer expires and A resends Seq=92, 8 bytes data; B replies ACK=100 again; SendBase = 100.
 Premature timeout: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the Seq=92 timer expires before the ACKs arrive, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase = 100, then 120.]
67
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N):
ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline in which the 2nd segment is lost (X) and Host A resends the 2nd segment before its timeout expires.]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {                          /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y      /* fast retransmit */
        }
    }
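The duplicate-ACK branch above can be exercised with a short Python sketch. The function name `process_acks` and its return value are assumptions chosen for illustration; it only records which sequence numbers would be fast-retransmitted and does not model timers.

```python
def process_acks(acks, send_base):
    """Apply the fast-retransmit rule to a list of ACK field values.

    Returns (final SendBase, list of sequence numbers resent early).
    """
    dup_count = {}      # ACK value -> number of duplicate ACKs seen for it
    resent = []
    for y in acks:
        if y > send_base:
            send_base = y                       # new data acknowledged
        else:
            dup_count[y] = dup_count.get(y, 0) + 1
            if dup_count[y] == 3:               # triple duplicate ACK
                resent.append(y)                # fast retransmit of segment y
    return send_base, resent


# One ACK for byte 100 followed by three duplicates triggers an early resend of segment 100.
print(process_acks([100, 100, 100, 100, 120], send_base=92))
```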
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
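A worked instance of the window arithmetic, as a small Python sketch. The function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are assumptions for illustration; the slide only names RcvWindow, RcvBuffer, LastByteRcvd and LastByteRead.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """Sender-side check: unACKed data plus the new bytes must fit in the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


# A 4096-byte buffer holding 2000 unread bytes leaves 2096 bytes of spare room.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(last_byte_sent=5000, last_byte_acked=4000, nbytes=1000, window=window))
```

With 1000 bytes already in flight, the sender in this example may add at most 1096 more before it has to wait for the application to drain the buffer.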
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client: close, FIN; server: ACK, close, FIN; client: ACK, timed wait, closed.]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client: closing, FIN; server: ACK, closing, FIN; client: ACK, timed wait, closed; server: closed.]
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send $\lambda_{in}$ (original data) into unlimited shared output link buffers; $\lambda_{out}$ is the delivered throughput]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered throughput]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda'_{in}$, axes running to R/2, with levels R/3 and R/4 marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered throughput]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: $\lambda_{out}$ for traffic between Host A and Host B]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, $W \leftarrow W + 1/W$
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size (8, 16, 24 Kbytes) against time. Sawtooth behavior: probing for bandwidth.]
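The sawtooth can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the window is counted in MSS units, loss is supplied as a set of round numbers rather than inferred from the network, and slow start is not modelled. The names `aimd` and `per_ack_increase` are made up for the example.

```python
def aimd(rounds, loss_rounds, mss=1, start=1):
    """Return the congestion window (in MSS) at the start of each RTT round."""
    window = start
    trace = []
    for r in range(rounds):
        trace.append(window)
        if r in loss_rounds:
            window = max(mss, window / 2)   # multiplicative decrease: W <- W/2
        else:
            window += mss                   # additive increase: one MSS per RTT
    return trace


def per_ack_increase(window, acks):
    """Per-ACK form of additive increase: W <- W + 1/W for each received ACK."""
    for _ in range(acks):
        window += 1 / window
    return window


print(aimd(rounds=8, loss_rounds={3}))
print(per_ack_increase(4.0, acks=4))   # roughly one MSS gained over one window of ACKs
```

The per-ACK rule gains slightly less than one full MSS per RTT because the window grows while the ACKs of that round are still arriving.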
89
SLOW START IS NOT SHOWN
11
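Connection-oriented demux (cont)
[Figure: client IP A (process P1) and client IP B (processes P1, P2, P4) talk to server IP C (processes P5, P6, P3). Segments shown: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C.]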
12
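Connection-oriented demux: Threaded Web Server
[Figure: client IP A (process P1) and client IP B (processes P1, P2) talk to server IP C (processes P4, P3). Segments shown: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C.]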
13
UDP: User Datagram Protocol [RFC 768]
• "no frills", "bare bones" Internet transport protocol
• "best effort" service; UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery!
UDP segment format (32 bits wide):
• source port #, dest port #
• length, checksum (length: in bytes, of UDP segment, including header)
• application data (message)
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later...
16
Internet Checksum
(time travel warning: we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
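The wraparound rule can be checked against the example with a short Python sketch. The function name `internet_checksum` and the choice to pass the segment as a list of 16-bit integers are assumptions for illustration; a real implementation would work on the bytes of the segment and its pseudo-header.

```python
def internet_checksum(words):
    """1's complement of the 1's complement sum of a sequence of 16-bit words."""
    total = 0
    for word in words:
        total += word
        # Fold any carry out of bit 15 back into the low 16 bits (the "wraparound").
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# The two words from the example: 1110011001100110 and 1101010101010101.
a = 0b1110011001100110
b = 0b1101010101010101
checksum = internet_checksum([a, b])
print(format(checksum, "016b"))

# Receiver check: summing the data words together with the checksum gives all ones,
# so running the same function over everything yields zero when no error is detected.
print(internet_checksum([a, b, checksum]))
```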
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
[Figure: send side and receive side of rdt on top of an unreliable channel]
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 and state 2 joined by a transition labelled "event causing state transition / actions taken on state transition"]
state: when in this "state", next state uniquely determined by next event
22
K&R state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the K&R book (from which the RDT example is taken).
In K&R, "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: two states ("State name") joined by a transition labelled "Relevant event causing state transition / Relevant action taken on state transition"]
state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
sender (single state IDLE): event rdt_send(data) / action udt_send(packet)
receiver (single state IDLE): event udt_rcv(packet) / action rdt_rcv(data)
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
sender (states: IDLE, Waiting for reply)
• IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ (no action) -> IDLE
receiver (state: IDLE)
• udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the FSM specification slide]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, repeated from the FSM specification slide]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: sender sends one packet, then waits for receiver response
29
rdt2.1: sender, handles garbled ACK/NAKs
sender (states: IDLE and Waiting for reply, once for sequence 0 and once for sequence 1)
• IDLE (0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (1)
• IDLE (1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (0)
30
rdt2.1: receiver, handles garbled ACK/NAKs
receiver (states: Wait for 0 from below, Wait for 1 from below)
• Wait for 0: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 1
• Wait for 0: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 0
• Wait for 1: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #s (0, 1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (states: Wait for call 0 from above, Wait for ACK 0)
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)
• Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment (state: Wait for 0 from below)
• entering Wait for 0: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1)
• Wait for 0: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq #s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
udt_rcv(reply) && ( corrupt(reply) || isACK(reply, 1) )
35
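rdt3.0 sender
sender (states: IDLE state 0, Wait for ACK0, IDLE state 1, Wait for ACK1)
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply, 1) ) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(reply) && ( corrupt(reply) || isACK(reply, 0) ) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 1) / Λ -> IDLE state 0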
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U sender: utilization, the fraction of time sender is busy sending
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state: Wait
• initially: base=1, nextseqnum=1
• rdt_send(data):
    if (nextseqnum < base+N) {
        udt_send(packet[nextseqnum])
        nextseqnum++
    } else
        refuse_data(data)      (block)
• timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    ...
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply):
    Λ (no action)
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - re-ACK packet with highest in-order seq #
Single state: Wait
• initially: expectedseqnum=1
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
    rdt_rcv(data), udt_send(ACK), expectedseqnum++
• otherwise: udt_send(reply)
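The Go-Back-N sender's window bookkeeping can be sketched in Python as follows. This is a toy model under stated assumptions: the class name and the `sent` log are invented for the example, corruption checking and the timer itself are left out, and an ACK older than `base` is ignored rather than moving the window backwards (a guard the FSM on the slide does not spell out).

```python
class GbnSender:
    """Toy Go-Back-N sender: tracks base, nextseqnum and what gets (re)transmitted."""

    def __init__(self, window_size):
        self.n = window_size
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}        # seq -> data, kept for possible retransmission
        self.sent = []           # log of transmitted sequence numbers

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.n:
            self.packets[self.nextseqnum] = data
            self.sent.append(self.nextseqnum)       # udt_send(packet[nextseqnum])
            self.nextseqnum += 1
            return True
        return False                                # refuse_data: window is full

    def on_ack(self, acknum):
        if acknum >= self.base:                     # cumulative ACK for everything <= acknum
            self.base = acknum + 1

    def on_timeout(self):
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)                   # resend every unacked packet


sender = GbnSender(window_size=3)
print([sender.rdt_send(f"pkt{i}") for i in range(4)])   # fourth send is refused
sender.on_ack(1)                                        # window slides by one
sender.rdt_send("pkt3")
sender.on_timeout()                                     # packets 2, 3, 4 go out again
print(sender.sent)
```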
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
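The dilemma can be made concrete with a small Python check. The sketch assumes the worst case from the example: the receiver has accepted a full window and moved on, but every ACK was lost, so the sender may retransmit the old window. Function names are hypothetical.

```python
def sr_window_is_safe(window_size, seq_space):
    """Selective repeat rule: window size must be at most half the sequence number space."""
    return window_size <= seq_space // 2


def ambiguous_seq_numbers(window_size, seq_space):
    """Sequence numbers that could be either an old retransmission or new data."""
    old_window = {s % seq_space for s in range(0, window_size)}                 # sender may resend these
    new_window = {s % seq_space for s in range(window_size, 2 * window_size)}   # receiver now expects these
    return sorted(old_window & new_window)


# Slide example: seq #s 0..3 (space of 4) with window size 3.
print(sr_window_is_safe(3, 4), ambiguous_seq_numbers(3, 4))
# Halving the window removes the overlap.
print(sr_window_is_safe(2, 4), ambiguous_seq_numbers(2, 4))
```

Any sequence number in the overlap would be accepted by the receiver as new data even though it is a duplicate, which is exactly scenario (a).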
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, ...)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F
  - URG: urgent data (generally not used)
  - ACK: ACK # valid
  - PSH: push data now (generally not used)
  - RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pointer
• Options (variable length)
• application data (variable length)
56
TCP seq #s and ACKs
Seq #s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say; up to implementor
  This has led to a world of hurt...
simple telnet scenario (Host A, Host B):
• User types 'C': A sends Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C': A sends Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
  - Filling buffers
• Critical buffers...
A dumb example:
• Send a legitimate request: GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client sends FIN (close); server replies ACK, then sends FIN (close); client replies ACK and enters timed wait before closed.]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and enters timed wait; both sides end closed.]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends $\lambda_{in}$ (original data) through a router with unlimited shared output link buffers; Host B receives $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against offered load, axes marked R/2; levels R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) over routers with finite shared output link buffers; Host B receives $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any upstream transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected, i.e. for each received ACK W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, axis marked 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
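The saw tooth can be reproduced with a toy per-RTT simulation. This is a sketch under stated assumptions: the window is measured in MSS, losses are supplied as a set of round numbers rather than detected, and slow start is left out, as on the slide.

```python
def aimd(initial_window, loss_rounds, rounds):
    """Return the congestion window (in MSS) at the start of each RTT."""
    w = float(initial_window)
    trace = []
    for r in range(rounds):
        trace.append(w)
        if r in loss_rounds:
            w = max(1.0, w / 2)   # multiplicative decrease after a loss
        else:
            w += 1.0              # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd(8, {3}, 6)` climbs from 8 to 11 MSS, halves to 5.5 after the loss in round 3, then resumes the linear climb.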
89
SLOW START IS NOT SHOWN
12
Connection-oriented demux: Threaded Web Server
[Figure: clients at IP A and IP B (processes P1, P2) connect to a threaded server at IP C (P3, P4). Segment labels: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C.]
13
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
  - lost
  - delivered out of order to app
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other UDP uses
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery!
UDP segment format (32 bits wide):
  source port # | dest port #
  length        | checksum
  Application data (message)
length: length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO - error detected
  - YES - no error detected. But maybe errors nonetheless? More later …
16
Internet Checksum (time travel warning - we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
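The wraparound addition can be written out in Python. This is a minimal sketch working on a list of 16-bit words; it does not build the UDP pseudo-header or pad odd-length data, and the function names are illustrative. The worked example corresponds to the words 0xE666 and 0xD555.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total


def internet_checksum(words):
    """The checksum is the 1's complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def verify(words, checksum):
    """Receiver check: data plus checksum must sum to all ones."""
    return ones_complement_sum(list(words) + [checksum]) == 0xFFFF
```

For the two example words the sum is 0xBBBC and the checksum 0x4443; flipping a single bit in either word makes `verify` fail.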
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, arrow labelled "event / actions": event causing state transition, actions taken on state transition]
state: when in this "state", next state uniquely determined by next event
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken);
in KR "actions taken" appear wide-ranging, my interpretation is more specific/relevant.
[Figure: State name -> State name, arrow labelled "event / actions": relevant event causing state transition, relevant action taken on state transition]
state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
FSMs (state: event / action -> next state)
sender:
  IDLE: rdt_send(data) / udt_send(packet) -> IDLE
receiver:
  IDLE: udt_rcv(packet) / rdt_rcv(data) -> IDLE
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
(state: event / action -> next state; Λ: no action)
sender:
  IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
  Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet) -> Waiting for reply
  Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver:
  IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> IDLE
  IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK) -> IDLE
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: rdt2.0 sender and receiver FSMs, as in the FSM specification]
27
rdt2.0: error scenario
[Figure: rdt2.0 sender and receiver FSMs, as in the FSM specification]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
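rdt2.1: sender, handles garbled ACK/NAKs
(state: event / action -> next state; Λ: no action)
  IDLE (seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (seq 0)
  Waiting for reply (seq 0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (seq 0)
  Waiting for reply (seq 0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 1)
  IDLE (seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (seq 1)
  Waiting for reply (seq 1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (seq 1)
  Waiting for reply (seq 1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 0)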
30
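rdt2.1: receiver, handles garbled ACK/NAKs
(state: event / action -> next state)
  Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 1 from below
  Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 0 from below
  Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK) -> Wait for 0 from below
  Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK) -> Wait for 0 from below
  Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 1 from below
  Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK) -> Wait for 1 from below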
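The rdt2.1 receiver reduces to one bit of state. The following Python sketch is illustrative only: corruption is passed in as a flag instead of being detected by a checksum, and the class name and `delivered` list are hypothetical.

```python
class Rdt21Receiver:
    """Alternating-bit receiver: delivers each packet exactly once."""

    def __init__(self):
        self.expected = 0        # "Wait for 0 from below" / "Wait for 1 from below"
        self.delivered = []      # data handed to the upper layer

    def on_packet(self, seq, data, corrupt=False):
        if corrupt:
            return "NAK"                     # ask for a retransmission
        if seq == self.expected:
            self.delivered.append(data)      # new packet: deliver it
            self.expected ^= 1               # flip the expected sequence bit
        # A duplicate is re-ACKed but not delivered again.
        return "ACK"
```

Re-ACKing the duplicate matters: the sender retransmitted because it never saw a clean ACK, so the receiver has to answer again without delivering the data twice.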
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
(state: event / action -> next state; Λ: no action)
sender FSM fragment:
  Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
  Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet) -> Wait for ACK 0
  Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ -> (next state, not shown)
receiver FSM fragment:
  Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1) -> Wait for 0 from below
  (previous state, not shown): udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1) -> Wait for 0 from below
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
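rdt3.0 sender
(state: event / action -> next state; Λ: no action)
  IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
  IDLE state 0: udt_rcv(reply) / Λ -> IDLE state 0
  Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ -> Wait for ACK0
  Wait for ACK0: timeout / udt_send(packet) -> Wait for ACK0
  Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
  IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
  IDLE state 1: udt_rcv(reply) / Λ -> IDLE state 1
  Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ -> Wait for ACK1
  Wait for ACK1: timeout / udt_send(packet) -> Wait for ACK1
  Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0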
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U sender: utilization - fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
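The utilization figure can be worked out numerically. The sketch below assumes the usual stop-and-wait timing, in which the sender is busy for L/R out of every RTT + L/R, and takes RTT as twice the 15 ms propagation delay; the `window` parameter is an illustrative addition for comparing against a pipelined sender.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy, with `window` packets per RTT."""
    d_trans = packet_bits / rate_bps            # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


u = sender_utilization(8000, 1e9, 0.030)        # stop-and-wait
u3 = sender_utilization(8000, 1e9, 0.030, 3)    # three packets in flight
```

Here `u` is roughly 0.00027, so the sender is idle more than 99.9% of the time, and `u3` is three times larger.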
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
(single state "Wait"; event / action; Λ: no action)
  initially: base=1, nextseqnum=1
  rdt_send(data) /
      if (nextseqnum < base+N) {
          udt_send(packet[nextseqnum])
          nextseqnum++
      }
      else
          refuse_data(data)   (Block)
  timeout /
      udt_send(packet[base])
      udt_send(packet[base+1])
      …
      udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply) /
      base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply) / Λ
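The Go-Back-N sender can be sketched as a small Python class. Two things here are assumptions rather than part of the slide: the `sent` log stands in for `udt_send`, and ACKs older than `base` are ignored so that a stale cumulative ACK cannot move the window backwards.

```python
class GbnSender:
    """Go-Back-N sender with window size N (timers left to the caller)."""

    def __init__(self, window_size):
        self.n = window_size
        self.base = 1                 # oldest unACKed sequence number
        self.nextseqnum = 1           # next sequence number to use
        self.packets = {}             # seq -> data, kept until ACKed
        self.sent = []                # log of sequence numbers put on the wire

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.n:
            return False              # window full: refuse data
        self.packets[self.nextseqnum] = data
        self.sent.append(self.nextseqnum)
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum >= self.base:       # cumulative ACK for everything <= acknum
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1

    def on_timeout(self):
        # Go back N: resend every packet from base up to nextseqnum-1.
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)
```

With N = 3 the fourth `rdt_send` is refused until an ACK slides the window; a timeout after ACK 2 resends only packet 3.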
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
(single state "Wait"; event / action)
  initially: expectedseqnum=1
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
  any other event / udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
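The dilemma can be checked mechanically. The sketch below models the worst case as an assumption: the receiver has accepted a full window and advanced, while every ACK was lost, so the sender still retransmits from its old window. If the two windows share a sequence number, an old packet is indistinguishable from a new one. Function names are illustrative.

```python
def window(base, size, seq_space):
    """Set of sequence numbers covered by a window starting at base."""
    return {(base + i) % seq_space for i in range(size)}


def ambiguous(seq_space, window_size):
    """True if a retransmitted old packet can fall inside the receiver's new window."""
    old = window(0, window_size, seq_space)             # sender: no ACKs seen
    new = window(window_size, window_size, seq_space)   # receiver: all accepted
    return bool(old & new)
```

With 4 sequence numbers, a window of 3 is ambiguous (the new window {3, 0, 1} overlaps the old {0, 1, 2}), while a window of 2 is safe.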
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview [RFCs: 793, 1122, 1323, 2018, 2581, …]
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: TCP segment format, 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
simple telnet scenario (in time order):
  Host A -> Host B: Seq=42, ACK=79, data = 'C'   (user types 'C')
  Host B -> Host A: Seq=79, ACK=43, data = 'C'   (host ACKs receipt of 'C', echoes back 'C')
  Host A -> Host B: Seq=43, ACK=80               (host ACKs receipt of echoed 'C')
This has led to a world of hurt…
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106); series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
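The two moving averages and the timeout rule fit in a short Python class. Several choices below are assumptions not fixed by the slides: the estimator is seeded with the first sample and a deviation of zero, and DevRTT is updated before EstimatedRTT (the ordering used by RFC 6298). Samples from retransmitted segments should not be fed in, per Karn/Partridge.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, in any consistent time unit."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumed seed: first measurement
        self.dev_rtt = 0.0                  # assumed seed: no deviation yet

    def update(self, sample_rtt):
        # Deviation first, against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms estimate, a single 180 ms sample moves EstimatedRTT to 110 ms and DevRTT to 20 ms, giving a 190 ms timeout.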
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
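The three sender events translate almost line for line into Python. In this sketch the `wire` list stands in for passing segments to IP and `timer_running` for a real timer; both, along with the class and method names, are hypothetical.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: no duplicate ACKs, flow or congestion control."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}             # seq -> payload, not yet acknowledged
        self.timer_running = False
        self.wire = []                # (seq, payload) handed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)   # smallest not-yet-acknowledged seq
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before y.
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in done:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100, a timeout resends only Seq=92, and a later ACK=120 clears both segments and stops the timer.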
66
TCP retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); on the Seq=92 timeout Host A resends Seq=92, 8 bytes data; Host B replies ACK=100; SendBase = 100.]
[Figure: premature timeout. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout fires first, so Host A resends Seq=92, 8 bytes data; Host B replies ACK=120; SendBase moves from 100 to 120.]
67
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment, higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends segments to Host B over time; the 2nd segment is lost (X); Host A resends the 2nd segment before its timeout expires.]
71
13
UDP User Datagram Protocol [RFC 768]
bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol
bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order to
appbull connectionless
ndash no handshaking between UDP sender receiver
ndash each UDP segment handled independently of others
Why is there a UDPbull no connection establishment
(which can add delay)bull simple no connection state at
sender receiverbull small segment headerbull no congestion control UDP can
blast away as fast as desired
14
UDP: more
• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery
UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)
length: length, in bytes, of UDP segment, including header
15
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO: error detected
  – YES: no error detected. But maybe errors nonetheless? More later…
16
Internet Checksum (time travel warning – we covered this earlier)
• Note
  – When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                  1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
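The wraparound rule can be checked with a short sketch. The function names below are hypothetical (they are not from the slides), and the code works on a list of 16-bit integers rather than on a real UDP segment, so treat it as an illustration of the arithmetic only.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # the "wraparound" step
    return total


def internet_checksum(words):
    """Checksum is the bitwise complement of the one's-complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


def verify(words, checksum):
    """Receiver check: data plus checksum sums to all ones if nothing flipped."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF


# The two 16-bit integers from the example on the slide.
words = [0b1110011001100110, 0b1101010101010101]
checksum = internet_checksum(words)
print(format(checksum, "016b"))
```

As the slide warns, a passing check does not prove the data is intact: two errors that cancel in the sum go undetected.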
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
FSM notation:
• state 1 -> state 2, labelled "event causing state transition / actions taken on state transition"
• state: when in this "state", next state uniquely determined by next event
22
KR state machines – a note
Beware:
• Kurose and Ross has a confusing/confused attitude to state-machines
• I've attempted to normalise the representation
• UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken)
• in KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant
FSM notation used here:
• State name -> State name, labelled "Relevant event causing state transition / Relevant action taken on state transition"
• state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
FSMs (each transition written as Event / Action):
• sender, IDLE: rdt_send(data) / udt_send(packet)
• receiver, IDLE: udt_rcv(packet) / rdt_rcv(data)
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
sender (transitions written as event / action; Λ means no action):
• IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
• Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver:
• IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs from the specification slide, repeated for the no-error case.]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs from the specification slide, repeated for the error case.]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
rdt2.1: sender, handles garbled ACK/NAKs
Transitions (event / action; Λ means no action; the 0/1 after a state name is the sequence number in use):
• IDLE (0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (1)
• IDLE (1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (0)
30
rdt2.1: receiver, handles garbled ACK/NAKs
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment:
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)
• Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment:
• transition entering Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK1), rdt_rcv(data)
• Wait for 0 from below: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
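35
rdt3.0 sender
Transitions (event / action; Λ means no action):
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0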
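To see how the alternating sequence bit and the timeout work together, here is a small simulation sketch. It is not the slide's FSM: the function name, the `drop_data` and `drop_ack` parameters, and the way a timeout is modelled (simply looping to retransmit) are all assumptions made for illustration, and corruption is left out.

```python
def run_stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate a stop-and-wait sender and receiver over a lossy channel.

    drop_data / drop_ack hold the 0-based indices of the data / ACK
    transmissions that the (hypothetical) channel loses.
    Returns (messages delivered to the upper layer, data transmissions used).
    """
    delivered = []
    expected = 0            # receiver: sequence bit it is waiting for
    data_tx = ack_tx = 0    # transmission counters, used to pick losses
    for i, msg in enumerate(messages):
        seq = i % 2         # sender alternates sequence 0, 1, 0, 1, ...
        acked = False
        while not acked:
            # Sender: (re)transmit the current packet.
            data_lost = data_tx in drop_data
            data_tx += 1
            if data_lost:
                continue    # nothing reaches the receiver; sender times out
            # Receiver: deliver only a packet carrying the expected bit.
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            # Receiver ACKs the packet it just saw (new or duplicate).
            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            acked = not ack_lost   # a lost ACK also ends in a timeout
    return delivered, data_tx


print(run_stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={0}))
```

In the example run the first ACK and the first retransmission are lost, so the first message is sent three times but delivered once; the duplicate is recognised by its sequence bit.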
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
  – U sender: utilization – fraction of time sender busy sending
  – 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
  – network protocol limits use of physical resources!
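The numbers on this slide can be reproduced with a few lines of arithmetic. This sketch assumes the round-trip time is twice the 15 ms propagation delay (30 ms) and that utilization is the transmission time divided by the time until the next packet can be sent, RTT + L/R; the variable and function names are illustrative.

```python
L = 8000           # packet length in bits
R = 1e9            # link rate in bits per second
RTT = 2 * 15e-3    # assumed round-trip time: 15 ms propagation each way

d_trans = L / R    # time to push one packet onto the link


def utilization(packets_in_flight=1):
    """Fraction of time the sender is busy sending."""
    return min(1.0, packets_in_flight * d_trans / (RTT + d_trans))


throughput = (L / 8) / (RTT + d_trans)   # bytes per second, stop-and-wait

print(f"d_trans    = {d_trans * 1e6:.0f} microseconds")
print(f"U_sender   = {utilization():.6f}")
print(f"throughput = {throughput / 1e3:.1f} kB/s")
```

Calling `utilization(3)` gives three times the stop-and-wait figure, which is the gain that pipelining with three packets in flight would bring under the same assumptions.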
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline; time runs downwards]
• t = 0: first packet bit transmitted
• t = L / R: last packet bit transmitted
• first packet bit arrives at receiver
• last packet bit arrives, send ACK
• t = RTT + L / R: ACK arrives, send next packet
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight; time runs downwards]
• t = 0: first packet bit transmitted
• t = L / R: last bit transmitted
• first packet bit arrives at receiver
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• t = RTT + L / R: ACK arrives, send next packet
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Initially: base=1, nextseqnum=1
Single state: Wait. Transitions (event / action; Λ means no action):
• rdt_send(data) /
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   (Block?)
• timeout /
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply) /
    base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply) / Λ
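The extended FSM maps fairly directly onto a small class. The sketch below is one possible reading, not the slide's definition: the class name, the `udt_send` callback, and the guard that ignores ACKs older than `base` are assumptions added here (the FSM on the slide sets `base` unconditionally), and timers are left to the caller.

```python
class GBNSender:
    """Go-Back-N sender bookkeeping: window of N, cumulative ACKs."""

    def __init__(self, window, udt_send):
        self.N = window
        self.base = 1              # oldest unacknowledged sequence number
        self.nextseqnum = 1        # next sequence number to use
        self.packets = {}          # seq -> data, kept until acknowledged
        self.udt_send = udt_send   # hypothetical callback: udt_send(seq, data)

    def rdt_send(self, data):
        """Send data if the window has room; return False to refuse it."""
        if self.nextseqnum >= self.base + self.N:
            return False
        self.packets[self.nextseqnum] = data
        self.udt_send(self.nextseqnum, data)
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        """Cumulative ACK: everything up to and including acknum is done."""
        if acknum < self.base:
            return                 # assumption: ignore stale or duplicate ACKs
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        self.base = acknum + 1

    def on_timeout(self):
        """Go back N: resend every packet from base to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(seq, self.packets[seq])


sent = []
sender = GBNSender(window=3, udt_send=lambda seq, data: sent.append(seq))
for item in "abcd":
    sender.rdt_send(item)          # the fourth call is refused: window full
sender.on_ack(2)
sender.on_timeout()                # resends only packet 3
print(sent)
```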
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
Initially: expectedseqnum=1
Single state: Wait. Transitions (event / action):
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• Λ (any other event) / udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
• data from above: if next available seq # in window, send pkt
• timeout(n): resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N-1]:
  – mark pkt n as received
  – if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  – send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
• otherwise: ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
A: window size ≤ (½ of seq # size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview  RFCs: 793, 1122, 1323, 2018, 2581, …
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
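As an illustration of the layout above, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. The function name and the dictionary keys are made up for this sketch; options and payload are not parsed, and the flag bit positions assume the standard order U, A, P, R, S, F from most to least significant.

```python
import struct

FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header at the start of `segment`."""
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "source_port": src,
        "dest_port": dst,
        "seq": seq,                              # byte-stream number
        "ack": ack,                              # next byte expected
        "header_len": (offset_flags >> 12) * 4,  # head len is in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "receive_window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# A hand-built SYN+ACK header, used only to exercise the parser.
example = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(example))
```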
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say - up to implementor
  – This has led to a world of hurt…
Simple telnet scenario (Host A, Host B; time runs downwards):
  1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers.
• Critical buffers…
• Send a legitimate request: GET index.html
  – this gets through an intrusion-detection system
  – then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: RTT (milliseconds, axis from 100 to 350) against time (seconds, 1 to 106), plotting SampleRTT and Estimated RTT.]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
• Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
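The two moving averages and the timeout formula fit in a few lines. This is a sketch with some choices the slides do not specify: the class name is invented, the first sample initialises EstimatedRTT with DevRTT set to half of it (the initialisation used by RFC 6298), and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate.

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never one taken from a retransmission)."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)   # milliseconds
estimator.update(200.0)
print(estimator.estimated_rtt, estimator.dev_rtt, estimator.timeout_interval)
```

A single outlier of 200 ms moves the estimate only an eighth of the way towards it, but it widens the safety margin noticeably, which is the intended behaviour when RTT varies a lot.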
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running)
              start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
          retransmit not-yet-acknowledged segment with smallest sequence number
          start timer

      event: ACK received, with ACK field value of y
          if (y > SendBase) {
              SendBase = y
              if (there are currently not-yet-acknowledged segments)
                  start timer
          }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
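The event loop above can be turned into a small event-driven class. This is only a sketch of the simplified sender: the class and method names are invented, the timer is reduced to a boolean flag that a real implementation would replace with an actual clock, and `send_segment` stands in for "pass segment to IP".

```python
class SimpleTcpSender:
    """Simplified TCP sender: cumulative ACKs, single retransmission timer."""

    def __init__(self, initial_seq, send_segment):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = []                 # (seq, data) pairs, oldest first
        self.timer_running = False
        self.send_segment = send_segment  # hypothetical hook: (seq, data)

    def on_app_data(self, data):
        """Event: data received from the application above."""
        seq = self.next_seq_num
        self.unacked.append((seq, data))
        if not self.timer_running:
            self.timer_running = True     # start timer
        self.send_segment(seq, data)
        self.next_seq_num += len(data)

    def on_timeout(self):
        """Event: timer timeout. Resend the oldest unacknowledged segment."""
        if self.unacked:
            seq, data = self.unacked[0]
            self.send_segment(seq, data)
            self.timer_running = True     # restart timer

    def on_ack(self, y):
        """Event: ACK received with ACK field value y (next byte expected)."""
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose bytes all lie below y.
            self.unacked = [(s, d) for s, d in self.unacked if s + len(d) > y]
            self.timer_running = bool(self.unacked)


wire = []
sender = SimpleTcpSender(92, lambda seq, data: wire.append((seq, len(data))))
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
sender.on_ack(120)              # one cumulative ACK covers both segments
print(wire, sender.send_base)
```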
66
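TCP retransmission scenarios
lost ACK scenario (Host A, Host B; time runs downwards):
  1. A -> B: Seq=92, 8 bytes data
  2. B -> A: ACK=100, lost (X)
  3. Seq=92 timeout at A
  4. A -> B: Seq=92, 8 bytes data (retransmission)
  5. B -> A: ACK=100
  6. SendBase = 100
premature timeout (Host A, Host B; time runs downwards):
  1. A -> B: Seq=92, 8 bytes data
  2. A -> B: Seq=100, 20 bytes data
  3. B -> A: ACK=100
  4. B -> A: ACK=120
  5. Seq=92 timeout at A before the ACKs arrive
  6. A -> B: Seq=92, 8 bytes data (retransmission)
  7. SendBase = 100, then SendBase = 120 as the ACKs arrive
  8. B -> A: ACK=120
  9. SendBase = 120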
67
TCP retransmission scenarios (more)
Implicit ACK (e.g. not Go-Back-N); Host A, Host B; time runs downwards:
  1. A -> B: Seq=92, 8 bytes data
  2. A -> B: Seq=100, 20 bytes data
  3. B -> A: ACK=100, lost (X)
  4. B -> A: ACK=120, arrives before the timeout
  5. SendBase = 120
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: the 2nd segment is lost (X) and Host A resends it before its timeout expires.]
71
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {
            increment count of dup ACKs received for y    /* a duplicate ACK for already ACKed segment */
            if (count of dup ACKs received for y = 3) {
                resend segment with sequence number y     /* fast retransmit */
            }
        }
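The duplicate-ACK counting can be sketched on its own, separately from timers. The factory function and its `resend` callback are hypothetical names; the sketch resets the duplicate count whenever new data is acknowledged, which the pseudocode implies by counting duplicates "for y" but does not spell out.

```python
def make_ack_handler(send_base, resend):
    """Return an on_ack(y) function that performs fast retransmit.

    `resend` is a hypothetical callback taking the sequence number to resend.
    """
    state = {"send_base": send_base, "dup_acks": 0}

    def on_ack(y):
        if y > state["send_base"]:
            state["send_base"] = y      # new data acknowledged
            state["dup_acks"] = 0
        else:
            state["dup_acks"] += 1      # duplicate ACK for already ACKed data
            if state["dup_acks"] == 3:
                resend(y)               # fast retransmit, before any timeout
        return state["send_base"]

    return on_ack


resent = []
on_ack = make_ack_handler(100, resent.append)
on_ack(120)                 # advances SendBase to 120
for _ in range(4):
    on_ack(120)             # four duplicates; the third triggers one resend
print(resent)
```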
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist indefinitely
Solution: Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
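A short sketch of the bookkeeping on both sides follows. The receiver formula is the one on the slide; the sender-side names `last_byte_sent` and `last_byte_acked` do not appear on the slide and are assumptions used here to express "unACKed data".

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: spare room in the buffer, advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_allowance(last_byte_sent, last_byte_acked, advertised_window):
    """Sender: how many more bytes may be put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, advertised_window - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                   # spare room at the receiver
print(sender_allowance(5000, 4000, window))     # still allowed to send
print(sender_allowance(7000, 4000, window))     # window already exceeded
```

If the application stops reading, `last_byte_read` stalls, the advertised window shrinks towards zero, and the sender's allowance shrinks with it.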
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close, FIN to server; server ACK; server close, FIN to client; client ACK, then timed wait; closed.]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline: client closing, FIN to server; server ACK; server closing, FIN to client; client ACK, then timed wait; both sides closed.]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends λ_in (original data) and Host B receives λ_out through a router with unlimited shared output link buffers.]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share a router with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: delivered data.]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
[Figure: three plots (a), (b), (c) of λ_out against λ_in, both axes running to R/2; the levels R/3 and R/4 are also marked.]
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: delivered data.]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; λ_out]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  – additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, W ← W + 1/W
  – multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with axis marks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth. SLOW START IS NOT SHOWN.]
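The saw tooth can be reproduced with a toy model that works in whole round trips. Everything here is an assumption for illustration: the function name, the idea of passing in the set of RTTs at which a loss is detected, and measuring the window in units of MSS. As in the figure, slow start is not modelled.

```python
def aimd_trace(rounds, loss_rounds, start=1.0, mss=1.0):
    """Return the congestion window at the start of each RTT.

    loss_rounds is the set of RTT indices in which a loss is detected.
    """
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(rounds=6, loss_rounds={3}, start=8.0))
```

The window climbs by one MSS per round trip, halves when the loss is detected, and then starts climbing again, which gives the saw tooth shape described on the slide.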
14
UDP more
bull often used for streaming multimedia appsndash loss tolerantndash rate sensitive
bull other UDP usesndash DNSndash SNMP
bull reliable transfer over UDP add reliability at application layerndash application-specific error
recovery
source port dest port
32 bits
Applicationdata
(message)
UDP segment format
length checksumLength in
bytes of UDPsegmentincluding
header
15
UDP checksum
Senderbull treat segment contents as
sequence of 16-bit integersbull checksum addition (1rsquos
complement sum) of segment contents
bull sender puts checksum value into UDP checksum field
Receiverbull compute checksum of received
segmentbull check if computed checksum
equals checksum field valuendash NO - error detectedndash YES - no error detected But
maybe errors nonetheless More later hellip
Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment
16
Internet Checksum(time travel warning ndash we covered this earlier)
bull Notendash When adding numbers a carryout from the
most significant bit needs to be added to the result
bull Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
wraparound
sumchecksum
17
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
18
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
19
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
rdt_rcv()
udt_rcv()
20
Reliable data transfer getting started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over
unreliable channel to receiver
rdt_rcv() called by rdt to deliver data to upper
rdt_rcv()
udt_rcv()
udt_rcv() called when packet arrives on rcv-side of channel
21
Reliable data transfer getting started
Wersquollbull incrementally develop sender receiver sides of
reliable data transfer protocol (rdt)bull consider only unidirectional data transfer
ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender
receiver
state1
state2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely
determined by next event
eventactions
22
KR state machines ndash a note
BewareKurose and Ross has a confusingconfused attitude to
state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the
KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my
interpretation is more specificrelevant
Statename
Statename
Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo
next state uniquely determined by next
event eventactions
23
Rdt10 reliable transfer over a reliable channel
bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
IDLE udt_send(packet)rdt_send(data)
rdt_rcv(data)IDLEudt_rcv(packet)
sender receiver
Event
Action
24
Rdt20 channel with bit errors
bull underlying channel may flip bits in packetndash checksum to detect bit errors
bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that
packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender
that packet had errorsndash sender retransmits packet on receipt of NAK
bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
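rdt2.1: sender, handles garbled ACK/NAKs
(transitions written as: state: event / action -> next state; Λ means no action)
• IDLE (seq 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (seq 0)
• Waiting for reply (seq 0): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (seq 0)
• Waiting for reply (seq 0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 1)
• IDLE (seq 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (seq 1)
• Waiting for reply (seq 1): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (seq 1)
• Waiting for reply (seq 1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (seq 0)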
30
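rdt2.1: receiver, handles garbled ACK/NAKs
(transitions written as: state: event / action -> next state)
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 0 from below
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 1 from below
• Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK) -> Wait for 1 from below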
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
(transitions written as: state: event / action -> next state; Λ means no action)
sender FSM fragment
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ -> next state
receiver FSM fragment
• (into Wait for 0 from below): udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK1), rdt_rcv(data) -> Wait for 0 from below
• Wait for 0 from below: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1) -> Wait for 0 from below
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
(transitions written as: state: event / action -> next state; Λ means no action)
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ -> IDLE state 0
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ -> Wait for ACK0
• Wait for ACK0: timeout / udt_send(packet) -> Wait for ACK0
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ -> IDLE state 1
• Wait for ACK1: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ -> Wait for ACK1
• Wait for ACK1: timeout / udt_send(packet) -> Wait for ACK1
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0
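The stop-and-wait behaviour of rdt3.0 can be sketched as a small simulation. This is only an illustrative model, not the protocol as specified on the slides: the function name `run_stop_and_wait` and the `drop_data` / `drop_ack` arguments (indices of transmissions the channel loses) are assumptions made for the example, bit corruption is not modelled, and a lost packet or lost ACK is treated as an immediate timeout.

```python
def run_stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate an rdt3.0-style sender and receiver over a lossy channel.

    drop_data / drop_ack hold the 0-based indices of the data packets and
    ACKs (counted separately) that the hypothetical channel loses.
    Returns (delivered_messages, number_of_data_transmissions).
    """
    drop_data, drop_ack = set(drop_data), set(drop_ack)
    delivered = []
    expected = 0      # receiver: sequence bit it is waiting for
    data_sent = 0     # how many data packets the sender has transmitted
    acks_sent = 0     # how many ACKs the receiver has transmitted

    for index, payload in enumerate(messages):
        seq = index % 2                 # sender alternates the sequence bit
        while True:
            lost = data_sent in drop_data
            data_sent += 1
            if lost:
                continue                # no ACK arrives: timeout, retransmit

            # Receiver: deliver new data, discard duplicates, ACK either way.
            if seq == expected:
                delivered.append(payload)
                expected ^= 1

            ack_lost = acks_sent in drop_ack
            acks_sent += 1
            if ack_lost:
                continue                # timeout at sender, retransmit
            break                       # ACK for current seq received
    return delivered, data_sent


if __name__ == "__main__":
    print(run_stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={0}))
```

The duplicate created by the lost ACK is discarded at the receiver because its sequence bit does not match the expected one, which is the role the slides give to the sequence number.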
36
rdt3.0 in action
37
rdt3.0 in action
38
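Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• $U_{sender}$: utilization, the fraction of time sender busy sending; with RTT = 2 × 15 ms = 30 ms:
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!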
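The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using the slide's numbers; the function name `stop_and_wait_utilization` is made up for the example, and RTT is taken as twice the 15 ms propagation delay.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time a stop-and-wait sender spends transmitting."""
    d_trans = packet_bits / link_bps          # time to push one packet onto the link
    return d_trans / (rtt_s + d_trans)


L = 8000            # packet size in bits
R = 1e9             # link rate in bits per second
RTT = 2 * 15e-3     # round trip time in seconds

utilization = stop_and_wait_utilization(L, R, RTT)
throughput_bytes_per_s = (L / 8) / (RTT + L / R)   # one 1 KB packet per cycle

if __name__ == "__main__":
    print(f"U_sender = {utilization:.5f}")
    print(f"throughput = {throughput_bytes_per_s / 1000:.1f} kB/s")
```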
39
rdt3.0: stop-and-wait operation
[Timing diagram: sender and receiver, one RTT apart]
• t = 0: first packet bit transmitted
• t = L / R: last packet bit transmitted
• first packet bit arrives
• last packet bit arrives, send ACK
• t = RTT + L / R: ACK arrives, send next packet
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram: sender and receiver, one RTT apart]
• t = 0: first packet bit transmitted
• t = L / R: last bit transmitted
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• t = RTT + L / R: ACK arrives, send next packet
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
(single state, Wait; Λ means no action)
• initially: base=1, nextseqnum=1
• rdt_send(data):
    if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
    else refuse_data(data)   (Block?)
• timeout: udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply): base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply): Λ
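The sender FSM above can be sketched in Python. This is a hedged illustration rather than the slide's definition: the class name `GoBackNSender`, the `send` callback (standing in for udt_send) and the explicit `on_timeout` method are assumptions, timers are not modelled, and the check that ignores stale ACKs is an addition that the slide's FSM does not show.

```python
class GoBackNSender:
    """Window bookkeeping for a Go-Back-N sender (timers left to the caller)."""

    def __init__(self, window_size, send):
        self.N = window_size
        self.send = send            # hypothetical stand-in for udt_send
        self.base = 1               # oldest unacknowledged sequence number
        self.nextseqnum = 1         # next sequence number to use
        self.packets = {}           # seq -> data, kept until acknowledged

    def rdt_send(self, data):
        """Send data if the window has room; return False to refuse it."""
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse_data
        self.packets[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        """Cumulative ACK: everything up to and including acknum is done."""
        new_base = acknum + 1
        if new_base <= self.base:
            return                  # stale or duplicate ACK (added guard)
        for seq in range(self.base, new_base):
            self.packets.pop(seq, None)
        self.base = new_base

    def on_timeout(self):
        """Resend every packet from base up to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.packets[seq])
```

A caller would typically start a timer whenever `base` changes and call `on_timeout` when it fires.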
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
FSM (single state, Wait; Λ means no triggering event):
• initially: expectedseqnum=1
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum): rdt_rcv(data), udt_send(ACK), expectedseqnum++
• Λ (default): udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
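The receiver rules above can be sketched as follows. This is an illustrative model only: the class name `SelectiveRepeatReceiver` and the `on_packet` method are invented for the example, and sequence numbers are plain integers that never wrap around, which sidesteps the window-size issue discussed later in the slides.

```python
class SelectiveRepeatReceiver:
    """Buffers out-of-order packets and delivers data in order."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0        # lowest sequence number not yet delivered
        self.buffer = {}        # seq -> data for packets waiting on a gap
        self.delivered = []     # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Process packet n; return the ACK number to send, or None to ignore."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver the in-order run starting at rcvbase, advancing the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n            # already delivered: re-ACK so the sender can advance
        return None             # outside both ranges: ignore
```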
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
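The dilemma can be reproduced with a short check. This is a sketch under the assumption that the receiver has just accepted a full window starting at sequence number 0 while all of its ACKs were lost; the helper names `receiver_window` and `is_ambiguous` are made up for the example.

```python
def receiver_window(rcvbase, window_size, seq_space):
    """Sequence numbers the receiver will accept as new, with wraparound."""
    return [(rcvbase + i) % seq_space for i in range(window_size)]


def is_ambiguous(window_size, seq_space):
    """True if a retransmitted old packet can fall inside the next window.

    Old window: the packets the sender may still retransmit (starting at 0).
    New window: what the receiver accepts after delivering all of them.
    """
    old = set(receiver_window(0, window_size, seq_space))
    new = set(receiver_window(window_size, window_size, seq_space))
    return bool(old & new)


if __name__ == "__main__":
    print(receiver_window(3, 3, 4))   # window after receiving 0, 1, 2
    print(is_ambiguous(3, 4), is_ambiguous(2, 4))
```

With 4 sequence numbers and a window of 3 the new window contains 0 and 1 again, so a retransmitted packet 0 looks like new data; with a window of 2 the two windows are disjoint.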
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: segment layout, 32 bits wide, top to bottom]
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pnter
• Options (variable length)
• application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
  - This has led to a world of hurt…
simple telnet scenario (Host A, Host B, time running downwards):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
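The three formulas can be combined into a small estimator. This is a sketch under stated assumptions, none of which come from the slides: the class name `RttEstimator` is invented, the first sample seeds EstimatedRTT, DevRTT starts at zero, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate (the order used in RFC 6298).

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = 0.0                  # assumption: no deviation known yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not taken from a retransmitted segment)."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (
            (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        )
        return self.timeout_interval

    @property
    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    estimator = RttEstimator(first_sample=100.0)      # milliseconds
    for sample in (180.0, 120.0, 110.0):
        print(round(estimator.update(sample), 2))
```

Following Karn/Partridge, a caller would skip `update` for any ACK that might belong to a retransmission.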
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
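The simplified sender can be written as an event-driven Python class. This is an illustrative sketch, not TCP itself: the class name `SimplifiedTcpSender`, the `send_segment` callback (standing in for "pass segment to IP") and the boolean `timer_running` flag replacing a real timer are all assumptions made for the example.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender (no dup ACKs, no windows)."""

    def __init__(self, initial_seq, send_segment):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.send_segment = send_segment   # hypothetical hook into the IP layer
        self.unacked = {}                  # seq of first byte -> segment data
        self.timer_running = False         # stands in for the single timer

    def on_app_data(self, data):
        """Event: data received from application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True      # start timer
        self.send_segment(seq, data)
        self.next_seq_num += len(data)

    def on_timeout(self):
        """Event: timer timeout; resend the oldest unacknowledged segment."""
        if self.unacked:
            seq = min(self.unacked)
            self.send_segment(seq, self.unacked[seq])
            self.timer_running = True      # restart timer

    def on_ack(self, y):
        """Event: cumulative ACK with field value y (next byte expected)."""
        if y > self.send_base:
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in fully_acked:
                del self.unacked[seq]
            # Keep the timer only while segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Feeding it 8 bytes and then 20 bytes from an initial sequence number of 92 yields segments with Seq=92 and Seq=100, and an ACK of 120 acknowledges both, matching the sequence numbers used in the retransmission scenarios.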
66
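TCP: retransmission scenarios
lost ACK scenario (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data
• B replies ACK=100; the ACK is lost (X)
• Seq=92 timeout: A resends Seq=92, 8 bytes data
• B replies ACK=100; SendBase = 100
premature timeout (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout expires before the ACKs reach A: A resends Seq=92, 8 bytes data
• the ACKs arrive: Sendbase = 100, then SendBase = 120
• B answers the retransmission with ACK=120; SendBase = 120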
67
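TCP retransmission scenarios (more)
cumulative ACK scenario (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too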
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediate send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A, Host B, time running downwards; a segment is lost (X); A resends the 2nd segment before its timeout]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {   /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y   /* fast retransmit */
        }
    }
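The duplicate-ACK counting can be sketched on its own. This is a minimal illustration with assumed names (`FastRetransmitTracker`, the `resend` callback); it keeps a single counter that is reset whenever new data is acknowledged, and it ignores ACKs older than SendBase, which is a simplification chosen here rather than something the slide specifies.

```python
class FastRetransmitTracker:
    """Counts duplicate ACKs and triggers a resend on the third one."""

    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.resend = resend        # hypothetical callback: resend(seq)
        self.dup_acks = 0

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y      # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1      # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resend(y)      # fast retransmit, before the timer expires
        # ACKs below send_base are stale and ignored in this sketch
```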
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will.
This means small MSS values may persist - indefinitely.
Solution: Wait to fill each segment, but don't wait indefinitely.
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
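The window arithmetic can be sketched in two small functions. The receiver-side names follow the slide; the sender-side arguments `last_byte_sent` and `last_byte_acked` are assumptions introduced here to express "unACKed data", and both function names are made up for the example.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(advertised_window, last_byte_sent, last_byte_acked):
    """How much more the sender may transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked      # unACKed data
    return max(0, advertised_window - in_flight)


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(window, sendable_bytes(window, last_byte_sent=5000, last_byte_acked=4000))
```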
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. client: close, FIN ->; server: <- ACK, close, <- FIN; client: ACK ->, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. client: closing, FIN ->; server: <- ACK, closing, <- FIN; client: ACK ->, timed wait, closed; server: closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send $\lambda_{in}$ (original data) into a router with unlimited shared output link buffers; delivered rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send into a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; delivered rate $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots, (a), (b) and (c), of $\lambda_{out}$ against $\lambda_{in}$, both axes running to R/2, with the levels R/3 and R/4 also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send across routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; delivered rate $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Congestion Collapse example: Cocktail party effect
[Figure: Host A, Host B, $\lambda_{out}$]
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, $W \leftarrow W + 1/W$
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, with levels marked at 8 Kbytes, 16 Kbytes and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN!
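The saw tooth can be reproduced with a toy model. This is a sketch under simplifying assumptions that are not from the slides: the window is tracked once per RTT in units of MSS, losses are supplied as a set of RTT indices, the window never drops below 1 MSS, and slow start is left out, as on the slide. The function name `aimd_trace` is invented for the example.

```python
def aimd_trace(rtts, loss_rtts, start_mss=1.0):
    """Congestion window (in MSS) at the start of each RTT under AIMD."""
    loss_rtts = set(loss_rtts)
    window = start_mss
    trace = []
    for rtt in range(rtts):
        trace.append(window)
        if rtt in loss_rtts:
            window = max(1.0, window / 2)   # multiplicative decrease
        else:
            window += 1.0                   # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(8, loss_rtts={4}, start_mss=4.0))
```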
15
UDP checksum
Goal: detect "errors" (e.g. flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO - error detected
  - YES - no error detected. But maybe errors nonetheless? More later …
16
Internet Checksum (time travel warning - we covered this earlier)
• Note
  - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
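The wraparound rule can be sketched in Python using the two 16-bit integers from the example. The function names `ones_complement_sum16` and `internet_checksum` are chosen for illustration; the input is assumed to be already split into 16-bit words.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound of the carry
    return total


def internet_checksum(words):
    """One's complement of the one's complement sum, as a 16-bit value."""
    return ~ones_complement_sum16(words) & 0xFFFF


if __name__ == "__main__":
    a = 0b1110011001100110
    b = 0b1101010101010101
    print(f"sum      {ones_complement_sum16([a, b]):016b}")
    print(f"checksum {internet_checksum([a, b]):016b}")
```

A receiver can verify a segment by summing all words together with the checksum: with no errors the result is sixteen 1 bits.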
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. state 1 -> state 2, transition labelled event / actions: event causing state transition, actions taken on state transition]
state: when in this "state" next state uniquely determined by next event
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: FSM notation. State name -> State name, transition labelled event / actions: Relevant event causing state transition, Relevant action taken on state transition]
state: when in this "state" next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
(transitions written as: state: Event / Action -> next state)
sender
• IDLE: rdt_send(data) / udt_send(packet) -> IDLE
receiver
• IDLE: udt_rcv(packet) / rdt_rcv(data) -> IDLE
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors?
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
 - may generate duplicate ACKs
 - need only remember expectedseqnum
• out-of-order packet:
 - discard (don't buffer) -> no receiver buffering
 - Re-ACK packet with highest in-order seq #
State Wait (initially expectedseqnum=1):
• event: udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum)
  action: rdt_rcv(data); udt_send(ACK); expectedseqnum++
• event: Λ (any other packet)
  action: udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
 - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
 - sender timer for each unACKed pkt
• sender window
 - N consecutive seq #'s
 - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
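The receiver rules above might look like this in Python. It is a sketch under simplifying assumptions: sequence numbers are unbounded integers (no wraparound), and the class and method names are invented for the example.

```python
class SRReceiver:
    """Selective-repeat receiver sketch with unbounded sequence numbers."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}      # out-of-order packets: seq -> data
        self.delivered = []   # data passed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the seq number to ACK, or None to ignore the packet."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # deliver the in-order run starting at rcvbase, advancing the window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n          # already delivered: re-ACK so the sender can advance
        return None
```

Re-ACKing packets from the previous window matters because the earlier ACK may have been lost; without it the sender's window would never move.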
51
Selective repeat in action
52
Selective repeat dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
A: window size ≤ ½ of seq # size
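A small Python check can make the window-size rule concrete. The function below is a hypothetical helper: it compares the sequence numbers the sender may still retransmit (its old window, if every ACK was lost) with the ones the receiver now treats as new.

```python
def windows_overlap(seq_space, window):
    """True if a retransmitted old packet could be mistaken for a new one."""
    # sender may resend these if all of its ACKs were lost
    old = {n % seq_space for n in range(0, window)}
    # receiver has moved on and now expects these
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)
```

For the slide's example, `windows_overlap(4, 3)` is true (seq # 0 is in both windows), while `windows_overlap(4, 2)` is false.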
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
 - bi-directional data flow in same connection
 - MSS: maximum segment size
• connection-oriented:
 - handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
• flow controlled:
 - sender will not overwhelm receiver
• point-to-point:
 - one sender, one receiver
• reliable, in-order byte stream:
 - no "message boundaries"
• pipelined:
 - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, where the application reads data through its socket door]
55
TCP segment structure
[Segment layout, each row 32 bits wide:]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length)
application data (variable length)
Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence number, acknowledgement number: counting by bytes of data (not segments)
• checksum: Internet checksum (as in UDP)
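As an illustration of the layout, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. The function name and the dictionary keys are choices made for this sketch; options are skipped rather than parsed.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags, window, checksum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4   # head len field counts 32-bit words
    flags = off_flags & 0x3F             # low six bits: URG ACK PSH RST SYN FIN
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 upward
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [name for bit, name in enumerate(names) if flags & (1 << bit)],
        "window": window, "checksum": checksum, "urg_ptr": urg,
        "data": segment[header_len:],    # everything after header and options
    }
```

Sequence and acknowledgement numbers come out as 32-bit byte counts, matching the "counting by bytes of data" note on the slide.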
56
TCP seq. #'s and ACKs
Seq. #'s:
 - byte stream "number" of first byte in segment's data
ACKs:
 - seq # of next byte expected from other side
 - cumulative ACK
Q: how receiver handles out-of-order segments
 - A: TCP spec doesn't say - up to implementor
 - This has led to a world of hurt…
Simple telnet scenario (Host A, Host B; time runs downward):
1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
2. Host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
3. Host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
 - Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
 - but RTT varies
• too short: premature timeout
 - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
 - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
 - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
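The moving average is easy to try out in Python. In this sketch the function name is made up, and seeding the estimate with the first sample is an assumption, since the slide does not say how EstimatedRTT starts.

```python
def smooth_rtt(samples, alpha=0.125):
    """Return the EstimatedRTT after each SampleRTT, seeded with the first sample."""
    estimated = samples[0]
    history = [estimated]
    for sample in samples[1:]:
        # each new sample gets weight alpha; older ones decay by (1 - alpha)
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history
```

A single spike from 100 ms to 200 ms moves the estimate only to 112.5 ms, which is the smoothing the slide is after.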
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
 - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4 * DevRTT
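Putting the two estimators together gives a timeout calculator along these lines. It is a sketch only: the class name is invented, and starting DevRTT at half the first sample and updating DevRTT before EstimatedRTT are assumptions in the style of RFC 6298, not details given on the slide.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumption: initial deviation

    def update(self, sample_rtt):
        # deviation uses the EstimatedRTT from before this sample is folded in
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Following Karn/Partridge, a caller would feed `update` only with samples from segments that were not retransmitted.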
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
 - timeout events
 - duplicate acks
• Initially consider simplified TCP sender:
 - ignore duplicate acks
 - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
 - update what is known to be acked
 - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
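The simplified sender translates fairly directly into Python. In this sketch the timer is reduced to a boolean flag, and `send_to_ip` is a hypothetical callback standing in for the network layer; neither is part of the original pseudocode.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender (no dup ACKs, no flow control)."""

    def __init__(self, initial_seq_num, send_to_ip):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}             # seq -> data for segments not yet ACKed
        self.timer_running = False
        self.send_to_ip = send_to_ip  # hypothetical callback standing in for IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.send_to_ip(seq, data)
        self.next_seq_num += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        # retransmit only the not-yet-ACKed segment with the smallest seq number
        seq = min(self.unacked)
        self.send_to_ip(seq, self.unacked[seq])
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: drop every segment that ends at or before y
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100, then receiving ACK=120, leaves `send_base` at 120 with nothing outstanding, as in the cumulative ACK scenario.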
66
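TCP: retransmission scenarios
Lost ACK scenario (Host A sends to Host B; time runs downward):
1. A -> B: Seq=92, 8 bytes data
2. B -> A: ACK=100, lost (X)
3. A: Seq=92 timeout; A -> B: Seq=92, 8 bytes data
4. B -> A: ACK=100
5. A: SendBase = 100
Premature timeout scenario (Host A sends to Host B; time runs downward):
1. A -> B: Seq=92, 8 bytes data
2. A -> B: Seq=100, 20 bytes data
3. B -> A: ACK=100, then ACK=120, both still in flight when the Seq=92 timer expires
4. A: Seq=92 timeout; A -> B: Seq=92, 8 bytes data
5. A: SendBase = 100 on ACK=100, then SendBase = 120 on ACK=120
6. B -> A: ACK=120 for the duplicate; A: SendBase = 120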
67
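TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends to Host B; time runs downward):
1. A -> B: Seq=92, 8 bytes data
2. A -> B: Seq=100, 20 bytes data
3. B -> A: ACK=100, lost (X)
4. B -> A: ACK=120, arrives before the timeout
5. A: SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too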
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
 - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
 - Sender often sends many segments back-to-back
 - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
 - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends to Host B; one segment is lost (X); A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
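The duplicate-ACK branch can be exercised in isolation with a few lines of Python. This sketch invents the class name and has `on_ack` return the sequence number to resend instead of sending anything; resetting the count when new data is ACKed is an assumption implied by "count of dup ACKs received for y".

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the seq number to fast-retransmit, or None."""
        if y > self.send_base:
            self.send_base = y   # new data ACKed
            self.dup_acks = 0
            return None
        self.dup_acks += 1       # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return y             # resend the segment starting at byte y
        return None
```

Only the third duplicate triggers a resend; a fourth or fifth duplicate for the same `y` does not trigger another one.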
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will.
This means small MSS values may persist - indefinitely.
Solution:
Wait to fill each segment, but don't wait indefinitely.
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
 - guarantees receive buffer doesn't overflow
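The window arithmetic can be written out as two small Python functions. The receiver side follows the slide's formula; the sender-side check and its parameter names (`last_byte_sent`, `last_byte_acked`) are assumptions introduced for this sketch.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room in the buffer, advertised as RcvWindow."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: keep unACKed data within the advertised RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= advertised_window
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may add 1000 more but not 1100.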
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
 - seq. #s
 - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
 - specifies initial seq #
 - no data
Step 2: server host receives SYN, replies with SYNACK segment
 - server allocates buffers
 - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
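The sequence and acknowledgement numbers in the three steps can be traced with a toy Python model. Nothing here touches a real socket: `Segment` and `three_way_handshake` are names made up for the sketch. It relies on the fact that a SYN consumes one sequence number, which the slide does not state.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "flags seq ack")

def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged, in order."""
    syn = Segment({"SYN"}, seq=client_isn, ack=None)
    # a SYN consumes one sequence number, so the server ACKs client_isn + 1
    synack = Segment({"SYN", "ACK"}, seq=server_isn, ack=syn.seq + 1)
    ack = Segment({"ACK"}, seq=synack.ack, ack=synack.seq + 1)
    return [syn, synack, ack]
```

For initial sequence numbers 1000 and 5000, the final ACK carries Seq=1001 and ACK=5001.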
78
TCP Connection Management (cont)
Closing a connection
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client and server timelines: client close, FIN; server ACK, close, FIN; client ACK, timed wait, closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
 - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines: closing, FIN, ACK, closing, FIN, ACK; client timed wait, then closed; server closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
 - lost packets (buffer overflow at routers)
 - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send λin (original data) through a router with unlimited shared output link buffers; output rate λout]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through a router with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; output rate λout]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout, where λ'in is original data plus retransmitted data
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against λin, each axis running up to R/2, with R/3 and R/4 marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B send across routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; output rate λout]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A, Host B, λout]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
 - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
 - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
 - additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
 - multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with 8, 16 and 24 Kbytes marked; saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
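The saw tooth can be reproduced with a few lines of Python. This sketch works per RTT in units of MSS and takes the set of RTTs in which a loss is detected as an input; the function name and those inputs are assumptions, and slow start is left out, as on the slide.

```python
def aimd(rounds, loss_rounds, mss=1.0, start=1.0):
    """Congestion window (in MSS units) at the start of each RTT."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd(8, {4})` climbs 1, 2, 3, 4, 5, halves to 2.5 after the loss, then resumes climbing.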
16
Internet Checksum (time travel warning - we covered this earlier)
• Note
 - When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers
              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
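The worked example can be checked with a short Python sketch of the ones-complement sum. The function names are illustrative, and the input is assumed to be a list of 16-bit words rather than raw bytes.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry-out back into the low bit."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def internet_checksum(words):
    """Checksum is the bitwise complement of the ones-complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF
```

For the two integers on the slide the sum is 1011101110111100 and the checksum is 0100010001000011; adding the checksum to the sum gives all ones, which is what a receiver checks for.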
17
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
 - but control info will flow on both directions
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 and state 2 joined by an arrow labelled with the event causing state transition over the actions taken on state transition (event / actions)]
state: when in this "state", next state uniquely determined by next event
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: two states, each labelled "State name", joined by an arrow labelled with the relevant event causing state transition over the relevant action taken on state transition (event / actions)]
state: when in this "state", next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
 - no bit errors
 - no loss of packets
• separate FSMs for sender, receiver:
 - sender sends data into underlying channel
 - receiver reads data from underlying channel
sender, state IDLE:
• event: rdt_send(data); action: udt_send(packet)
receiver, state IDLE:
• event: udt_rcv(packet); action: rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
 - checksum to detect bit errors
• the question: how to recover from errors:
 - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
 - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
 - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
 - error detection
 - receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
sender
State IDLE:
• event: rdt_send(data); action: udt_send(packet); next state: Waiting for reply
State Waiting for reply:
• event: udt_rcv(reply) && isNAK(reply); action: udt_send(packet)
• event: udt_rcv(reply) && isACK(reply); action: Λ (none); next state: IDLE
receiver
State IDLE:
• event: udt_rcv(packet) && corrupt(packet); action: udt_send(NAK)
• event: udt_rcv(packet) && notcorrupt(packet); action: rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs with the no-error path traced: rdt_send(data) / udt_send(packet); udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK); udt_rcv(reply) && isACK(reply) / Λ]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs with the error path traced: udt_rcv(packet) && corrupt(packet) / udt_send(NAK); udt_rcv(reply) && isNAK(reply) / udt_send(packet)]
28
rdt2.0 has a fatal flaw
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
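rdt2.1: sender, handles garbled ACK/NAKs
State IDLE (next packet uses sequence 0):
• event: rdt_send(data); action: sequence=0, udt_send(packet); next state: Waiting for reply (0)
State Waiting for reply (0):
• event: udt_rcv(reply) && (corrupt(reply) || isNAK(reply)); action: udt_send(packet)
• event: udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action: Λ; next state: IDLE (1)
State IDLE (next packet uses sequence 1):
• event: rdt_send(data); action: sequence=1, udt_send(packet); next state: Waiting for reply (1)
State Waiting for reply (1):
• event: udt_rcv(reply) && (corrupt(reply) || isNAK(reply)); action: udt_send(packet)
• event: udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action: Λ; next state: IDLE (0)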
30
rdt2.1: receiver, handles garbled ACK/NAKs
State Wait for 0 from below:
• event: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet); action: udt_send(ACK), rdt_rcv(data); next state: Wait for 1 from below
• event: udt_rcv(packet) && corrupt(packet); action: udt_send(NAK)
• event: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action: udt_send(ACK)
State Wait for 1 from below:
• event: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action: udt_send(ACK), rdt_rcv(data); next state: Wait for 0 from below
• event: udt_rcv(packet) && corrupt(packet); action: udt_send(NAK)
• event: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet); action: udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
 - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
 - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
 - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment
State Wait for call 0 from above:
• event: rdt_send(data); action: sequence=0, udt_send(packet); next state: Wait for ACK 0
State Wait for ACK 0:
• event: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)); action: udt_send(packet)
• event: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply); action: Λ
receiver FSM fragment
Transition into Wait for 0 from below:
• event: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action: udt_send(ACK1), rdt_rcv(data)
State Wait for 0 from below:
• event: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)); action: udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
 - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
 - retransmission will be duplicate, but use of seq. #'s already handles this
 - receiver must specify seq # of pkt being ACKed
• requires countdown timer
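35
rdt3.0 sender
State IDLE (state 0):
• event: rdt_send(data); action: sequence=0, udt_send(packet); next state: Wait for ACK0
• event: udt_rcv(reply); action: Λ
State Wait for ACK0:
• event: udt_rcv(reply) && (corrupt(reply) || isACK(reply, 1)); action: Λ
• event: timeout; action: udt_send(packet)
• event: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 0); action: Λ; next state: IDLE (state 1)
State IDLE (state 1):
• event: rdt_send(data); action: sequence=1, udt_send(packet); next state: Wait for ACK1
• event: udt_rcv(reply); action: Λ
State Wait for ACK1:
• event: udt_rcv(reply) && (corrupt(reply) || isACK(reply, 0)); action: Λ
• event: timeout; action: udt_send(packet)
• event: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 1); action: Λ; next state: IDLE (state 0)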
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
 - U sender: utilization - fraction of time sender busy sending
 - 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
 - network protocol limits use of physical resources
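The numbers in this example can be reproduced with a small Python function. The slide's own formula for U did not survive extraction, so the expression used here, U = (L/R) / (RTT + L/R), is the standard stop-and-wait one and should be read as an assumption; the `window` parameter is an extension for the pipelined case.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy, with `window` packets sent per RTT."""
    d_trans = packet_bits / link_bps               # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))
```

With L = 8000 bits, R = 1 Gbps and RTT = 30 ms this gives roughly 0.00027, and allowing three packets in flight triples it.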
39
rdt3.0: stop-and-wait operation
[Timeline between sender and receiver, one RTT apart:]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
 - range of sequence numbers must be increased
 - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timeline between sender and receiver, one RTT apart:]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
79
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too much data too
fast for network to handlerdquobull different from flow controlbull manifestations
ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)
bull a top-10 problem
82
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
83
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
84
Causescosts of congestion scenario 2 bull always (goodput)
bull ldquoperfectrdquo retransmission only when loss
bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same
lin
lout=
lin
loutgt
linlout
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt
R2
R2lin
l out
b
R2
R2lin
l out
a
R2
R2lin
l out
c
R4
R3
85
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
86
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Host A
Host B
lo
u
t
Congestion Collapse example Cocktail party effect
87
Approaches towards congestion control
End-end congestion controlbull no explicit feedback from
networkbull congestion inferred from end-
system observed loss delaybull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad approaches towards congestion control
88
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for
each received ACK until loss detected (W W + 1W)
multiplicative decrease cut CongWin in half after loss
(W W2)
time
cong
estio
n w
indo
w si
zeSaw toothbehavior probing
for bandwidth
89SLOW START IS NOT SHOWN
17
Principles of Reliable data transfer
• important in app, transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. state1 -> state2, each transition labelled event / actions: the event causing the state transition, and the actions taken on the state transition]
state: when in this "state", next state uniquely determined by next event
22
KR state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: FSM notation. State name -> State name, each transition labelled event / actions: the relevant event causing the state transition, and the relevant action taken on the state transition]
state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
sender: IDLE -> IDLE on event rdt_send(data); action udt_send(packet)
receiver: IDLE -> IDLE on event udt_rcv(packet); action rdt_rcv(data)
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors?
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
sender (Λ means no action):
• IDLE -> Waiting for reply: event rdt_send(data); action udt_send(packet)
• Waiting for reply -> Waiting for reply: event udt_rcv(reply) && isNAK(reply); action udt_send(packet)
• Waiting for reply -> IDLE: event udt_rcv(reply) && isACK(reply); action Λ
receiver:
• IDLE -> IDLE: event udt_rcv(packet) && corrupt(packet); action udt_send(NAK)
• IDLE -> IDLE: event udt_rcv(packet) && notcorrupt(packet); actions rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs (same states and transitions as the FSM specification), showing the no-error path]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs (same states and transitions as the FSM specification), showing the path taken when a packet is corrupted]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: sender sends one packet, then waits for receiver response
29
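rdt2.1: sender, handles garbled ACK/NAKs
States: IDLE (next seq 0), Waiting for reply (seq 0), IDLE (next seq 1), Waiting for reply (seq 1). Λ means no action.
• IDLE (0) -> Waiting for reply (0): event rdt_send(data); actions sequence=0, udt_send(packet)
• Waiting for reply (0) -> Waiting for reply (0): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ); action udt_send(packet)
• Waiting for reply (0) -> IDLE (1): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action Λ
• IDLE (1) -> Waiting for reply (1): event rdt_send(data); actions sequence=1, udt_send(packet)
• Waiting for reply (1) -> Waiting for reply (1): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ); action udt_send(packet)
• Waiting for reply (1) -> IDLE (0): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action Λ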
30
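rdt2.1: receiver, handles garbled ACK/NAKs
States: Wait for 0 from below, Wait for 1 from below.
• Wait for 0 -> Wait for 1: event udt_rcv(packet) && not corrupt(packet) && has_seq0(packet); actions udt_send(ACK), rdt_rcv(data)
• Wait for 0 -> Wait for 0: event udt_rcv(packet) && corrupt(packet); action udt_send(NAK)
• Wait for 0 -> Wait for 0: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) (duplicate); action udt_send(ACK)
• Wait for 1 -> Wait for 0: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet); actions udt_send(ACK), rdt_rcv(data)
• Wait for 1 -> Wait for 1: event udt_rcv(packet) && corrupt(packet); action udt_send(NAK)
• Wait for 1 -> Wait for 1: event udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) (duplicate); action udt_send(ACK)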
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (Λ means no action):
• Wait for call 0 from above -> Wait for ACK 0: event rdt_send(data); actions sequence=0, udt_send(packet)
• Wait for ACK 0 -> Wait for ACK 0: event udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ); action udt_send(packet)
• Wait for ACK 0 -> next state: event udt_rcv(reply) && not corrupt(reply) && isACK0(reply); action Λ
receiver FSM fragment:
• Wait for 0 from below -> Wait for 0 from below: event udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ); action udt_send(ACK1)
• transition into Wait for 0 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet); actions udt_send(ACK1), rdt_rcv(data)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
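rdt3.0 sender
States: IDLE state 0, Wait for ACK0, IDLE state 1, Wait for ACK1. Λ means no action.
• IDLE state 0 -> Wait for ACK0: event rdt_send(data); actions sequence=0, udt_send(packet)
• IDLE state 0 -> IDLE state 0: event udt_rcv(reply); action Λ
• Wait for ACK0 -> Wait for ACK0: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ); action Λ
• Wait for ACK0 -> Wait for ACK0: event timeout; action udt_send(packet)
• Wait for ACK0 -> IDLE state 1: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0); action Λ
• IDLE state 1 -> Wait for ACK1: event rdt_send(data); actions sequence=1, udt_send(packet)
• IDLE state 1 -> IDLE state 1: event udt_rcv(reply); action Λ
• Wait for ACK1 -> Wait for ACK1: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ); action Λ
• Wait for ACK1 -> Wait for ACK1: event timeout; action udt_send(packet)
• Wait for ACK1 -> IDLE state 0: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1); action Λ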
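The rdt3.0 sender can be sketched in a few lines of Python. This is a minimal illustration rather than the slides' own code: the class name `Rdt30Sender`, the `pending` field, and the convention that the caller invokes `on_timeout()` when its countdown timer fires are all assumptions made for the example.

```python
class Rdt30Sender:
    """Alternating-bit (stop-and-wait) sender in the spirit of rdt3.0.

    udt_send is any callable that pushes a (seq, data) packet onto the
    unreliable channel; timer handling is left to the caller (assumption).
    """

    def __init__(self, udt_send):
        self.udt_send = udt_send
        self.seq = 0          # sequence number for the next new packet (0 or 1)
        self.pending = None   # copy of the unACKed packet; None means IDLE

    def rdt_send(self, data):
        """Called from above. Returns False while waiting for an ACK."""
        if self.pending is not None:
            return False
        self.pending = (self.seq, data)
        self.udt_send(self.pending)
        return True

    def on_timeout(self):
        """Countdown timer expired: retransmit the held copy."""
        if self.pending is not None:
            self.udt_send(self.pending)

    def on_reply(self, ack_seq, corrupt=False):
        """ACK arrived. Corrupt or wrong-numbered ACKs are ignored (action Λ)."""
        if self.pending is None or corrupt or ack_seq != self.seq:
            return
        self.pending = None   # delivery confirmed, drop the copy
        self.seq ^= 1         # alternate 0 -> 1 -> 0
```

Note that a wrong or garbled ACK causes no retransmission here; only the timeout does, which matches the Λ actions in the sender FSM.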
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U sender: utilization - fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timing diagram, one RTT between the last bit sent and the ACK arriving]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timing diagram with three packets sent back-to-back]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only that unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only that unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state Wait; initially base=1, nextseqnum=1. Λ means no action.
• event rdt_send(data):
    if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
    else refuse_data(data)    (block)
• event timeout:
    udt_send(packet[base]); udt_send(packet[base+1]); … udt_send(packet[nextseqnum-1])
• event udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
• event udt_rcv(reply) && corrupt(reply):
    Λ
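The sender FSM above translates fairly directly into code. The sketch below is an illustration under stated assumptions: the class name `GBNSender`, the `packets` dictionary, and the guard that ignores ACKs older than `base` are additions for the example (the slide sets `base` unconditionally), and timer management is left to the caller.

```python
class GBNSender:
    """Go-Back-N sender with a window of N packets and cumulative ACKs."""

    def __init__(self, window_size, udt_send):
        self.N = window_size
        self.base = 1            # oldest unACKed sequence number
        self.nextseqnum = 1      # next sequence number to use
        self.packets = {}        # seq -> data, kept until cumulatively ACKed
        self.udt_send = udt_send # callable(seq, data) onto the channel

    def rdt_send(self, data):
        """Send if the window has room; otherwise refuse (caller may block)."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.udt_send(self.nextseqnum, data)
            self.nextseqnum += 1
            return True
        return False

    def on_timeout(self):
        """Go back N: resend every packet from base up to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(seq, self.packets[seq])

    def on_ack(self, acknum):
        """Uncorrupted cumulative ACK for everything up to acknum."""
        if acknum < self.base:
            return               # stale ACK: nothing new is acknowledged
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        self.base = acknum + 1
```

A quick way to exercise it is to pass a function that appends sequence numbers to a list, fill the window, acknowledge the first packet, and then trigger a timeout to see the remaining window resent.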
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
Single state Wait; initially expectedseqnum=1.
• event udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum); actions rdt_rcv(data), udt_send(ACK), expectedseqnum++
• otherwise; action udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
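One way to see where the ½ bound comes from: after the receiver has accepted a full window, its new window covers the next N sequence numbers, while the sender (if every ACK was lost) may still retransmit the old N. The two sets must not share a number. The helper below is a hypothetical sketch of that check, not something taken from the slides.

```python
def sr_window_is_unambiguous(seq_space, window):
    """True if a retransmitted old packet can never be mistaken for a new one.

    old_window: numbers the sender may still retransmit (all ACKs lost).
    new_window: numbers the receiver now accepts as new data.
    """
    old_window = {n % seq_space for n in range(0, window)}
    new_window = {n % seq_space for n in range(window, 2 * window)}
    return old_window.isdisjoint(new_window)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"seq space 4, window {n}: safe = {sr_window_is_unambiguous(4, n)}")
```

For the slide's example (sequence numbers 0 to 3, window 3) the old window {0, 1, 2} overlaps the new window {3, 0, 1}, so the check fails; a window of 2 passes.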
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth-Delay product
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom]
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pnter
• Options (variable length)
• application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
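To make the layout concrete, the fixed 20-byte part of the header can be packed with Python's standard `struct` module. This is a sketch under simplifying assumptions: no options (so the header length is always 5 words), the checksum is left at whatever value is passed in rather than computed, and the helper names are invented for the example.

```python
import struct

# Network byte order: ports (16+16), seq (32), ack (32),
# header-length byte, flags byte, window (16), checksum (16), urgent ptr (16).
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def pack_tcp_header(src_port, dst_port, seq, ack, flags, window,
                    checksum=0, urg_ptr=0):
    """Build a 20-byte TCP header with no options (checksum not computed)."""
    offset_byte = 5 << 4   # header length of 5 32-bit words in the top 4 bits
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_byte, flags, window, checksum, urg_ptr)


def unpack_tcp_header(raw):
    """Decode the first 20 bytes of a segment into a dictionary."""
    (src, dst, seq, ack, offset_byte,
     flags, window, checksum, urg_ptr) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_words": offset_byte >> 4, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }
```

For instance, `pack_tcp_header(12345, 23, 42, 79, ACK | PSH, 65535)` yields a header whose sequence and acknowledgement fields match the first segment of the telnet scenario on the next slide.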
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
This has led to a world of hurt…
simple telnet scenario (Host A, Host B, time running downwards):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
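The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) combine into a small estimator. The sketch below makes a few assumptions that the slides do not spell out: the first sample seeds EstimatedRTT with zero deviation, DevRTT is updated using the EstimatedRTT from before the current sample is folded in, and samples from retransmitted segments are simply skipped (the Karn/Partridge rule, without the timeout back-off). The class name is illustrative.

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = float(first_sample)
        self.dev_rtt = 0.0   # assumption: no deviation known yet

    def update(self, sample_rtt, retransmitted=False):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if retransmitted:
            # Ambiguous sample: the ACK may belong to either transmission.
            return self.timeout_interval()
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from an estimate of 100 ms, a single 200 ms sample moves EstimatedRTT to 112.5 ms and DevRTT to 25 ms, so the timeout becomes 212.5 ms: one outlier nudges the average only slightly but widens the safety margin a lot.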
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
    loop (forever) {
      switch(event)
      event: data received from application above
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running)
              start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)
      event: timer timeout
          retransmit not-yet-acknowledged segment with smallest sequence number
          start timer
      event: ACK received, with ACK field value of y
          if (y > SendBase) {
              SendBase = y
              if (there are currently not-yet-acknowledged segments)
                  start timer
          }
    } /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
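TCP retransmission scenarios
lost ACK scenario (Host A, Host B, time running downwards):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• timeout at A
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100
• SendBase = 100
premature timeout (Host A, Host B, time running downwards):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120
• Seq=92 timeout fires at A before ACK=100 arrives
• A -> B: Seq=92, 8 bytes data (retransmission)
• SendBase = 100, then SendBase = 120 as the ACKs arrive
• B -> A: ACK=120
• SendBase = 120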
67
TCP retransmission scenarios (more)
Implicit ACK (e.g. not Go-Back-N) (Host A, Host B, time running downwards):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arrives before the timeout
• SendBase = 120
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timing diagram: the 2nd segment is lost (X) and is resent before its timeout expires]
71
Fast retransmit algorithm:
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {    /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y = 3) {
                resend segment with sequence number y    /* fast retransmit */
            }
        }
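The simplified sender and the fast retransmit rule can be put together in a runnable sketch. Several details here are assumptions made for illustration: the class and field names, representing the timer as a boolean that the caller is expected to honour, resetting the duplicate counter whenever new data is ACKed, and the premise that ACK values always fall on segment boundaries.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: cumulative ACKs, one timer, fast retransmit."""

    def __init__(self, initial_seq, transmit):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # sequence number of next new byte
        self.unacked = {}              # first-byte seq -> payload
        self.dup_acks = 0              # duplicate ACKs seen for send_base
        self.timer_running = False
        self.transmit = transmit       # callable(seq, data): pass segment to IP

    def send(self, data):
        """Data received from the application above."""
        self.unacked[self.next_seq] = data
        self.timer_running = True      # start timer if not already running
        self.transmit(self.next_seq, data)
        self.next_seq += len(data)

    def on_timeout(self):
        """Retransmit the unACKed segment with the smallest sequence number."""
        if self.unacked:
            seq = min(self.unacked)
            self.transmit(seq, self.unacked[seq])
            self.timer_running = True

    def on_ack(self, y):
        """ACK received with ACK field value y."""
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        elif y == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.transmit(y, self.unacked[y])   # fast retransmit
```

Using the sequence numbers from the retransmission scenarios (Seq=92 with 8 bytes, Seq=100 with 20 bytes), three duplicate ACKs for 100 trigger a resend of the Seq=100 segment without waiting for the timer.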
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will.
This means small MSS values may persist indefinitely.
Solution:
Wait to fill each segment, but don't wait indefinitely.
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
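The bookkeeping on this slide amounts to two small calculations, sketched below. The function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are assumptions for the example; the slide only states that unACKed data is limited to RcvWindow.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises:
    RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print("advertised window:", window)
    print("may send 1000 more:", sender_may_send(5000, 4000, window, 1000))
    print("may send 1200 more:", sender_may_send(5000, 4000, window, 1200))
```

A slow application leaves `last_byte_read` behind, the advertised window shrinks, and the sender is throttled to the application's drain rate.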
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. client close: FIN to server; server: ACK; server close: FIN to client; client: ACK, then timed wait; closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. client closing: FIN to server; server: ACK; server closing: FIN to client; client: ACK, then timed wait; both sides closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; delivered rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes marked up to R/2, with further marks at R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered rate]
86
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Host A
Host B
lo
u
t
Congestion Collapse example Cocktail party effect
87
Approaches towards congestion control
End-end congestion controlbull no explicit feedback from
networkbull congestion inferred from end-
system observed loss delaybull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad approaches towards congestion control
88
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for
each received ACK until loss detected (W W + 1W)
multiplicative decrease cut CongWin in half after loss
(W W2)
time
cong
estio
n w
indo
w si
zeSaw toothbehavior probing
for bandwidth
89SLOW START IS NOT SHOWN
18
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
20
Reliable data transfer: getting started
send side:
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
FSM notation: state1 -> state2, each transition labelled event / actions
• event: event causing state transition
• actions: actions taken on state transition
• state: when in this "state", next state uniquely determined by next event
22
K&R state machines - a note
Beware: Kurose and Ross has a confusing/confused attitude to state-machines. I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the K&R book (from which the RDT example is taken). In K&R "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
FSM notation: State name -> State name, each transition labelled event / actions
• event: relevant event causing state transition
• actions: relevant action taken on state transition
• state: when in this "state", next state uniquely determined by next event
23
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
sender:
  IDLE -> IDLE: event rdt_send(data) / action udt_send(packet)
receiver:
  IDLE -> IDLE: event udt_rcv(packet) / action rdt_rcv(data)
24
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  - sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
sender:
  IDLE -> Waiting for reply: event rdt_send(data) / action udt_send(packet)
  Waiting for reply -> Waiting for reply: event udt_rcv(reply) && isNAK(reply) / action udt_send(packet)
  Waiting for reply -> IDLE: event udt_rcv(reply) && isACK(reply) / action Λ (none)
receiver:
  IDLE -> IDLE: event udt_rcv(packet) && notcorrupt(packet) / action rdt_rcv(data), udt_send(ACK)
  IDLE -> IDLE: event udt_rcv(packet) && corrupt(packet) / action udt_send(NAK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, with the same states and transitions as in the FSM specification]
27
rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, with the same states and transitions as in the FSM specification]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
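rdt2.1: sender, handles garbled ACK/NAKs
  IDLE (sequence 0) -> Waiting for reply (sequence 0): event rdt_send(data) / action sequence=0, udt_send(packet)
  Waiting for reply (sequence 0) -> Waiting for reply (sequence 0): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / action udt_send(packet)
  Waiting for reply (sequence 0) -> IDLE (sequence 1): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / action Λ
  IDLE (sequence 1) -> Waiting for reply (sequence 1): event rdt_send(data) / action sequence=1, udt_send(packet)
  Waiting for reply (sequence 1) -> Waiting for reply (sequence 1): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / action udt_send(packet)
  Waiting for reply (sequence 1) -> IDLE (sequence 0): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / action Λ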
30
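rdt2.1: receiver, handles garbled ACK/NAKs
  Wait for 0 from below -> Wait for 1 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / action udt_send(ACK), rdt_rcv(data)
  Wait for 0 from below -> Wait for 0 from below: event udt_rcv(packet) && corrupt(packet) / action udt_send(NAK)
  Wait for 0 from below -> Wait for 0 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / action udt_send(ACK)
  Wait for 1 from below -> Wait for 0 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / action udt_send(ACK), rdt_rcv(data)
  Wait for 1 from below -> Wait for 1 from below: event udt_rcv(packet) && corrupt(packet) / action udt_send(NAK)
  Wait for 1 from below -> Wait for 1 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / action udt_send(ACK)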
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment:
  Wait for call 0 from above -> Wait for ACK 0: event rdt_send(data) / action sequence=0, udt_send(packet)
  Wait for ACK 0 -> Wait for ACK 0: event udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / action udt_send(packet)
  Wait for ACK 0 -> (next state): event udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / action Λ
receiver FSM fragment:
  (previous state) -> Wait for 0 from below: event udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / action udt_send(ACK1), rdt_rcv(data)
  Wait for 0 from below -> Wait for 0 from below: event udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / action udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
  IDLE state 0 -> Wait for ACK0: event rdt_send(data) / action sequence=0, udt_send(packet)
  IDLE state 0 -> IDLE state 0: event udt_rcv(reply) / action Λ
  Wait for ACK0 -> Wait for ACK0: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / action Λ
  Wait for ACK0 -> Wait for ACK0: event timeout / action udt_send(packet)
  Wait for ACK0 -> IDLE state 1: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / action Λ
  IDLE state 1 -> Wait for ACK1: event rdt_send(data) / action sequence=1, udt_send(packet)
  IDLE state 1 -> IDLE state 1: event udt_rcv(reply) / action Λ
  Wait for ACK1 -> Wait for ACK1: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / action Λ
  Wait for ACK1 -> Wait for ACK1: event timeout / action udt_send(packet)
  Wait for ACK1 -> IDLE state 0: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / action Λ
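The stop-and-wait behaviour above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `run_stop_and_wait` and the `drop_data` / `drop_ack` parameters are hypothetical, the timeout is modelled simply as "no ACK came back, so send again", and bit corruption is not modelled.

```python
def run_stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Alternating-bit sender and receiver over a lossy channel (a sketch).

    drop_data / drop_ack are hypothetical knobs: the indices (counted from 0
    over the whole run) of data and ACK transmissions that the channel loses.
    """
    delivered = []   # data the receiver hands to its upper layer
    expected = 0     # receiver: sequence bit it will accept next
    sends = 0        # number of data transmissions, retransmissions included
    acks = 0         # number of ACK transmissions
    for i, data in enumerate(messages):
        seq = i % 2  # sender alternates sequence 0, 1, 0, 1, ...
        while True:
            lost = sends in drop_data
            sends += 1
            if lost:
                continue              # nothing comes back: timeout, retransmit
            if seq == expected:       # new packet: deliver it, flip expectation
                delivered.append(data)
                expected ^= 1
            # duplicate or not, the receiver ACKs the sequence bit it saw
            ack_lost = acks in drop_ack
            acks += 1
            if ack_lost:
                continue              # ACK lost: timeout, retransmit
            break                     # matching ACK arrived: next message
    return delivered, sends


delivered, sends = run_stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={0})
print(delivered, sends)
```

In this run the first ACK and the first retransmission are lost, so the first message is transmitted three times, yet the receiver delivers each message exactly once because the duplicate carries the sequence bit it is no longer expecting.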
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• $U_{sender}$: utilization, the fraction of time sender busy sending:
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
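The arithmetic on this slide can be checked with a few lines of Python. This is a sketch, not part of the original deck: the function name `sender_utilization` and its `window` parameter are assumptions added for illustration, and a window of 1 corresponds to stop-and-wait.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy transmitting.

    window is the number of packets sent per round trip (1 = stop-and-wait).
    """
    d_trans = packet_bits / link_bps          # transmission delay L / R
    return min(1.0, window * d_trans / (rtt_s + d_trans))


# Figures from the slide: 8000-bit packet, 1 Gbps link, RTT = 2 * 15 ms
u = sender_utilization(8000, 1e9, 0.030)
throughput_bytes_per_s = (8000 / 8) / (0.030 + 8000 / 1e9)
print(f"utilization = {u:.5f}")
print(f"throughput  = {throughput_bytes_per_s / 1000:.1f} kB/s")
```

Sending one 1000-byte packet per 30.008 ms gives roughly 33 kB/s, which is where the slide's throughput figure comes from.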
39
rdt3.0: stop-and-wait operation
[Timing diagram: sender / receiver, one RTT shown]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram: sender / receiver, one RTT shown]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
  initially (Λ): base=1, nextseqnum=1
  Wait -> Wait: event rdt_send(data) / action
      if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
      else refuse_data(data)   (block)
  Wait -> Wait: event timeout / action udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
  Wait -> Wait: event udt_rcv(reply) && notcorrupt(reply) / action base = getacknum(reply)+1
  Wait -> Wait: event udt_rcv(reply) && corrupt(reply) / action Λ
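The sender FSM above maps fairly directly onto a small class. What follows is a minimal sketch under stated assumptions: the class name `GBNSender`, the `udt_send` callback signature and the `packets` buffer are hypothetical, timers are left to the caller (who would invoke `on_timeout`), and the check that ignores stale ACKs is an addition that the slide's FSM does not spell out.

```python
class GBNSender:
    """Go-Back-N sender state: base, nextseqnum and a buffer of unacked packets."""

    def __init__(self, window, udt_send):
        self.N = window
        self.base = 1            # oldest unacknowledged sequence number
        self.nextseqnum = 1      # next sequence number to use
        self.packets = {}        # seq -> data, kept until acknowledged
        self.udt_send = udt_send # hypothetical callback: udt_send(seq, data)

    def rdt_send(self, data):
        """Send data if the window has room; return False to refuse it."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.udt_send(self.nextseqnum, data)
            self.nextseqnum += 1
            return True
        return False             # refuse_data: window is full

    def on_ack(self, acknum):
        """Cumulative ACK: everything up to and including acknum is done."""
        if acknum >= self.base:  # assumption: stale ACKs are ignored
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1

    def on_timeout(self):
        """Go back N: resend every packet from base to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(seq, self.packets[seq])


sent = []
sender = GBNSender(window=3, udt_send=lambda seq, data: sent.append(seq))
accepted = [sender.rdt_send(d) for d in "abcd"]   # the 4th is refused
sender.on_ack(1)                                  # window slides to base = 2
sender.rdt_send("d")
sender.on_timeout()                               # resends 2, 3, 4
print(accepted, sent)
```

The final `on_timeout` call shows the cost of Go-Back-N: a single timeout resends every outstanding packet, whether or not the receiver already saw it.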
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
  initially (Λ): expectedseqnum=1
  Wait -> Wait: event udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / action rdt_rcv(data), udt_send(ACK), expectedseqnum++
  Wait -> Wait: event default (any other arrival) / action udt_send(reply), i.e. re-send the last ACK
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
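The dilemma can be checked mechanically. The sketch below assumes the worst case in which the receiver has accepted a full window but every ACK was lost, so the sender may still retransmit its old window while the receiver has already moved on to the next one. The function name `windows_overlap` is hypothetical.

```python
def windows_overlap(seq_space, window):
    """True if an old retransmission could be mistaken for new data."""
    # sequence numbers the sender may still retransmit (all ACKs lost)
    old = {n % seq_space for n in range(0, window)}
    # sequence numbers the receiver now accepts as new
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)


print(windows_overlap(seq_space=4, window=3))  # the slide's example
print(windows_overlap(seq_space=4, window=2))  # window = half the seq # space
```

With four sequence numbers and a window of 3 the two sets share 0 and 1, which is the ambiguity on the slide; with a window of 2 they are disjoint.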
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Fields (each row 32 bits wide):
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pointer
• Options (variable length)
• application data (variable length)
Annotations:
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
  - This has led to a world of hurt…
simple telnet scenario (Host A, Host B, time running downwards):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  - this gets through an intrusion-detection system
  - then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
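The moving average is a one-line update. A minimal sketch follows, assuming RTT samples in milliseconds; the function name `update_estimated_rtt` and the sample values are made up for illustration.

```python
def update_estimated_rtt(estimated, sample, alpha=0.125):
    """One EWMA step: blend the newest SampleRTT into EstimatedRTT."""
    return (1 - alpha) * estimated + alpha * sample


est = 100.0                      # hypothetical starting estimate, in ms
history = []
for sample in [100.0, 180.0, 100.0]:
    est = update_estimated_rtt(est, sample)
    history.append(est)
print(history)
```

A single 180 ms spike moves the estimate only from 100 ms to 110 ms, and its influence then decays as later samples arrive.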
60
Some RTT estimates are never good
[Figure: Associating the ACK with (a) original transmission versus (b) retransmission]
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation:
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
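Putting the two formulas together gives a small update routine. This is a sketch under one labelled assumption: DevRTT is updated using the EstimatedRTT from before the new sample is folded in, an ordering the slide does not specify. The function name `update_timeout` and the numbers are illustrative only.

```python
def update_timeout(estimated, dev, sample, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    # assumption: deviation is measured against the previous estimate
    dev = (1 - beta) * dev + beta * abs(sample - estimated)
    estimated = (1 - alpha) * estimated + alpha * sample
    return estimated, dev, estimated + 4 * dev


est, dev, timeout = update_timeout(estimated=100.0, dev=10.0, sample=140.0)
print(est, dev, timeout)
```

One late sample raises the estimate only slightly (100 to 105 ms) but widens the safety margin considerably, so the timeout moves from 140 ms to 175 ms.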
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running)
              start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
          retransmit not-yet-acknowledged segment with smallest sequence number
          start timer

      event: ACK received, with ACK field value of y
          if (y > SendBase) {
              SendBase = y
              if (there are currently not-yet-acknowledged segments)
                  start timer
          }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
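TCP: retransmission scenarios
lost ACK scenario (Host A, Host B):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• Seq=92 timeout at A
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100
• SendBase = 100
premature timeout (Host A, Host B):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120
• Seq=92 timeout at A before the ACKs arrive
• A -> B: Seq=92, 8 bytes data (retransmission)
• SendBase = 100, then SendBase = 120
• B -> A: ACK=120; SendBase = 120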
67
TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A, Host B):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arrives before the timeout
• SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
• Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments
• Event at Receiver: Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte
• Event at Receiver: Arrival of segment that partially or completely fills gap
  TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: the 2nd segment is lost (X) and Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y = 3) {
                resend segment with sequence number y   /* fast retransmit */
            }
        }
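The duplicate-ACK counting above can be sketched as a small class. Assumptions are labelled: the class name `FastRetransmitSender` and its `retransmitted` list are hypothetical, the retransmission timer is omitted, and resetting the duplicate count when new data is acknowledged is an implementation detail the slide leaves implicit.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # new data acknowledged: advance SendBase
            self.send_base = y
            self.dup_acks = 0     # assumption: count restarts for the new y
        else:
            # duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment starting at y


s = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:   # one new ACK, then three duplicates
    s.on_ack(ack)
print(s.send_base, s.retransmitted)
```

The first ACK=100 advances SendBase; the next three are duplicates, and the third of them triggers a resend of the segment starting at byte 100 without waiting for the timer.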
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will.
This means small MSS values may persist - indefinitely.
Solution: Wait to fill each segment, but don't wait indefinitely.
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
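The window arithmetic can be written out directly. This is a sketch, not the deck's code: `rcv_window` follows the slide's formula, while `can_send` and its `last_byte_sent` / `last_byte_acked` arguments are assumed names for the sender-side check that unACKed data stays within RcvWindow. The byte counts are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(5000, 4000, 1000, window))
print(can_send(5000, 4000, 1200, window))
```

With 2000 bytes still unread in a 4096-byte buffer the receiver advertises 2096 bytes, so a sender with 1000 bytes already in flight may send 1000 more but not 1200.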
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client / server timeline. client: close, sends FIN; server: replies ACK, close, sends FIN; client: replies ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client / server timeline. client: closing, sends FIN; server: replies ACK, closing, sends FIN; client: replies ACK, timed wait, closed; server: closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends $\lambda_{in}$ (original data) into a router with unlimited shared output link buffers; Host B; output rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A, Host B and a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output rate $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2; R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A, Host B and routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output rate $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; plot of $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase CongWin by 1 MSS every RTT, i.e. $W \leftarrow W + 1/W$ for each received ACK, until loss detected
  - multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: Saw tooth behavior: probing for bandwidth. Congestion window size against time, y-axis marked at 8, 16 and 24 Kbytes]
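The saw tooth can be reproduced with a per-RTT simulation. This is only a sketch: the function name `aimd`, the choice of which rounds see a loss, and the floor of 1 MSS are assumptions, the window is measured in MSS, and slow start is not modelled.

```python
def aimd(rounds, loss_rounds, cwnd=1.0):
    """Congestion window (in MSS) at the end of each RTT under AIMD."""
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease after loss
        else:
            cwnd += 1.0                 # additive increase: +1 MSS per RTT
        history.append(cwnd)
    return history


trace = aimd(rounds=8, loss_rounds={4}, cwnd=8.0)
print(trace)
```

The window climbs by one MSS per round trip, halves at the loss in round 4, then resumes its linear climb, which is the saw tooth in the figure.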
89
SLOW START IS NOT SHOWN
19
Principles of Reliable data transferbull important in app transport link layersbull top-10 list of important networking topics
bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
rdt_rcv()
udt_rcv()
20
Reliable data transfer getting started
sendside
receiveside
rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer
udt_send() called by rdtto transfer packet over
unreliable channel to receiver
rdt_rcv() called by rdt to deliver data to upper
rdt_rcv()
udt_rcv()
udt_rcv() called when packet arrives on rcv-side of channel
21
Reliable data transfer getting started
Wersquollbull incrementally develop sender receiver sides of
reliable data transfer protocol (rdt)bull consider only unidirectional data transfer
ndash but control info will flow on both directionsbull use finite state machines (FSM) to specify sender
receiver
state1
state2
event causing state transitionactions taken on state transition
state when in this ldquostaterdquo next state uniquely
determined by next event
eventactions
22
KR state machines ndash a note
BewareKurose and Ross has a confusingconfused attitude to
state-machinesIrsquove attempted to normalise the representationUPSHOT these slides have differing information to the
KR book (from which the RDT example is taken)in KR ldquoactions takenrdquo appear wide-ranging my
interpretation is more specificrelevant
Statename
Statename
Relevant event causing state transitionRelevant action taken on state transitionstate when in this ldquostaterdquo
next state uniquely determined by next
event eventactions
23
Rdt10 reliable transfer over a reliable channel
bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
IDLE udt_send(packet)rdt_send(data)
rdt_rcv(data)IDLEudt_rcv(packet)
sender receiver
Event
Action
24
Rdt20 channel with bit errors
bull underlying channel may flip bits in packetndash checksum to detect bit errors
bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that
packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender
that packet had errorsndash sender retransmits packet on receipt of NAK
bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt2.2: sender, receiver fragments
Sender FSM fragment (state: event / action -> next state; Λ = no action):
Wait for call 0 from above: rdt_send(data) / sequence=0; udt_send(packet) -> Wait for ACK 0
Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet)
Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ
Receiver FSM fragment:
Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1)
Transition into Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data); udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits a "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
Sender FSM, written as state: event / action -> next state (Λ = no action)
IDLE state 0: rdt_send(data) / sequence=0; udt_send(packet) -> Wait for ACK0
IDLE state 0: udt_rcv(reply) / Λ
Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ
Wait for ACK0: timeout / udt_send(packet)
Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
IDLE state 1: rdt_send(data) / sequence=1; udt_send(packet) -> Wait for ACK1
IDLE state 1: udt_rcv(reply) / Λ
Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ
Wait for ACK1: timeout / udt_send(packet)
Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0
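The alternating-bit behaviour of the rdt3.0 sender and receiver can be sketched as a small simulation. This is only an illustration under stated assumptions: `stop_and_wait` and its `drop_pattern` argument are hypothetical names, the channel loses packets but never corrupts them, and a loss is modelled as an immediate sender timeout.

```python
def stop_and_wait(messages, drop_pattern):
    """Simulate an rdt3.0-style transfer over a lossy channel.

    drop_pattern yields booleans consumed in order: one per data packet
    sent and one per ACK sent. True means that transmission is lost,
    so the sender times out and retransmits the same packet.
    """
    delivered = []       # what the receiver hands to its upper layer
    expected = 0         # receiver: sequence bit it is waiting for
    seq = 0              # sender: sequence bit of the current packet
    transmissions = 0    # data packets put on the channel
    drops = iter(drop_pattern)

    for data in messages:
        while True:
            transmissions += 1
            if next(drops, False):
                continue             # data lost: timeout, retransmit
            if seq == expected:      # new packet: deliver it once
                delivered.append(data)
                expected ^= 1
            # Receiver ACKs the sequence bit it saw, duplicate or not.
            if next(drops, False):
                continue             # ACK lost: timeout, retransmit
            break                    # ACK arrived for the current packet
        seq ^= 1                     # alternate 0/1 for the next packet
    return delivered, transmissions
```

With one lost ACK and one lost data packet, every message is still delivered exactly once, at the cost of two extra transmissions.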
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U sender: utilization – fraction of time sender busy sending
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timing diagram, with the RTT marked]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timing diagram, with the RTT marked]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3
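The utilization arithmetic for the stop-and-wait example (8000-bit packet, 1 Gbps link, 15 ms propagation delay each way) and for the three-packet pipeline can be checked with a short calculation. The function name `sender_utilization` and its `window` parameter are illustrative assumptions; the formula used is U = window · (L/R) / (RTT + L/R), capped at 1.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy transmitting.

    window is the number of packets sent back-to-back per round trip
    (1 for stop-and-wait). The result is capped at 1.0.
    """
    d_trans = packet_bits / link_bps          # L / R, in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L_BITS = 8000        # packet size from the slide
R_BPS = 1e9          # 1 Gbps link
RTT_S = 0.030        # 2 x 15 ms propagation delay

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_pipelined = sender_utilization(L_BITS, R_BPS, RTT_S, window=3)

# Throughput of stop-and-wait in bytes per second: one 1000-byte
# packet every RTT + L/R seconds (about 33 kB/sec).
throughput_Bps = (L_BITS / 8) / (RTT_S + L_BITS / R_BPS)
```

Under these numbers stop-and-wait keeps the sender busy for roughly 0.027% of the time, and sending three packets per round trip triples that figure.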
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only the unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only the unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state, Wait; transitions written as event / action (Λ = no action)
Initially: Λ / base=1; nextseqnum=1
rdt_send(data) /
  if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
  else refuse_data(data)   (Block?)
timeout / udt_send(packet[base]); udt_send(packet[base+1]); …; udt_send(packet[nextseqnum-1])
udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
udt_rcv(reply) && corrupt(reply) / Λ
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
Single state, Wait; transitions written as event / action (Λ = no action)
Initially: Λ / expectedseqnum=1
udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data); udt_send(ACK); expectedseqnum++
otherwise (Λ) / udt_send(reply)
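The two Go-Back-N state machines above can be sketched in a few lines. This is an illustrative sketch, not the slides' code: the class names, the dictionary used as a send buffer, and the `max` guard against stale ACKs are assumptions, and sequence numbers are unbounded integers rather than k-bit values.

```python
class GBNSender:
    def __init__(self, window):
        self.N = window
        self.base = 1            # oldest unacknowledged sequence number
        self.nextseqnum = 1      # next sequence number to use
        self.packets = {}        # seq -> data, kept for retransmission

    def send(self, data):
        """Return the sequence number used, or None if the window is full."""
        if self.nextseqnum >= self.base + self.N:
            return None          # refuse_data(data)
        seq = self.nextseqnum
        self.packets[seq] = data
        self.nextseqnum += 1
        return seq

    def on_ack(self, acknum):
        # Cumulative ACK: everything up to and including acknum is done.
        # max() ignores stale ACKs (an assumption beyond the slide).
        self.base = max(self.base, acknum + 1)

    def on_timeout(self):
        """Go back N: resend every packet from base to nextseqnum-1."""
        return [(s, self.packets[s]) for s in range(self.base, self.nextseqnum)]


class GBNReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def on_packet(self, seq, data):
        """Deliver in-order data; always return the cumulative ACK number."""
        if seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Out-of-order packets are discarded; re-ACK highest in-order seq.
        return self.expectedseqnum - 1
```

For example, if packet 2 of a three-packet window is lost, the receiver answers packet 3 with a duplicate ACK for 1, and the sender's timeout resends both 2 and 3.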
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
Sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
Receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
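The receiver rules above translate almost directly into code. The following is a minimal sketch under assumptions of my own: the class name `SRReceiver` is hypothetical, sequence numbers are unbounded integers (no wrap-around), and packets are assumed uncorrupted.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.rcvbase = 0         # lowest sequence number not yet received
        self.buffer = {}         # out-of-order packets awaiting delivery
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the sequence number to ACK, or None to ignore the packet."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase, if any,
            # and slide the window past it.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n < self.rcvbase:
            return n             # already delivered: its ACK may have been lost
        return None              # outside both ranges: ignore
```

Re-ACKing packets in [rcvbase-N, rcvbase-1] matters because the sender cannot advance its window until it sees those ACKs.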
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ ½ of seq # size
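The dilemma can be checked mechanically: after a full window has been received, the receiver's new window must not contain any sequence number the sender might still retransmit. The helper names below are hypothetical, and the check covers only the worst case described on the slide (a full window received, all of its ACKs lost).

```python
def window_seqs(base, window, seq_space):
    """Sequence numbers covered by a window starting at base (mod seq_space)."""
    return {(base + i) % seq_space for i in range(window)}


def is_ambiguous(window, seq_space):
    """True if a retransmitted old packet can look like new data."""
    old = window_seqs(0, window, seq_space)        # packets the sender may resend
    new = window_seqs(window, window, seq_space)   # packets the receiver now expects
    return bool(old & new)
```

With sequence numbers 0 to 3, a window of 3 is ambiguous (the receiver expects 3, 0, 1 while 0 and 1 may be retransmitted), whereas a window of 2 is safe.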
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP header layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pnter
• Options (variable length)
• application data (variable length)
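As an illustration of the layout above, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The function name `parse_tcp_header` and the returned dictionary keys are assumptions made for this sketch; options and checksum verification are not handled.

```python
import struct

# Flag bit masks within the low 6 bits of the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header at the start of segment."""
    # "!" = network byte order; H = 16-bit field, I = 32-bit field.
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4      # head len is in 32-bit words
    flags = {name for name, bit in FLAG_BITS.items() if offset_flags & bit}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "payload": segment[header_len:],
    }


# A hand-built SYNACK header followed by no data, for demonstration only.
example = struct.pack("!HHIIHHHH", 1234, 80, 42, 79,
                      (5 << 12) | 0x12, 65535, 0, 0)
parsed = parse_tcp_header(example)
```

Here `parsed["flags"]` contains SYN and ACK, and `parsed["header_len"]` is 20 because the head len field holds 5 words.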
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say - up to implementor
  – This has led to a world of hurt…
Simple telnet scenario (Host A, Host B; time runs downward):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• Host ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• Host ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\,\text{EstimatedRTT} + \alpha\,\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and Estimated RTT plotted as RTT (milliseconds, axis from 100 to 350) against time (seconds, axis from 1 to 106)]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\,\text{DevRTT} + \beta\,|\text{SampleRTT} - \text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\,\text{DevRTT}$
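The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. This is a sketch under assumptions: the class name `RttEstimator` is hypothetical, DevRTT is initialised to half the first sample, and DevRTT is updated before EstimatedRTT on each sample (both choices follow common practice rather than anything stated on the slides).

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never one measured on a retransmission)."""
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one unusually slow sample
```

After the single 200 ms sample the estimate only moves from 100 ms to 112.5 ms, while the deviation term pushes the timeout well above it.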
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
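TCP: retransmission scenarios
Lost ACK scenario (Host A, Host B; time runs downward):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• Seq=92 timeout at A
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100; SendBase = 100
Premature timeout scenario (Host A, Host B; time runs downward):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120
• Seq=92 timeout at A before the ACKs arrive
• A -> B: Seq=92, 8 bytes data (retransmission)
• ACKs arrive at A: SendBase = 100, then SendBase = 120
• B -> A: ACK=120; SendBase = 120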
67
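TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A, Host B; time runs downward):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too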
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure: Host A and Host B timelines; the 2nd segment is lost (X) and Host A resends the 2nd segment before its timeout expires]
Figure 3.37 Resending a segment after triple duplicate ACK
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
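The simplified sender events and the duplicate-ACK rule can be put together in one runnable sketch. Everything here beyond the pseudocode is an assumption for illustration: the class name `SimpleTcpSender`, the `retransmitted` list used to record resends, the boolean standing in for a real timer, and the simplification that ACKs always fall on segment boundaries.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq):
        self.send_base = initial_seq     # oldest unacknowledged byte
        self.next_seq = initial_seq      # sequence number of next new byte
        self.unacked = {}                # seq of first byte -> segment data
        self.dup_acks = 0                # duplicate ACKs seen for send_base
        self.timer_running = False       # stand-in for a real timer
        self.retransmitted = []          # seqs resent, in order (for inspection)

    def send(self, data: bytes) -> int:
        seq = self.next_seq
        self.unacked[seq] = data
        self.timer_running = True        # start timer if not already running
        self.next_seq += len(data)
        return seq

    def on_timeout(self):
        if self.unacked:
            # Retransmit only the oldest not-yet-acknowledged segment.
            self.retransmitted.append(min(self.unacked))
            self.timer_running = True

    def on_ack(self, y: int):
        if y > self.send_base:
            # New data acknowledged cumulatively.
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.retransmitted.append(y)   # fast retransmit
```

Three duplicate ACKs for byte 100 trigger a resend of the segment starting at 100 without waiting for the timer.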
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days?)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
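The RcvWindow bookkeeping can be written out as two small functions. The function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are illustrative assumptions; only the RcvWindow formula itself comes from the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
```

In this example the receiver advertises 2096 bytes, so a sender with 1000 bytes in flight may send another 1000 but not another 1200.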
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
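From an application's point of view the handshake is hidden behind the socket calls. The sketch below mirrors the Java fragments above using Python's standard `socket` module on the loopback interface; the function name `loopback_demo`, the address 127.0.0.1 and the 5-second timeouts are assumptions chosen for the demonstration.

```python
import socket


def loopback_demo(payload: bytes = b"hello") -> bytes:
    """Open a TCP connection to ourselves, send payload, return what arrived."""
    # Server side: the listening ("welcome") socket. Port 0 asks the OS
    # for any free port.
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.settimeout(5)
    welcome.bind(("127.0.0.1", 0))
    welcome.listen(1)
    port = welcome.getsockname()[1]

    # Client side: connecting makes the OS perform SYN, SYNACK, ACK.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)

    # accept() returns a new socket for the established connection.
    connection, _addr = welcome.accept()
    connection.settimeout(5)

    client.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = connection.recv(1024)    # byte stream: may arrive in pieces
        if not chunk:
            break
        received += chunk

    # Closing the sockets starts the FIN/ACK teardown.
    client.close()
    connection.close()
    welcome.close()
    return received
```

Calling `loopback_demo()` returns the bytes that crossed the connection; the receive loop reflects that TCP delivers a byte stream with no message boundaries.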
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client close: FIN to server; server replies ACK; server close: FIN to client; client replies ACK; client in timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client closing: FIN to server; server replies ACK; server closing: FIN to client; client replies ACK; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send $\lambda_{in}$ (original data) through one router with unlimited shared output link buffers; $\lambda_{out}$ is the delivered rate]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through one router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$ is the delivered rate]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes running to R/2, with the levels R/3 and R/4 marked]
85
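Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$ is the delivered rate]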
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, W <- W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window size against time, with levels 8 Kbytes, 16 Kbytes and 24 Kbytes marked. Saw tooth behavior: probing for bandwidth]
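The saw tooth can be reproduced with a per-RTT simulation of additive increase, multiplicative decrease. This is a sketch only: the function name `aimd_trace`, the choice to express the window in units of MSS, and the explicit set of RTT rounds in which a loss occurs are all assumptions, and slow start is not modelled.

```python
def aimd_trace(rounds, loss_rounds, start=1.0):
    """Return the congestion window (in MSS) at the start of each RTT round.

    loss_rounds is the set of round indices in which a loss is detected.
    """
    window = start
    trace = []
    for rnd in range(rounds):
        trace.append(window)
        if rnd in loss_rounds:
            window = max(1.0, window / 2)   # multiplicative decrease
        else:
            window += 1.0                   # additive increase: 1 MSS per RTT
    return trace


trace = aimd_trace(rounds=6, loss_rounds={3}, start=8.0)
```

The trace climbs by one MSS per round, halves after the loss in round 3, and then resumes its linear climb.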
89
SLOW START IS NOT SHOWN
20
Reliable data transfer: getting started
[Figure: send side and receive side of the rdt protocol, above an unreliable channel]
• rdt_send(): called from above (e.g. by app). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
• udt_rcv(): called when packet arrives on rcv-side of channel
• rdt_rcv(): called by rdt to deliver data to upper layer
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow in both directions
• use finite state machines (FSM) to specify sender, receiver
[Figure: state 1 -> state 2, arrow labelled event / actions]
• event causing state transition; actions taken on state transition
• state: when in this "state", next state uniquely determined by next event
22
KR state machines – a note
Beware:
Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: State name -> State name, arrow labelled event / actions]
• Relevant event causing state transition; relevant action taken on state transition
• state: when in this "state", next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
Sender FSM (Event / Action): IDLE: rdt_send(data) / udt_send(packet)
Receiver FSM (Event / Action): IDLE: udt_rcv(packet) / rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
Sender FSM, written as state: event / action -> next state (Λ = no action)
IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
Receiver FSM:
IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data); udt_send(ACK)
IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
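rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs, traced for the error-free case]
Sender FSM, written as state: event / action -> next state (Λ = no action)
IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
Receiver FSM:
IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data); udt_send(ACK)
IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)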
27
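rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs, traced for the case of a corrupted packet]
Sender FSM, written as state: event / action -> next state (Λ = no action)
IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
Receiver FSM:
IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data); udt_send(ACK)
IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)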
28
rdt2.0 has a fatal flaw
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
Stop and wait: sender sends one packet, then waits for receiver response
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
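The moving average can be sketched in a few lines. The function name and the sample values below are illustrative assumptions; only the update rule and the weight 0.125 come from the slide.

```python
def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = 0.125) -> float:
    """One EWMA step: blend the old estimate with the newest sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


# Hypothetical samples in milliseconds: the path RTT jumps from 100 to 120.
estimate = 100.0
history = []
for sample in [120.0, 120.0, 120.0]:
    estimate = update_estimated_rtt(estimate, sample)
    history.append(estimate)
```

Each step moves the estimate one eighth of the way towards the new sample, so a single outlier barely shifts it while a sustained change is tracked over a few round trips.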
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
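A minimal sketch of the timeout calculation, combining the deviation estimate with the smoothed RTT. The class name, the zero initial deviation, and the choice to update DevRTT before EstimatedRTT are assumptions made for illustration; the slide gives only the two formulas and the typical weights.

```python
from dataclasses import dataclass


@dataclass
class RttEstimator:
    estimated_rtt: float          # seeded with a first measurement
    dev_rtt: float = 0.0          # assumed starting deviation
    alpha: float = 0.125
    beta: float = 0.25

    def on_sample(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(estimated_rtt=100.0)
timeout = estimator.on_sample(120.0)
```

With a steady RTT the deviation term decays towards zero and the timeout sits just above the estimate; a noisy path widens the margin automatically.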
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum
  loop (forever) {
    switch(event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  } /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
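The simplified sender can be sketched as a small event-driven class. This is an illustration only: the class and attribute names are assumptions, the timer is modelled as a boolean flag rather than a real clock, and the `sent` list stands in for handing segments to IP.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq      # NextSeqNum
        self.send_base = initial_seq     # SendBase
        self.unacked = {}                # seq -> payload awaiting an ACK
        self.timer_running = False
        self.sent = []                   # (seq, payload) passed to IP

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)      # oldest not-yet-acknowledged segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True    # restart the timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:           # ACK covers new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"A" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"B" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # only Seq=92 is resent
sender.on_ack(120)               # cumulative ACK covers both segments
```

Only the oldest segment is resent on timeout, and a single cumulative ACK can clear several segments at once.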
66
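TCP retransmission scenarios
lost ACK scenario (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data
  2. B -> A: ACK=100 (lost, X)
  3. Seq=92 timeout at A; A -> B: Seq=92, 8 bytes data
  4. B -> A: ACK=100; SendBase = 100
premature timeout (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B -> A: ACK=100, then ACK=120
  3. Seq=92 timeout at A before the ACKs arrive; A -> B: Seq=92, 8 bytes data
  4. ACKs arrive at A: SendBase = 100, then SendBase = 120
  5. B -> A: ACK=120 for the duplicate; SendBase = 120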
67
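TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B -> A: ACK=100 (lost, X), then ACK=120
  3. ACK=120 arrives at A before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too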
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediate send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends to Host B, the 2nd segment is lost (X), and Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
    else {   /* a duplicate ACK for already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y = 3) {
        resend segment with sequence number y   /* fast retransmit */
      }
    }
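The duplicate-ACK counting can be sketched as follows. The class name, the `retransmitted` log and the decision to reset the counters when new data is ACKed are assumptions for illustration; timers and the segments themselves are left out.

```python
from collections import defaultdict


class FastRetransmitSender:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = defaultdict(int)   # ACK value -> duplicates seen
        self.retransmitted = []            # seq numbers resent early

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance and forget old duplicates.
            self.send_base = y
            self.dup_acks.clear()
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(y)   # resend segment starting at y


sender = FastRetransmitSender(send_base=92)
sender.on_ack(100)            # first ACK=100 acknowledges new data
for _ in range(4):
    sender.on_ack(100)        # four duplicates of ACK=100
```

The segment is resent exactly once, on the third duplicate; later duplicates do not trigger further retransmissions.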
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist – indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
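The window arithmetic above can be written out directly. The function names and the byte counts are illustrative assumptions; `last_byte_sent` and `last_byte_acked` are the sender-side counterparts of the receiver variables named on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises in its segments."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


# Hypothetical state: 4096-byte buffer, 3000 bytes received, 1000 read by the app.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = sender_may_send(5000, 4000, window, nbytes=1000)
ok_large = sender_may_send(5000, 4000, window, nbytes=1200)
```

The window shrinks as the application falls behind and grows again as it reads, which is what matches the send rate to the application's drain rate.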
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close); client replies with ACK and enters timed wait; closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client sends FIN (closing); server replies with ACK, then sends its own FIN (closing); client replies with ACK and enters timed wait; both sides closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into one router with unlimited shared output link buffers; delivered rate is $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B feed one router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes running to R/2; the plots also mark R/3 and R/4]
85
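Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]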
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: $\lambda_{out}$ for Host A and Host B]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
[Figure: congestion window against time, y-axis ticks at 8, 16 and 24 Kbytes]
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  – additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
  – multiplicative decrease: cut CongWin in half after loss (W ← W/2)
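A minimal sketch of the additive-increase, multiplicative-decrease rule, stepping once per RTT. The function name, the choice to measure the window in units of MSS, the floor of one MSS and the set of loss rounds are all illustrative assumptions; slow start is deliberately left out.

```python
def aimd(rounds: int, loss_rounds: set, mss: float = 1.0,
         initial_window: float = 1.0) -> list:
    """Return the congestion window after each RTT."""
    cong_win = initial_window
    trace = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase: +1 MSS per RTT
        trace.append(cong_win)
    return trace


trace = aimd(rounds=6, loss_rounds={3})
```

Plotting `trace` for a longer run with periodic losses gives the saw tooth shape: a slow linear climb followed by a sharp halving.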
[Figure: congestion window size against time. Saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
21
Reliable data transfer: getting started
We'll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  – but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. state 1 -> state 2, with the arrow labelled event / actions: event causing state transition, actions taken on state transition]
state: when in this "state" next state uniquely determined by next event
22
KR state machines – a note.
Beware
Kurose and Ross has a confusing/confused attitude to state-machines.
I've attempted to normalise the representation.
UPSHOT: these slides have differing information to the KR book (from which the RDT example is taken).
In KR "actions taken" appear wide-ranging; my interpretation is more specific/relevant.
[Figure: FSM notation. State name -> State name, with the arrow labelled event / actions: Relevant event causing state transition, Relevant action taken on state transition]
state: when in this "state" next state uniquely determined by next event
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
FSMs (notation: Event / Action):
• sender, IDLE: rdt_send(data) / udt_send(packet)
• receiver, IDLE: udt_rcv(packet) / rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver->sender
25
rdt2.0: FSM specification
(notation: event / action; Λ means no action)
sender:
• IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver:
• IDLE: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> IDLE
• IDLE: udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK) -> IDLE
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 FSM (sender states IDLE and Waiting for reply, receiver state IDLE) with the error-free path: rdt_send(data) / udt_send(packet); udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK); udt_rcv(reply) && isACK(reply) / Λ]
27
rdt2.0: error scenario
[Figure: the rdt2.0 FSM (sender states IDLE and Waiting for reply, receiver state IDLE) with the error path: rdt_send(data) / udt_send(packet); udt_rcv(packet) && corrupt(packet) / udt_send(NAK); udt_rcv(reply) && isNAK(reply) / udt_send(packet)]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
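rdt2.1: sender, handles garbled ACK/NAKs
(notation: event / action; Λ means no action)
• IDLE (next sequence 0): rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next sequence 1)
• IDLE (next sequence 1): rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet) -> Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (next sequence 0)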
30
rdt2.1: receiver, handles garbled ACK/NAKs
(notation: event / action)
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: receive(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below: receive(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: receive(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below: receive(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
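A one-bit sequence number is enough for stop-and-wait because the receiver only has to tell the current packet from the previous one. The receiver side can be sketched as below; the class name, the string replies and the `corrupt` flag (standing in for a checksum check) are assumptions for illustration.

```python
class Rdt21Receiver:
    def __init__(self):
        self.expected = 0        # state: waiting for 0 or for 1 from below
        self.delivered = []      # data passed up to the application

    def on_packet(self, seq: int, data: str, corrupt: bool = False) -> str:
        if corrupt:
            return "NAK"
        if seq == self.expected:
            self.delivered.append(data)
            self.expected ^= 1   # flip between 0 and 1
        # A duplicate is ACKed again but not delivered a second time.
        return "ACK"


receiver = Rdt21Receiver()
replies = [
    receiver.on_packet(0, "a"),
    receiver.on_packet(0, "a"),                  # duplicate after a garbled ACK
    receiver.on_packet(1, "b", corrupt=True),
    receiver.on_packet(1, "b"),
]
```

The duplicate of packet 0 is acknowledged so the sender can move on, yet the application sees each piece of data once.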
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
(notation: event / action; Λ means no action)
sender FSM fragment:
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: rdt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)
• Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment:
• Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1)
• transition into Wait for 0 from below: receive(packet) && not corrupt(packet) && has_seq1(packet) / send(ACK1), rdt_rcv(data)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) )
35
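rdt3.0 sender
(notation: event / action; Λ means no action)
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(packet) && ( corrupt(packet) || isACK(reply,0) ) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0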
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \mathrm{bits}}{10^{9}\ \mathrm{bps}} = 8\ \mathrm{microseconds}$
• U sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline, with RTT marked]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline, with RTT marked]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unack packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN sender extended FSM
(single state, Wait; notation: event / action; Λ means no event or no action)
• initial transition: Λ / base=1, nextseqnum=1
• rdt_send(data) /
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   (Block)
• timeout / udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply) / Λ
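The Go-Back-N sender can be sketched as a class with one method per event. Names are illustrative assumptions, the `channel` list stands in for udt_send, timers are not modelled, and sequence numbers are unbounded integers rather than k-bit values. One deliberate deviation is flagged in the code: `base` is never moved backwards by a stale ACK.

```python
class GbnSender:
    def __init__(self, window_size: int):
        self.n = window_size
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}        # seq -> data, kept for retransmission
        self.channel = []        # seq numbers handed to the unreliable channel

    def rdt_send(self, data: str) -> bool:
        if self.nextseqnum < self.base + self.n:
            self.packets[self.nextseqnum] = data
            self.channel.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False             # window full: refuse the data

    def on_timeout(self) -> None:
        # Go back N: resend every packet from base up to nextseqnum-1.
        for seq in range(self.base, self.nextseqnum):
            self.channel.append(seq)

    def on_ack(self, acknum: int, corrupt: bool = False) -> None:
        if corrupt:
            return
        # Assumption beyond the slide: ignore ACKs older than base.
        self.base = max(self.base, acknum + 1)


sender = GbnSender(window_size=3)
accepted = [sender.rdt_send(d) for d in "abcd"]   # 4th is refused
sender.on_ack(1)                                  # window slides by one
accepted.append(sender.rdt_send("d"))
sender.on_timeout()                               # resend 2, 3, 4
```

The cost of the single cumulative ACK scheme shows up in `on_timeout`: one loss causes the whole window to be sent again.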
46
GBN receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
(single state, Wait; notation: event / action; Λ means no event)
• initial transition: Λ / expectedseqnum=1
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• any other arrival: Λ / udt_send(reply)
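The receiver side of Go-Back-N needs only one counter. In this sketch the class name, the `corrupt` flag and the convention of returning 0 as the ACK before anything has arrived are assumptions for illustration.

```python
class GbnReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def on_packet(self, seq: int, data: str, corrupt: bool = False) -> int:
        """Process one arrival and return the cumulative ACK number to send."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Anything else is discarded; the last in-order packet is re-ACKed.
        return self.expectedseqnum - 1


receiver = GbnReceiver()
acks = [
    receiver.on_packet(1, "a"),
    receiver.on_packet(3, "c"),   # out of order: discarded, ACK 1 repeated
    receiver.on_packet(2, "b"),
]
```

Packet 3 has to be sent again even though it arrived intact, which is the price of having no receiver buffering.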
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
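The receiver rules for selective repeat translate into three cases. The sketch below uses unbounded integer sequence numbers (no wrap-around), which is a simplifying assumption, and the class name and return convention (the ACK number, or `None` when the packet is ignored) are illustrative.

```python
class SrReceiver:
    def __init__(self, window_size: int, base: int = 0):
        self.n = window_size
        self.rcvbase = base
        self.buffer = {}         # out-of-order packets awaiting delivery
        self.delivered = []

    def on_packet(self, seq: int, data: str):
        if self.rcvbase <= seq <= self.rcvbase + self.n - 1:
            self.buffer[seq] = data
            # Deliver any in-order run and slide the window forward.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return seq           # ACK(n)
        if self.rcvbase - self.n <= seq <= self.rcvbase - 1:
            return seq           # already delivered: re-ACK so the sender advances
        return None              # otherwise: ignore


receiver = SrReceiver(window_size=4)
results = [
    receiver.on_packet(1, "b"),   # buffered, nothing delivered yet
    receiver.on_packet(0, "a"),   # fills the gap: "a" then "b" delivered
    receiver.on_packet(0, "a"),   # duplicate below the window: re-ACKed
    receiver.on_packet(9, "z"),   # outside both ranges: ignored
]
```

Re-ACKing packets just below the window matters: if the original ACK was lost, the sender's window cannot move until it hears that ACK again.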
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
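The dilemma can be checked mechanically. The sketch considers the worst case in which the receiver has accepted a full window and advanced, while every ACK was lost so the sender still retransmits the old window; if the two windows share a sequence number, an old packet can be mistaken for a new one. Function names are illustrative assumptions.

```python
def window(base: int, size: int, seq_space: int) -> set:
    """Sequence numbers covered by a window, with wrap-around."""
    return {(base + i) % seq_space for i in range(size)}


def ambiguous(window_size: int, seq_space: int) -> bool:
    """True if a retransmitted old packet can land in the receiver's new window."""
    sender_window = window(0, window_size, seq_space)              # still unACKed
    receiver_window = window(window_size, window_size, seq_space)  # already advanced
    return bool(sender_window & receiver_window)


slide_example = ambiguous(window_size=3, seq_space=4)
safe_example = ambiguous(window_size=2, seq_space=4)
```

With four sequence numbers, a window of 3 overlaps itself after wrapping while a window of 2 does not, in line with the rule that the window must be at most half the sequence number space.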
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the close: client sends FIN, server replies ACK then FIN, client replies ACK; client goes through timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline of the close: FIN, ACK, FIN, ACK; both sides closing; client goes through timed wait, then closed; server closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A offers λ_in (original data) to a router with unlimited shared output link buffers; Host B receives λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A offers λ_in (original data) and λ'_in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λ_out]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• “perfect” retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against λ_in, both axes marked at R/2; further marks at R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A offers λ_in (original data) and λ'_in (original data plus retransmitted data) to routers with finite shared output link buffers; Host B receives λ_out]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet dropped, any upstream transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λ_out]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
[Figure: congestion window against time; y-axis marked at 8, 16, and 24 Kbytes]
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  – additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
  – multiplicative decrease: cut CongWin in half after loss (W ← W/2)
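The saw-tooth that these two rules produce can be reproduced with a short simulation. This is a sketch under stated assumptions: the window is measured in MSS units, losses happen in the RTT rounds the caller names, and the floor of one MSS after a halving is an assumption of this example, not something the slide states. Slow start is not modelled.

```python
def aimd(rounds, loss_rounds, mss=1.0, initial_window=1.0):
    """Return the congestion window (in MSS units) at the start of each RTT round."""
    window = initial_window
    trace = []
    for rtt in range(rounds):
        trace.append(window)
        if rtt in loss_rounds:
            window = max(mss, window / 2)   # multiplicative decrease: W <- W/2
        else:
            window += mss                   # additive increase: +1 MSS per RTT
    return trace


def per_ack_increase(window, acks):
    """Apply the per-ACK rule W <- W + 1/W for the given number of ACKs."""
    for _ in range(acks):
        window += 1.0 / window
    return window


trace = aimd(rounds=10, loss_rounds={4, 8})
one_rtt_later = per_ack_increase(10.0, acks=10)   # close to, but just under, 11
```

With a window of W segments, roughly W ACKs arrive per RTT, so the per-ACK rule adds about one MSS per RTT, which is why the two formulations on the slide describe the same behaviour.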
[Figure: congestion window size against time; saw-tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
22
K&R state machines – a note
Beware: Kurose and Ross has a confusing/confused attitude to state machines. I’ve attempted to normalise the representation.
UPSHOT: these slides have differing information to the K&R book (from which the RDT example is taken). In K&R, “actions taken” appear wide-ranging; my interpretation is more specific/relevant.
[Figure: two states, each labelled “State name”, joined by a transition labelled with the relevant event causing the state transition and the relevant action taken on the state transition]
state: when in this “state”, next state uniquely determined by next event
Transition label: event / actions
23
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  – no bit errors
  – no loss of packets
• separate FSMs for sender, receiver:
  – sender sends data into underlying channel
  – receiver reads data from underlying channel
Sender FSM (single state IDLE):
  event: rdt_send(data); action: udt_send(packet)
Receiver FSM (single state IDLE):
  event: udt_rcv(packet); action: rdt_rcv(data)
24
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  – checksum to detect bit errors
• the question: how to recover from errors:
  – acknowledgements (ACKs): receiver explicitly tells sender that packet received is OK
  – negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  – sender retransmits packet on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  – error detection
  – receiver feedback: control msgs (ACK, NAK) receiver -> sender
25
rdt2.0: FSM specification
Sender FSM:
  IDLE: event rdt_send(data); action udt_send(packet); go to Waiting for reply
  Waiting for reply: event udt_rcv(reply) && isNAK(reply); action udt_send(packet); stay
  Waiting for reply: event udt_rcv(reply) && isACK(reply); action Λ (none); go to IDLE
Receiver FSM (single state IDLE):
  event udt_rcv(packet) && corrupt(packet); action udt_send(NAK)
  event udt_rcv(packet) && notcorrupt(packet); action rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 FSMs with the error-free path highlighted: sender rdt_send(data) / udt_send(packet); receiver udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK); sender udt_rcv(reply) && isACK(reply) / Λ]
27
rdt2.0: error scenario
[Figure: the rdt2.0 FSMs with the error path highlighted: sender rdt_send(data) / udt_send(packet); receiver udt_rcv(packet) && corrupt(packet) / udt_send(NAK); sender udt_rcv(reply) && isNAK(reply) / udt_send(packet); then the error-free path completes]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn’t deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
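rdt2.1: sender, handles garbled ACK/NAKs
Sender FSM (four states: IDLE and Waiting for reply, once for each sequence number):
  IDLE (0): event rdt_send(data); action sequence=0, udt_send(packet); go to Waiting for reply (0)
  Waiting for reply (0): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ); action udt_send(packet); stay
  Waiting for reply (0): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action Λ; go to IDLE (1)
  IDLE (1): event rdt_send(data); action sequence=1, udt_send(packet); go to Waiting for reply (1)
  Waiting for reply (1): event udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ); action udt_send(packet); stay
  Waiting for reply (1): event udt_rcv(reply) && notcorrupt(reply) && isACK(reply); action Λ; go to IDLE (0)
30
rdt2.1: receiver, handles garbled ACK/NAKs
Receiver FSM:
  Wait for 0 from below: event udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet); action udt_send(ACK), rdt_rcv(data); go to Wait for 1 from below
  Wait for 0 from below: event udt_rcv(packet) && corrupt(packet); action udt_send(NAK); stay
  Wait for 0 from below: event udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action udt_send(ACK); stay
  Wait for 1 from below: event udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action udt_send(ACK), rdt_rcv(data); go to Wait for 0 from below
  Wait for 1 from below: event udt_rcv(packet) && corrupt(packet); action udt_send(NAK); stay
  Wait for 1 from below: event udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet); action udt_send(ACK); stay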
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must “remember” whether “current” pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
Sender FSM fragment:
  Wait for call 0 from above: event rdt_send(data); action sequence=0, udt_send(packet); go to Wait for ACK 0
  Wait for ACK 0: event udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ); action udt_send(packet); stay
  Wait for ACK 0: event udt_rcv(reply) && notcorrupt(reply) && isACK0(reply); action Λ; leave Wait for ACK 0
Receiver FSM fragment:
  entering Wait for 0 from below: event udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet); action udt_send(ACK1), rdt_rcv(data)
  Wait for 0 from below: event udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ); action udt_send(ACK1); stay
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq. #’s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
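35
rdt3.0 sender
Sender FSM:
  IDLE state 0: event rdt_send(data); action sequence=0, udt_send(packet); go to Wait for ACK0
  IDLE state 0: event udt_rcv(reply); action Λ; stay
  Wait for ACK0: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ); action Λ; stay
  Wait for ACK0: event timeout; action udt_send(packet); stay
  Wait for ACK0: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0); action Λ; go to IDLE state 1
  IDLE state 1: event rdt_send(data); action sequence=1, udt_send(packet); go to Wait for ACK1
  IDLE state 1: event udt_rcv(reply); action Λ; stay
  Wait for ACK1: event udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ); action Λ; stay
  Wait for ACK1: event timeout; action udt_send(packet); stay
  Wait for ACK1: event udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1); action Λ; go to IDLE state 0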
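The alternating-bit behaviour of the rdt3.0 sender and its receiver can be exercised with a small deterministic simulation. This is a sketch, not the course's reference code: the function name `stop_and_wait` and the scripted loss sets are hypothetical, corruption is not modelled, and a timeout is represented simply by the sender trying again.

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate an rdt3.0-style exchange over a lossy channel.

    drop_data / drop_ack hold transmission attempt numbers (0-based, counted
    over the whole run) whose data packet or ACK is lost.
    Returns (delivered, attempts).
    """
    delivered = []
    expected_seq = 0          # receiver state: sequence bit it is waiting for
    attempt = 0
    for index, data in enumerate(messages):
        seq = index % 2       # sender alternates between sequence 0 and 1
        while True:
            this_attempt = attempt
            attempt += 1
            if this_attempt in drop_data:
                continue      # data lost: sender times out and resends
            # Receiver delivers only a packet carrying the expected bit;
            # a duplicate is re-ACKed but not delivered again.
            if seq == expected_seq:
                delivered.append(data)
                expected_seq ^= 1
            if this_attempt in drop_ack:
                continue      # ACK lost: sender times out and resends
            break             # ACK for this packet arrived: next message
    return delivered, attempt


delivered, attempts = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={3})
```

In this run one data packet and one ACK are lost, so five transmissions are needed for three messages, and the retransmission after the lost ACK is recognised as a duplicate by its sequence bit.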
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U_sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
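The numbers in this example can be checked directly. The sketch below assumes the usual stop-and-wait definition of utilization, transmission time divided by RTT plus transmission time, and takes RTT as twice the 15 ms propagation delay; the function name and the `window` parameter (packets sent back-to-back per round trip) are this example's own.

```python
def sender_utilization(packet_bits, link_bps, rtt_seconds, window=1):
    """Fraction of time the sender is busy sending, for a given window size."""
    transmit_time = packet_bits / link_bps          # d_trans = L / R
    busy = window * transmit_time
    return min(1.0, busy / (rtt_seconds + transmit_time))


L = 8000          # packet size in bits
R = 1e9           # link rate in bits per second
RTT = 0.030       # 2 x 15 ms propagation delay

u_stop_and_wait = sender_utilization(L, R, RTT)            # roughly 0.00027
u_three_in_flight = sender_utilization(L, R, RTT, window=3)
throughput_bytes_per_s = (L / 8) / (RTT + L / R)           # roughly 33 kB/s
```

Setting `window=3` triples the utilization, which is the effect pipelining relies on.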
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline, one RTT wide]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline, one RTT wide]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn’t ack packet if there’s a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N, consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Initially (event Λ): base=1, nextseqnum=1
State Wait:
  event rdt_send(data):
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   (block?)
  event timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
  event udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
  event udt_rcv(reply) && corrupt(reply):
    Λ
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don’t buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
Initially (event Λ): expectedseqnum=1
State Wait:
  event udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum):
    rdt_rcv(data)
    udt_send(ACK)
    expectedseqnum++
  default:
    udt_send(reply)
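The two extended FSMs can be put side by side in a short Python sketch. It follows the slides' variable names (`base`, `nextseqnum`, `expectedseqnum`) but is otherwise an illustration under assumptions: the classes `GBNSender` and `GBNReceiver` are hypothetical, corruption and timers are left out (the caller decides when a timeout fires), and the `max` guard against stale ACKs is an addition of this example.

```python
class GBNReceiver:
    def __init__(self):
        self.expected = 1              # expectedseqnum
        self.delivered = []

    def receive(self, seq, data):
        """Return the cumulative ACK number to send back."""
        if seq == self.expected:       # in-order: deliver and advance
            self.delivered.append(data)
            self.expected += 1
        # Out-of-order packets are discarded; either way, ACK the highest
        # in-order sequence number received so far.
        return self.expected - 1


class GBNSender:
    def __init__(self, window):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}              # seq -> data, kept until acknowledged

    def send(self, data):
        """Return (seq, data) to transmit, or None if the window is full."""
        if self.nextseqnum >= self.base + self.N:
            return None                # refuse_data: caller retries later
        seq = self.nextseqnum
        self.packets[seq] = data
        self.nextseqnum += 1
        return seq, data

    def on_ack(self, acknum):
        # Cumulative ACK: everything up to and including acknum is done.
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        self.base = max(self.base, acknum + 1)

    def on_timeout(self):
        """Go back N: retransmit every unacknowledged packet, oldest first."""
        return [(seq, self.packets[seq]) for seq in range(self.base, self.nextseqnum)]


sender, receiver = GBNSender(window=3), GBNReceiver()
first = [sender.send(d) for d in "abcd"]       # 'd' is refused: window is full
# Packet 2 is lost in transit; packets 1 and 3 arrive.
for packet in (first[0], first[2]):
    sender.on_ack(receiver.receive(*packet))   # ACK 1, then duplicate ACK 1
resent = sender.on_timeout()                   # packets 2 and 3 go again
for packet in resent:
    sender.on_ack(receiver.receive(*packet))
```

Packet 3 is transmitted twice even though it arrived the first time, which is the cost Go-Back-N pays for keeping the receiver free of buffering.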
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]:
• ACK(n)
otherwise:
• ignore
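The receiver's three cases translate almost line for line into code. The sketch below is illustrative only: `SRReceiver` is a hypothetical class, sequence numbers are unbounded integers (no wraparound), and corruption is not modelled.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}       # out-of-order packets waiting for a gap to fill
        self.delivered = []

    def receive(self, n, data):
        """Return the sequence number to ACK, or None to ignore the packet."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase, sliding the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n           # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore


rx = SRReceiver(window=4)
arrivals = [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (0, "a")]
acks = [rx.receive(n, data) for n, data in arrivals]
```

Packets 2 and 3 are buffered until packet 1 fills the gap, and the late duplicate of packet 0 is re-ACKed without being delivered twice.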
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
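The bound can be checked by brute force. The sketch assumes the worst case from the example: the receiver has accepted a full window of packets and every ACK was lost, so the sender may retransmit any of the old numbers. The function name is hypothetical.

```python
def retransmission_is_ambiguous(seq_space, window):
    """True if an old packet's number can fall inside the receiver's new window."""
    # The receiver accepted packets 0..window-1, so it now expects these numbers:
    new_window = {(window + i) % seq_space for i in range(window)}
    # All ACKs were lost, so the sender may retransmit any of 0..window-1:
    old_numbers = set(range(window))
    return bool(new_window & old_numbers)


slide_example = retransmission_is_ambiguous(seq_space=4, window=3)   # the dilemma
safe_example = retransmission_is_ambiguous(seq_space=4, window=2)    # no overlap
```

With four sequence numbers and a window of three, numbers 0 and 1 belong both to the old window and to the receiver's new one, so a retransmitted packet 0 looks like new data. A window of two leaves the two sets disjoint.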
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth/Delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
Segment layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pointer
  Options (variable length)
  application data (variable length)
Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #’s and ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn’t say - up to implementor
  – This has led to a world of hurt…
Simple telnet scenario (Host A, Host B, time running downward):
  1. User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
  2. Host B ACKs receipt of ‘C’, echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
  3. Host A ACKs receipt of echoed ‘C’. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  – this gets through an intrusion-detection system
  – then send a new segment replacing bytes 4-13 with “password-file”
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (y-axis 100 to 350) against time in seconds (x-axis 1 to 106)]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
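The two moving averages and the timeout rule fit in a small class. This is a sketch under assumptions the slides do not spell out: the class name `RttEstimator` is hypothetical, the first sample seeds EstimatedRTT and half of it seeds DevRTT, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate. Times are in milliseconds.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample      # assumption: seed with first sample
        self.dev_rtt = first_sample / 2        # assumption: initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT (not from a retransmission); return the new timeout."""
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)
timeout = estimator.update(120.0)
```

Following Karn/Partridge, a caller would skip `update` for any segment that was retransmitted, since its ACK cannot be matched to one transmission.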
63
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
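TCP: retransmission scenarios
[Figure: two Host A/Host B timelines]
Lost ACK scenario:
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• Seq=92 timeout: A resends Seq=92, 8 bytes data
• B replies ACK=100; SendBase = 100
Premature timeout:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout fires first: A resends Seq=92, 8 bytes data
• SendBase = 100, then SendBase = 120 as the ACKs arrive
• B replies ACK=120 to the duplicate; SendBase = 120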
67
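TCP retransmission scenarios (more)
[Figure: Host A/Host B timeline, cumulative ACK scenario]
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120, which arrives before the timeout
• SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK’s 100 too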
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Event: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
   Action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
2. Event: arrival of in-order segment with expected seq #. One other segment has ACK pending
   Action: immediately send single cumulative ACK, ACKing both in-order segments
3. Event: arrival of out-of-order segment, higher-than-expected seq. #. Gap detected
   Action: immediately send duplicate ACK, indicating seq. # of next expected byte
4. Event: arrival of segment that partially or completely fills gap
   Action: immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A/Host B timeline; the 2nd segment is lost (X) and is resent before the timeout]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
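The ACK handler above can be sketched as a small Python class. This is an illustration under assumptions: `FastRetransmitSender` is a hypothetical name, only ACKs equal to `send_base` are counted as duplicates, the counter is reset when new data is acknowledged, and the retransmission timer is not modelled.

```python
class FastRetransmitSender:
    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the sequence number to resend now, or None."""
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks = 0
            return None
        if y == self.send_base:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                return y             # fast retransmit: resend segment starting at y
        return None


tcp = FastRetransmitSender(initial_seq=92)
resend_decisions = [tcp.on_ack(y) for y in (100, 100, 100, 100, 120)]
```

The first ACK=100 acknowledges new data; the next three are duplicates, and the third of them triggers the resend of the segment starting at byte 100 without waiting for the timer.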
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don’t wait indefinitely
NAGLE’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
23
Rdt10 reliable transfer over a reliable channel
bull underlying channel perfectly reliablendash no bit errorsndash no loss of packets
bull separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel
IDLE udt_send(packet)rdt_send(data)
rdt_rcv(data)IDLEudt_rcv(packet)
sender receiver
Event
Action
24
Rdt20 channel with bit errors
bull underlying channel may flip bits in packetndash checksum to detect bit errors
bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that
packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender
that packet had errorsndash sender retransmits packet on receipt of NAK
bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #’s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
• IDLE state 0
  – rdt_send(data) -> sequence=0; udt_send(packet); go to Wait for ACK0
  – udt_rcv(reply) -> Λ (no action)
• Wait for ACK0
  – udt_rcv(reply) && (corrupt(reply) || isACK(reply, 1)) -> Λ
  – timeout -> udt_send(packet)
  – udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 0) -> Λ; go to IDLE state 1
• IDLE state 1
  – rdt_send(data) -> sequence=1; udt_send(packet); go to Wait for ACK1
  – udt_rcv(reply) -> Λ
• Wait for ACK1
  – udt_rcv(reply) && (corrupt(reply) || isACK(reply, 0)) -> Λ
  – timeout -> udt_send(packet)
  – udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 1) -> Λ; go to IDLE state 0
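The alternating-bit behaviour of the rdt3.0 sender and receiver can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the course's reference code: the function name `rdt30_transfer`, the random loss model, and treating every loss as an immediate timeout are all choices made here for the example.

```python
import random

def rdt30_transfer(messages, loss_rate=0.3, seed=1):
    """Simulate an alternating-bit (rdt3.0-style) transfer over a lossy channel.

    Assumption: each data packet and each ACK is lost independently with
    probability loss_rate, and every loss shows up at the sender as a timeout.
    Returns (delivered_messages, number_of_data_transmissions).
    """
    rng = random.Random(seed)
    delivered = []
    expected = 0        # receiver state: sequence number it is waiting for
    seq = 0             # sender state: sequence number of the current packet
    transmissions = 0
    for data in messages:
        while True:
            transmissions += 1
            if rng.random() < loss_rate:
                continue            # data packet lost: timeout, retransmit
            if seq == expected:     # new packet: deliver it exactly once
                delivered.append(data)
                expected ^= 1
            # A duplicate is not delivered again, but it is still ACKed.
            if rng.random() < loss_rate:
                continue            # ACK lost: timeout, retransmit (duplicate)
            break                   # ACK for the current seq arrived
        seq ^= 1                    # move on to the other sequence number
    return delivered, transmissions
```

With `loss_rate=0.0` every message is sent exactly once; with losses the transmission count grows, but each message is still delivered once and in order, which is the property the one-bit sequence number provides.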
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$$ d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s} $$
• U_sender: utilization – fraction of time sender busy sending
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
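The utilization figure follows from the numbers on this slide: the sender is busy for L/R out of every RTT + L/R, so U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027. A short sketch of that arithmetic is below; the function name and the `window` parameter (how many packets are sent per round trip) are assumptions made for illustration.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when it sends `window` packets per RTT."""
    d_trans = packet_bits / rate_bps              # transmission delay L/R
    return min(1.0, window * d_trans / (rtt_s + d_trans))

L_BITS, R_BPS, RTT_S = 8000, 1e9, 0.030           # RTT = 2 x 15 ms propagation delay

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_three_in_flight = sender_utilization(L_BITS, R_BPS, RTT_S, window=3)
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + L_BITS / R_BPS)

print(f"stop-and-wait utilization: {u_stop_and_wait:.5f}")
print(f"3 packets in flight:       {u_three_in_flight:.5f}")
print(f"throughput:                {throughput_bytes_per_s / 1000:.1f} kB/s")
```

Allowing three packets in flight triples the utilization, which is the motivation for pipelined protocols.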
39
rdt3.0: stop-and-wait operation
[Timing diagram between sender and receiver, with RTT marked]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram between sender and receiver, with RTT marked]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn’t ack packet if there’s a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only that unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only that unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n – “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
• Initially (Λ): base=1; nextseqnum=1
• State: Wait
  – rdt_send(data) -> if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ } else refuse_data(data) (block)
  – timeout -> udt_send(packet[base]); udt_send(packet[base+1]); …; udt_send(packet[nextseqnum-1])
  – udt_rcv(reply) && notcorrupt(reply) -> base = getacknum(reply)+1
  – udt_rcv(reply) && corrupt(reply) -> Λ (no action)
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don’t buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
FSM:
• Initially (Λ): expectedseqnum=1
• State: Wait
  – udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) -> rdt_rcv(data); udt_send(ACK); expectedseqnum++
  – otherwise -> udt_send(reply)
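The interaction of the GBN sender window, cumulative ACKs and the discard-out-of-order receiver can be sketched with a round-based simulation. This is a simplified sketch, not the FSM itself: the name `go_back_n`, the loss model, and the use of unbounded sequence numbers (no k-bit wraparound) are assumptions made here, and each loop iteration stands in for "send the window, then time out if needed".

```python
import random

def go_back_n(messages, window=4, loss_rate=0.2, seed=7):
    """Round-based Go-Back-N sketch. Returns (delivered, packets_sent)."""
    rng = random.Random(seed)
    base = 0            # sender: oldest unacknowledged sequence number
    expected = 0        # receiver: next in-order sequence number
    delivered = []
    sent = 0
    while base < len(messages):
        acks = []
        # Sender (re)transmits every packet in the window [base, base+window).
        for seq in range(base, min(base + window, len(messages))):
            sent += 1
            if rng.random() < loss_rate:
                continue                    # data packet lost in the channel
            if seq == expected:             # in order: deliver and advance
                delivered.append(messages[seq])
                expected += 1
            # Out-of-order packets are discarded. Either way the receiver
            # re-ACKs the highest in-order sequence number (cumulative ACK).
            if expected > 0 and rng.random() >= loss_rate:
                acks.append(expected - 1)
        if acks:
            base = max(base, max(acks) + 1) # ACK(n) covers everything up to n
        # Anything still unacknowledged is resent on the next iteration,
        # modelling the timeout that retransmits the whole window.
    return delivered, sent
```

Because the receiver never buffers, a single lost packet forces every later packet in the window to be resent, which shows up as `packets_sent` growing well beyond `len(messages)` at higher loss rates.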
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
Sender:
• data from above:
  – if next available seq # in window, send pkt
• timeout(n):
  – resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  – mark pkt n as received
  – if n smallest unACKed pkt, advance window base to next unACKed seq #
Receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  – send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  – ACK(n)
• otherwise:
  – ignore
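The three receiver cases above can be sketched as a small class. This is a minimal illustration under assumptions: the class name `SRReceiver`, the modular sequence space `seq_space`, and returning the sequence number to ACK (or `None` for "ignore") are choices made for this example, and the window is assumed to be at most half the sequence space.

```python
class SRReceiver:
    """Selective-repeat receiver sketch with sequence numbers modulo seq_space."""

    def __init__(self, window, seq_space):
        self.window = window
        self.seq_space = seq_space
        self.rcv_base = 0       # lowest sequence number not yet received
        self.buffer = {}        # out-of-order packets awaiting delivery
        self.delivered = []     # data handed to the upper layer, in order

    def on_packet(self, seq, data):
        """Return the sequence number to ACK, or None if the packet is ignored."""
        offset = (seq - self.rcv_base) % self.seq_space
        if offset < self.window:                      # [rcvbase, rcvbase+N-1]
            self.buffer.setdefault(seq, data)
            while self.rcv_base in self.buffer:       # deliver the in-order run
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base = (self.rcv_base + 1) % self.seq_space
            return seq
        if offset >= self.seq_space - self.window:    # [rcvbase-N, rcvbase-1]
            return seq                                # already delivered: re-ACK
        return None                                   # otherwise: ignore
```

For example, with `SRReceiver(window=4, seq_space=16)`, packet 1 arriving before packet 0 is buffered and ACKed; when packet 0 arrives both are delivered and the window base moves to 2. A later duplicate of packet 0 is ACKed again but not delivered again.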
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
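The dilemma can be checked mechanically. The sketch below takes the worst case from the example: the receiver has accepted a full window and advanced, but every ACK was lost, so the sender retransmits packets from its old window. The function name is hypothetical and chosen for this illustration.

```python
def accepts_stale_retransmission(seq_space, window):
    """True if a retransmitted old packet can land inside the receiver's new window."""
    # Sender's window is still [0, window-1] because all ACKs were lost.
    old_window = set(range(window))
    # Receiver got all of those, so its window is now the next `window`
    # sequence numbers, wrapping around modulo seq_space.
    new_window = {(window + i) % seq_space for i in range(window)}
    return bool(old_window & new_window)

# The slide's example: seq #'s 0..3 with window size 3 is ambiguous.
print(accepts_stale_retransmission(4, 3))   # True
# Window size 2 (half the sequence space) is safe.
print(accepts_stale_retransmission(4, 2))   # False
```

Running the check over many sizes agrees with the rule on the slide: the two windows stay disjoint exactly when the window size is at most half the sequence number space.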
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth-Delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: header layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flags U A P R S F
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pointer
• Options (variable length)
• application data (variable length)
Flags:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
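The fixed 20-byte part of the header can be packed and parsed with Python's standard `struct` module. This is a sketch for illustration only: the helper names `build_header` and `parse_header` are made up here, options are omitted, and the checksum and urgent pointer are simply left as zero rather than computed.

```python
import struct

# Ports (16 bits each), seq and ack (32 bits each), data offset byte,
# flags byte, receive window, checksum, urgent pointer: 20 bytes in total,
# in network byte order.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def build_header(src_port, dst_port, seq, ack, flags, window):
    """Pack a header with no options; checksum and urgent pointer are left 0."""
    data_offset = 5 << 4        # header length: five 32-bit words
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flags, window, 0, 0)

def parse_header(raw):
    """Unpack the first 20 bytes of a TCP segment into a dictionary."""
    src, dst, seq, ack, offset, flags, window, checksum, urg = \
        TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": (offset >> 4) * 4,        # in bytes
        "syn": bool(flags & SYN), "ack_valid": bool(flags & ACK),
        "fin": bool(flags & FIN), "rst": bool(flags & RST),
        "window": window, "checksum": checksum, "urgent_ptr": urg,
    }
```

For instance, `parse_header(build_header(12345, 80, 42, 79, SYN | ACK, 65535))` returns the same ports, sequence and acknowledgement numbers, with `syn` and `ack_valid` both set and a header length of 20 bytes.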
56
TCP seq. #’s and ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn’t say - up to implementor
This has led to a world of hurt…
Simple telnet scenario (Host A, Host B, time running downwards):
• User types ‘C’: A sends Seq=42, ACK=79, data = ‘C’
• Host B ACKs receipt of ‘C’, echoes back ‘C’: B sends Seq=79, ACK=43, data = ‘C’
• Host A ACKs receipt of echoed ‘C’: A sends Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with “password-file”
A dumb example.
Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$$ \mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT} $$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and Estimated RTT; y-axis RTT (milliseconds), 100 to 350; x-axis time (seconds), 1 to 106]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$$ \mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}| $$
(typically, β = 0.25)
Then set timeout interval:
$$ \mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT} $$
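The two moving averages and the timeout rule can be combined in a few lines. This is a sketch under assumptions: the class name `RttEstimator` is made up for this example, and two details the slides leave open are filled in following RFC 6298 (the first sample initialises EstimatedRTT and sets DevRTT to half of it, and DevRTT is updated before EstimatedRTT).

```python
class RttEstimator:
    """EWMA-based RTT estimator with alpha = 0.125 and beta = 0.25 by default."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumption: RFC 6298 initial value

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never one measured on a retransmitted segment)."""
        # Assumption: DevRTT uses the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)      # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

With a first sample of 100 ms followed by 120 ms, EstimatedRTT moves only to 102.5 ms, which illustrates how the small α smooths out individual samples while the 4·DevRTT term supplies the safety margin.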
63
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
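The three events of the simplified sender translate fairly directly into a small class. The following is a sketch rather than a TCP implementation: the class and method names are invented for this example, the timer is reduced to a boolean flag, segments "passed to IP" are just appended to a list, and sequence-number wraparound is ignored.

```python
class SimpleTcpSender:
    """Event-driven sketch of the simplified TCP sender (no dup ACKs, no windows)."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq -> payload, not yet acknowledged
        self.timer_running = False
        self.wire = []              # (seq, payload) handed to IP, for inspection

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True           # start timer
        self.wire.append((self.next_seq, data)) # pass segment to IP
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)             # smallest unacknowledged seq
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True           # restart timer

    def on_ack(self, y):
        if y > self.send_base:                  # new data is acknowledged
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Keep the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Feeding it 8 bytes and then 20 bytes starting at sequence number 92 gives segments with Seq=92 and Seq=100; an ACK of 100 moves `send_base` to 100, and a subsequent timeout retransmits only the Seq=100 segment.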
66
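TCP: retransmission scenarios
Lost ACK scenario (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• Seq=92 timeout at A: A resends Seq=92, 8 bytes data
• B replies ACK=100; SendBase = 100
Premature timeout (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout at A before the ACKs arrive: A resends Seq=92, 8 bytes data
• the ACKs arrive: SendBase = 100, then SendBase = 120
• B replies ACK=120 to the retransmission; SendBase = 120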
67
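TCP retransmission scenarios (more)
Scenario (Host A, Host B, time running downwards):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK’s 100 too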
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A, Host B; 2nd segment lost (X); A resends 2nd segment before the timeout]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {    /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y    /* fast retransmit */
    }
  }
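The duplicate-ACK counting in the algorithm above can be sketched as a function over a stream of ACK values. The function name and the idea of returning the list of fast-retransmitted sequence numbers are assumptions made for this illustration; timers and the actual resend are left out.

```python
def fast_retransmit_events(acks, send_base):
    """Return the sequence numbers that fast retransmit would resend.

    `acks` is the sequence of ACK field values arriving at the sender and
    `send_base` is the sender's SendBase before the first of them arrives.
    """
    dup_count = {}
    resent = []
    for y in acks:
        if y > send_base:
            send_base = y                       # new data acknowledged
        else:
            # A duplicate ACK for an already ACKed segment.
            dup_count[y] = dup_count.get(y, 0) + 1
            if dup_count[y] == 3:
                resent.append(y)                # fast retransmit segment y
    return resent

# One new ACK for 100 followed by three duplicates triggers a resend of 100.
print(fast_retransmit_events([100, 100, 100, 100, 120], send_base=92))
```

Note that the first ACK=100 is not a duplicate; it takes three further copies before the segment starting at byte 100 is resent.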
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will.
This means small MSS values may persist - indefinitely.
Solution:
Wait to fill each segment, but don’t wait indefinitely.
NAGLE’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
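The receive-window arithmetic is small enough to show directly. In this sketch the function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are assumptions introduced for illustration; only RcvWindow, RcvBuffer, LastByteRcvd and LastByteRead come from the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, window, nbytes):
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window

# A 4096-byte buffer holding 2000 bytes the application has not read yet.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                       # 2096
print(sender_may_send(5000, 4000, window, 1000))    # True: 2000 <= 2096
print(sender_may_send(5000, 4000, window, 1200))    # False: 2200 > 2096
```

As the application drains the buffer, `last_byte_read` advances, the advertised window grows, and the sender is allowed to have more data in flight.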
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
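The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short sketch. The function name and the dictionary representation of a segment are made up for this example; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    # Step 1: client sends SYN carrying its initial sequence number, no data.
    syn = {"flags": "SYN", "seq": client_isn}
    # Step 2: server replies with SYNACK carrying its own initial sequence
    # number and acknowledging the client's SYN.
    synack = {"flags": "SYN+ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client acknowledges the server's SYN; this segment may carry data.
    ack = {"flags": "ACK", "seq": syn["seq"] + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]

for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

After the exchange, each side knows the other's initial sequence number, which is the state the byte-stream numbering of later data segments builds on.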
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client and server timelines: client close, FIN; server ACK; server close, FIN; client ACK, timed wait; closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines: client closing, FIN; server ACK; server closing, FIN; client ACK, timed wait; both closed]
80
TCP Connection Management (cont)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B each send λ_in (original data) into unlimited shared output link buffers; output rate λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; output rate λ_out]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• “perfect” retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against λ_in, with axes up to R/2; R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; output rate λ_out]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet dropped, any “upstream” transmission capacity used for that packet was wasted!
Congestion Collapse example: Cocktail party effect
[Figure: Host A, Host B; λ_out]
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  – additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W <- W + 1/W
  – multiplicative decrease: cut CongWin in half after loss: W <- W/2
[Figure: congestion window size against time, with 8, 16 and 24 Kbytes marked. Saw tooth behavior: probing for bandwidth]
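The saw-tooth shape follows directly from the two rules. The sketch below steps the window once per RTT; the function name `aimd`, the choice to supply the loss rounds as a set, and flooring the window at 1 MSS are assumptions made for this illustration, and slow start is deliberately left out.

```python
def aimd(rounds, loss_rounds, mss=1.0, initial=1.0):
    """Return the congestion window (in units of MSS) after each RTT round."""
    cwnd = initial
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease: W <- W/2
        else:
            cwnd += mss                 # additive increase: 1 MSS per RTT
        trace.append(cwnd)
    return trace

# A loss in round 3 halves the window, after which it climbs again.
print(aimd(rounds=6, loss_rounds={3}))
```

With a loss every few rounds the trace repeats the same climb-and-halve pattern, which is the probing behaviour described above.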
89
SLOW START IS NOT SHOWN
24
Rdt20 channel with bit errors
bull underlying channel may flip bits in packetndash checksum to detect bit errors
bull the question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells sender that
packet received is OKndash negative acknowledgements (NAKs) receiver explicitly tells sender
that packet had errorsndash sender retransmits packet on receipt of NAK
bull new mechanisms in rdt20 (beyond rdt10)ndash error detectionndash receiver feedback control msgs (ACKNAK) receiver-gtsender
25
rdt20 FSM specification
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
Waitingfor reply
IDLE
sender
receiverrdt_send(data)
L
Note the sender holds a copy of the packet being sent until the delivery is acknowledged
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
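The three events of the simplified sender can be exercised in Python. This is a sketch under stated assumptions: the class `SimpleTcpSender`, the `wire` list standing in for "pass segment to IP", and the boolean `timer_running` replacing a real timer are all hypothetical names for illustration.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq      # NextSeqNum
        self.send_base = initial_seq     # SendBase
        self.unacked = {}                # seq -> data, sent but not yet ACKed
        self.timer_running = False
        self.wire = []                   # (seq, data) handed to IP, in order

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True    # start timer
        self.wire.append((seq, data))    # pass segment to IP
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)      # smallest not-yet-acknowledged seq
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True    # restart timer

    def on_ack(self, y):
        if y > self.send_base:           # ACK covers new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Lost-ACK case: the 8-byte segment at Seq=92 is resent after a timeout.
s = SimpleTcpSender(initial_seq=92)
s.on_app_data(b"x" * 8)
s.on_timeout()
s.on_ack(100)
print(s.wire, s.send_base)
```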
66
TCP retransmission scenarios
[Figure: two Host A / Host B timelines]
• lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); A's timer for Seq=92 expires and A resends Seq=92, 8 bytes data; B replies ACK=100 again; SendBase = 100
• premature timeout: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the Seq=92 timeout fires before the ACKs arrive, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase moves from 100 to 120
67
TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq #. Gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: the 2nd segment is lost (X) and A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
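The duplicate-ACK counting above might look as follows in Python. This is an illustrative sketch: `FastRetransmitSender`, its `resent` log and the single `dup_acks` counter (reset whenever new data is ACKed) are assumptions, and timers are left out.

```python
class FastRetransmitSender:
    def __init__(self, send_base, segments):
        self.send_base = send_base
        self.segments = dict(segments)   # seq -> data for sent, unACKed segments
        self.dup_acks = 0                # duplicate ACKs seen for send_base
        self.resent = []                 # sequence numbers fast-retransmitted

    def on_ack(self, y):
        if y > self.send_base:           # new data acknowledged
            self.send_base = y
            self.dup_acks = 0
            self.segments = {s: d for s, d in self.segments.items() if s >= y}
        else:                            # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.segments:
                self.resent.append(y)    # fast retransmit, before the timer expires


# The segment at Seq=100 is lost; later segments each trigger ACK=100 again.
sender = FastRetransmitSender(92, {92: b"a" * 8, 100: b"b" * 20, 120: b"c" * 15,
                                   135: b"d" * 6, 141: b"e" * 16})
for ack in (100, 100, 100, 100):
    sender.on_ack(ack)
print(sender.resent)
```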
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will.
This means small MSS values may persist indefinitely.
Solution: wait to fill each segment, but don't wait indefinitely.
Nagle's Algorithm
If we wait too long, interactive traffic is difficult.
If we don't wait, we get silly window syndrome.
Solution: use a timer; when the timer expires – send the (unfilled) segment.
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)?
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
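A worked example of the window arithmetic, as a sketch. `rcv_window` follows the formula above; `can_send` and its `last_byte_sent` / `last_byte_acked` counters are assumed names for the sender-side check that unACKed data stays within RcvWindow.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


# 4096-byte buffer holding 2000 unread bytes leaves 2096 bytes of spare room.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window))
print(can_send(1000, last_byte_sent=5100, last_byte_acked=4000, window=window))
```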
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client sends FIN (close); server replies ACK, then sends FIN (close); client replies ACK and enters timed wait; closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client sends FIN (closing); server replies ACK, then sends FIN (closing); client replies ACK and enters timed wait before closed; server is closed once the ACK arrives]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send λin (original data) through a router with unlimited shared output link buffers; λout is the delivered rate]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through a router with finite shared output link buffers. λin: original data; λ'in: original data, plus retransmitted data; λout: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against λin, with both axes running to R/2; the levels R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B among hosts sending over multihop paths through routers with finite shared output link buffers. λin: original data; λ'in: original data, plus retransmitted data; λout: delivered rate]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: λout for Host A and Host B]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with axis marks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
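The saw tooth can be reproduced with a very small simulation. This is a sketch only: the function `aimd`, measuring the window in units of MSS, and marking loss by RTT index are assumptions for illustration, and (as on the slide) slow start is not modelled.

```python
def aimd(rounds, loss_rounds, cwnd=1.0, mss=1.0):
    """Return the congestion window (in MSS) at the start of each RTT."""
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(mss, cwnd / 2)    # multiplicative decrease: W <- W/2
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace


# Losses detected in RTTs 3 and 6 halve the window; otherwise it climbs linearly.
print(aimd(rounds=8, loss_rounds={3, 6}, cwnd=8))
```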
25
rdt2.0: FSM specification
sender (transitions written as event / action):
• IDLE: rdt_send(data) / udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && isNAK(reply) / udt_send(packet)
• Waiting for reply: udt_rcv(reply) && isACK(reply) / Λ -> IDLE
receiver (single state, IDLE):
• udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• udt_rcv(packet) && notcorrupt(packet) / rdt_rcv(data), udt_send(ACK)
Note: the sender holds a copy of the packet being sent until the delivery is acknowledged.
26
rdt2.0: operation with no errors
[Figure: the rdt2.0 FSM (sender states IDLE and Waiting for reply, receiver state IDLE, same transitions as in the FSM specification), shown for a transfer with no errors]
27
rdt2.0: error scenario
[Figure: the rdt2.0 FSM (sender states IDLE and Waiting for reply, receiver state IDLE, same transitions as in the FSM specification), shown for a transfer in which a packet is corrupted]
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
rdt2.1: sender, handles garbled ACK/NAKs
sender (transitions written as event / action; the two halves are the same apart from the sequence number):
• IDLE: rdt_send(data) / sequence=0, udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply: udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE
• IDLE: rdt_send(data) / sequence=1, udt_send(packet) -> Waiting for reply
• Waiting for reply: udt_rcv(reply) && ( corrupt(reply) || isNAK(reply) ) / udt_send(packet)
• Waiting for reply: udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE
30
rdt2.1: receiver, handles garbled ACK/NAKs
receiver (transitions written as event / action):
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below: udt_rcv(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below: receive(packet) && not corrupt(packet) && has_seq1(packet) / udt_send(ACK), rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: receive(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below: receive(packet) && not corrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
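The receiver side of this discussion fits in a few lines. A sketch, assuming a hypothetical `Rdt21Receiver` class in which corruption is signalled by a `corrupt` flag rather than a real checksum, and the replies are returned as the strings "ACK" and "NAK".

```python
class Rdt21Receiver:
    def __init__(self):
        self.expected = 0        # state: waiting for seq 0 or seq 1 from below
        self.delivered = []      # data passed up to the application

    def on_packet(self, seq, data, corrupt=False):
        if corrupt:
            return "NAK"
        if seq == self.expected:
            self.delivered.append(data)   # new packet: deliver it
            self.expected ^= 1            # flip between 0 and 1
        # A duplicate (the other sequence number) is ACKed again, not redelivered.
        return "ACK"


rcv = Rdt21Receiver()
print(rcv.on_packet(0, "a"))                 # new data
print(rcv.on_packet(0, "a"))                 # sender resent after a garbled ACK
print(rcv.on_packet(1, "b", corrupt=True))
print(rcv.on_packet(1, "b"))
print(rcv.delivered)
```

One bit of sequence number is enough here because a stop-and-wait sender has at most one packet outstanding, so the receiver only has to tell "new" from "the one I just saw".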
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (transitions written as event / action):
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: rdt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)
• Wait for ACK 0: udt_rcv(reply) && not corrupt(reply) && isACK0(reply) / Λ
receiver FSM fragment:
• Wait for 0 from below: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)
• transition into Wait for 0 from below: receive(packet) && not corrupt(packet) && has_seq1(packet) / send(ACK1), rdt_rcv(data)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
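35
rdt3.0 sender
sender (transitions written as event / action):
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ
• Wait for ACK0: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
• Wait for ACK0: timeout / udt_send(packet)
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ
• Wait for ACK1: udt_rcv(packet) && ( corrupt(packet) || isACK(reply,0) ) / Λ
• Wait for ACK1: timeout / udt_send(packet)
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ -> IDLE state 0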
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
• U sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
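The numbers on this slide can be checked with a short calculation. A sketch, assuming the usual stop-and-wait expression U = (L/R) / (RTT + L/R) with RTT taken as twice the 15 ms propagation delay; the function names and the optional `window` argument (packets in flight per round trip, 1 for stop-and-wait) are illustrative.

```python
def sender_utilization(L_bits, R_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending, capped at 1."""
    d_trans = L_bits / R_bps                     # time to push one packet onto the link
    return min(1.0, window * d_trans / (rtt_s + d_trans))


def throughput_bps(L_bits, R_bps, rtt_s, window=1):
    return sender_utilization(L_bits, R_bps, rtt_s, window) * R_bps


L, R, RTT = 8000, 1e9, 2 * 0.015                 # 8000-bit packet, 1 Gbps, 30 ms RTT
print(L / R)                                     # d_trans, in seconds
print(sender_utilization(L, R, RTT))             # roughly 0.00027
print(throughput_bps(L, R, RTT) / 8)             # bytes per second, roughly 33 kB/s
```

Allowing three packets in flight (`window=3`) triples the utilization, still far below 1 on this link.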
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
initially: base=1, nextseqnum=1
single state: Wait (transitions written as event, then action)
• rdt_send(data):
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    }
    else
      refuse_data(data)   (Block?)
• timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply):
    Λ
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
initially: expectedseqnum=1
single state: Wait (transitions written as event, then action)
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum):
    rdt_rcv(data)
    udt_send(ACK)
    expectedseqnum++
• any other packet:
    udt_send(reply)
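The two Go-Back-N machines can be put together in a small Python sketch. The classes `GbnSender` and `GbnReceiver`, the `sent` log standing in for the channel, and the `max(...)` guard against stale ACKs are assumptions for illustration; corruption checks and the timer itself are left out.

```python
class GbnSender:
    def __init__(self, window):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}        # seq -> data kept for possible retransmission
        self.sent = []           # sequence numbers handed to the channel, in order

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False             # refuse_data: window is full

    def timeout(self):
        # Go back N: resend every packet from base up to nextseqnum-1.
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)

    def ack(self, acknum):
        # Cumulative ACK; max() is an added guard so a stale ACK cannot move base back.
        self.base = max(self.base, acknum + 1)


class GbnReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def receive(self, seq, data):
        if seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # In-order or not, ACK the highest in-order seq # received so far.
        return self.expectedseqnum - 1


snd, rcv = GbnSender(window=4), GbnReceiver()
for i in range(1, 5):
    snd.rdt_send(f"d{i}")
acks = [rcv.receive(1, "d1"), rcv.receive(3, "d3"), rcv.receive(4, "d4")]  # pkt 2 lost
snd.ack(acks[0])
snd.timeout()
print(acks, snd.sent)
```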
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]:
• ACK(n)
otherwise:
• ignore
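The receiver rules can be sketched in Python as below. `SrReceiver` is a hypothetical class, and sequence numbers are treated as unbounded integers (no wrap-around), which is a simplifying assumption.

```python
class SrReceiver:
    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}         # out-of-order packets waiting for the gap to fill
        self.delivered = []

    def receive(self, n, data):
        """Return n when ACK(n) should be sent, or None when the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver everything that is now in order and slide the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so the sender can move on
        return None              # otherwise: ignore


rcv = SrReceiver(window=4)
print(rcv.receive(0, "a"), rcv.receive(2, "c"), rcv.delivered)   # pkt 2 is buffered
print(rcv.receive(1, "b"), rcv.delivered)                        # gap filled
```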
51
Selective repeat in action
52
Selective repeat dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
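The dilemma can be checked mechanically. A sketch, assuming the worst case where the receiver has accepted a full window but every ACK was lost, so the sender's old window and the receiver's advanced window coexist; `ambiguous_seqs` is a hypothetical helper name.

```python
def ambiguous_seqs(seq_space, window):
    """Sequence numbers that could be either a retransmission or new data."""
    old = {n % seq_space for n in range(0, window)}            # sender may resend these
    new = {n % seq_space for n in range(window, 2 * window)}   # receiver now expects these
    return sorted(old & new)


print(ambiguous_seqs(seq_space=4, window=3))   # overlap: duplicates look like new data
print(ambiguous_seqs(seq_space=4, window=2))   # no overlap once window <= seq_space / 2
```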
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview  RFCs: 793, 1122, 1323, 2018, 2581, …
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F:
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pnter
• Options (variable length)
• application data (variable length)
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say - up to implementor
This has led to a world of hurt…
[Figure: simple telnet scenario, Host A / Host B timeline]
• User types 'C': A sends Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C': A sends Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  – this gets through an intrusion-detection system
  – then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
26
rdt20 operation with no errors
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt2.1: sender, handles garbled ACK/NAKs
(transitions written as: state: event / action → next state; Λ = no action)
• IDLE (0): rdt_send(data) / sequence=0, udt_send(packet) → Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) → Waiting for reply (0)
• Waiting for reply (0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ → IDLE (1)
• IDLE (1): rdt_send(data) / sequence=1, udt_send(packet) → Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) → Waiting for reply (1)
• Waiting for reply (1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ → IDLE (0)
30
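rdt2.1: receiver, handles garbled ACK/NAKs
(transitions written as: state: event / action → next state)
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK) → Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) → Wait for 0 from below
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK) → Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK) → Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) → Wait for 1 from below
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK) → Wait for 1 from below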
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
– state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
– state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK was received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
– receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
(transitions written as: state: event / action → next state; Λ = no action)
sender FSM fragment
• Wait for call 0 from above: rdt_send(data) / sequence=0, udt_send(packet) → Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet) → Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ → next state
receiver FSM fragment
• entering Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1)
• Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1) → Wait for 0 from below
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
– checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
– retransmission will be duplicate, but use of seq. #'s already handles this
– receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
(transitions written as: state: event / action → next state; Λ = no action)
• IDLE state 0: rdt_send(data) / sequence=0, udt_send(packet) → Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ → IDLE state 0
• Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply,1)) / Λ → Wait for ACK0
• Wait for ACK0: timeout / udt_send(packet) → Wait for ACK0
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ → IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1, udt_send(packet) → Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ → IDLE state 1
• Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply,0)) / Λ → Wait for ACK1
• Wait for ACK1: timeout / udt_send(packet) → Wait for ACK1
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ → IDLE state 0
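The four sender states collapse into two variables: the current sequence bit and whether a packet is outstanding. The sketch below is one possible rendering under that assumption. The class name `Rdt30Sender` and the callback `channel_send` are hypothetical, and the timer itself is left to the caller, who invokes `on_timeout()` when it fires.

```python
class Rdt30Sender:
    """Alternating-bit sender: one packet in flight, retransmit on timeout."""

    def __init__(self, channel_send):
        self.seq = 0                 # sequence number of the next/current packet
        self.pending = None          # packet awaiting its ACK, None when idle
        self.channel_send = channel_send

    def rdt_send(self, data):
        if self.pending is not None:
            return False             # still waiting for an ACK: refuse new data
        self.pending = (self.seq, data)
        self.channel_send(self.pending)
        return True

    def on_timeout(self):
        if self.pending is not None:
            self.channel_send(self.pending)   # resend the same packet

    def on_reply(self, ack_seq, corrupt=False):
        # Ignore replies when idle, garbled replies, and ACKs for the other bit.
        if self.pending is None or corrupt or ack_seq != self.seq:
            return
        self.pending = None
        self.seq ^= 1                # flip 0 <-> 1 for the next packet
```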
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U_sender: utilization – fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{30.008\ \text{ms}} \approx 0.00027$
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
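The arithmetic is easy to check. A small sketch, assuming RTT = 2 × 15 ms = 30 ms and ignoring the ACK's own transmission time; the function name is hypothetical.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time a stop-and-wait sender spends transmitting."""
    d_trans = packet_bits / link_bps      # L / R, in seconds
    return d_trans / (rtt_s + d_trans)


u = stop_and_wait_utilization(8000, 1e9, 0.030)
throughput_bytes_per_s = u * 1e9 / 8      # share of the 1 Gbps link actually used
print(f"U = {u:.5f}, throughput = {throughput_bytes_per_s / 1000:.1f} kB/s")
```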
39
rdt3.0: stop-and-wait operation
[Timing diagram, sender and receiver, one RTT apart:]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Timing diagram, sender and receiver, one RTT apart:]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
– Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
– If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only the unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only the unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
– may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
(single state: Wait; Λ = no action)
• initially: base=1, nextseqnum=1
• rdt_send(data):
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   (Block?)
• timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    ...
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply): Λ
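The extended FSM maps almost line for line onto a small class. This is a sketch only: sequence numbers are unbounded integers rather than k-bit values (an assumption that avoids wrap-around), the timer is driven by the caller, and the names `GbnSender` and `channel_send` are hypothetical.

```python
class GbnSender:
    """Go-Back-N sender with a window of N unacknowledged packets."""

    def __init__(self, window, channel_send):
        self.N = window
        self.base = 1                # oldest unacknowledged sequence number
        self.nextseqnum = 1          # next sequence number to use
        self.packets = {}            # seq -> data, kept until cumulatively ACKed
        self.channel_send = channel_send

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False             # window full: refuse_data
        self.packets[self.nextseqnum] = data
        self.channel_send((self.nextseqnum, data))
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        # Cumulative ACK: everything up to and including acknum has arrived.
        if acknum >= self.base:
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1

    def on_timeout(self):
        # Go back N: resend every packet that is still unacknowledged.
        for seq in range(self.base, self.nextseqnum):
            self.channel_send((seq, self.packets[seq]))
```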
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
– may generate duplicate ACKs
– need only remember expectedseqnum
• out-of-order packet:
– discard (don't buffer) -> no receiver buffering!
– Re-ACK packet with highest in-order seq #
Receiver FSM (single state: Wait)
• initially: expectedseqnum=1
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• otherwise / udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #'s
– again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
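The receiver rules can be sketched as follows. As an assumption, sequence numbers are unbounded integers starting at 0, so no modular arithmetic is needed; the class name `SrReceiver` and its attributes are hypothetical.

```python
class SrReceiver:
    """Selective-repeat receiver: buffers out-of-order packets, ACKs each one."""

    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}         # out-of-order packets: seq -> data
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the ACK number to send, or None to ignore the packet."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            while self.rcvbase in self.buffer:      # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so the sender can move on
        return None
```

The re-ACK for the range [rcvbase-N, rcvbase-1] matters: if the original ACK was lost, the sender's window cannot advance until it sees one.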
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
A: window size ≤ (½ of seq # size)
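The bound can be checked by brute force. The sketch below assumes the worst case: the receiver has accepted a full window and advanced, while every ACK was lost, so the sender may retransmit its whole old window. The function name is hypothetical.

```python
def windows_can_overlap(seq_space, window):
    """True if a retransmitted old packet can land inside the receiver's new window."""
    old = {n % seq_space for n in range(0, window)}            # sender's unACKed window
    new = {n % seq_space for n in range(window, 2 * window)}   # receiver's window after advancing
    return bool(old & new)


print(windows_can_overlap(4, 3))   # the slide's example: ambiguous
print(windows_can_overlap(4, 2))   # window is half the sequence space: safe
```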
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581, ...
• point-to-point:
– one sender, one receiver
• reliable, in-order byte stream:
– no "message boundaries"
• pipelined:
– TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
– bi-directional data flow in same connection
– MSS: maximum segment size
• connection-oriented:
– handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
– sender will not overwhelm receiver
[Figure: application writes data → socket door → TCP send buffer → segment → TCP receive buffer → socket door → application reads data]
55
TCP segment structure
[Figure: segment layout, 32 bits wide, top to bottom:]
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pointer
• Options (variable length)
• application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how receiver handles out-of-order segments
– A: TCP spec doesn't say - up to implementor
simple telnet scenario (Host A, Host B, time running downwards):
• User types 'C'. A → B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'. B → A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'. A → B: Seq=43, ACK=80
This has led to a world of hurt...
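The numbers in the telnet exchange follow a simple rule, sketched below. It assumes the replying host has sent nothing else since the segment it is answering; the function name `reply_numbers` is hypothetical.

```python
def reply_numbers(seg_seq, seg_ack, seg_len, reply_len):
    """Seq/ACK numbers of the segment sent in response to a received one."""
    return {
        "seq": seg_ack,            # we send the byte the peer said it expects next
        "ack": seg_seq + seg_len,  # we expect the byte after the data just received
        "len": reply_len,
    }


# Host B answers A's Seq=42, ACK=79, one byte of data, echoing one byte back.
print(reply_numbers(42, 79, 1, 1))
```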
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
– Filling buffers.
• Critical buffers...
• Send a legitimate request:
GET index.html
this gets through an intrusion-detection system,
then send a new segment replacing bytes 4-13 with "password-file"
A dumb example.
Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106); two curves: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
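Both moving averages fit in a few lines. The sketch makes two assumptions that the slides leave open: DevRTT starts at zero, and the deviation is updated before EstimatedRTT, using the estimate from before the new sample (the ordering used in RFC 6298). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, giving a retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0           # assumption: no deviation known yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
timeout = est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, timeout)
```

Samples taken from retransmitted segments should not be passed to `update`, per the Karn/Partridge rule.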
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
– update what is known to be acked
– start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
66
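TCP: retransmission scenarios
lost ACK scenario (Host A sends to Host B)
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• A's timeout expires; A resends Seq=92, 8 bytes data
• B replies ACK=100 again; SendBase = 100
premature timeout scenario (Host A sends to Host B)
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• the Seq=92 timeout expires before the ACKs arrive; A resends Seq=92, 8 bytes data
• the ACKs arrive: SendBase = 100, then SendBase = 120
• B answers the duplicate segment with ACK=120; SendBase stays at 120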
67
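TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends to Host B)
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too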
68
TCP ACK generation [RFC 1122, RFC 2581]
• Event at Receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
– TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Event at Receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
– TCP Receiver action: immediately send single cumulative ACK, ACKing both in-order segments
• Event at Receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
– TCP Receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
• Event at Receiver: arrival of segment that partially or completely fills gap
– TCP Receiver action: immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
– fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends to Host B; the 2nd segment is lost (X) and A resends it before its timeout expires]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
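The ACK-handling branch can be sketched as a small state holder. Two simplifications are assumed here: a single duplicate counter is kept for the current SendBase rather than one per value of y, and ACKs below SendBase are reported as stale instead of being counted. The class name `AckTracker` and the returned labels are hypothetical.

```python
class AckTracker:
    """Classifies incoming ACKs and signals when fast retransmit should fire."""

    def __init__(self, send_base, dup_threshold=3):
        self.send_base = send_base
        self.dup_threshold = dup_threshold
        self.dup_count = 0

    def on_ack(self, y):
        """Return 'new', 'dup', 'fast_retransmit' or 'stale' for ACK value y."""
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_count = 0
            return "new"
        if y < self.send_base:
            return "stale"           # older than anything outstanding
        self.dup_count += 1
        if self.dup_count == self.dup_threshold:
            return "fast_retransmit" # caller resends the segment starting at y
        return "dup"
```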
72
Silly Window Syndrome
The receive window advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small window values, and with them small segments, may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
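A worked version of the window arithmetic is sketched below. `rcv_window` follows the formula on the slide; `sendable_bytes` uses the names LastByteSent and LastByteAcked, which do not appear on the slide and are assumed here to describe the sender's unACKed data.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(advertised_window, last_byte_sent, last_byte_acked):
    """How much more the sender may transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, advertised_window - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, sendable_bytes(window, last_byte_sent=5000, last_byte_acked=4000))
```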
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client: close, sends FIN; server: replies ACK, close, sends FIN; client: replies ACK, timed wait, closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
– Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client: closing, sends FIN; server: replies ACK, closing, sends FIN; client: replies ACK, timed wait, closed; server: closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; delivered rate λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A, Host B and a router with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against the sending rate, axes running to R/2; levels R/3 and R/4 are marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A, Host B and routers with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; λ_out]
Congestion Collapse example: Cocktail party effect
27
rdt20 error scenario
L
IDLE Waitingfor reply
IDLE
udt_send(packet)
rdt_rcv(data)udt_send(ACK)
udt_rcv(packet) ampamp notcorrupt(packet)
udt_rcv(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp isNAK(reply)
udt_send(NAK)
udt_rcv(packet) ampamp corrupt(packet)
rdt_send(data)
28
rdt20 has a fatal flawWhat happens if ACKNAK corruptedbull sender doesnrsquot know what happened at receiverbull canrsquot just retransmit possible duplicate
Handling duplicates bull sender retransmits current
packet if ACKNAK garbledbull sender adds sequence number
to each packetbull receiver discards (doesnrsquot
deliver) duplicate packet
Sender sends one packet then waits for receiver response
stop and wait
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase the timeout; this makes retransmissions decreasingly aggressive)
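A minimal sketch of the two Karn/Partridge rules follows. The class name `KarnTimer` is hypothetical, and doubling the timeout (with a cap) is an assumption made here for illustration; the slide only says the timeout is increased.

```python
class KarnTimer:
    """Tracks usable RTT samples and backs off the timeout on retransmission."""

    def __init__(self, initial_timeout=1.0, max_timeout=60.0):
        self.timeout = initial_timeout
        self.max_timeout = max_timeout
        self.samples = []          # RTT samples that are safe to use

    def on_timeout(self):
        # Assumption: exponential backoff, i.e. double the timeout up to a cap.
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, rtt, was_retransmitted):
        # An ACK for a retransmitted segment is ambiguous: it may answer the
        # original transmission or the retransmission, so the sample is dropped.
        if was_retransmitted:
            return
        self.samples.append(rtt)


timer = KarnTimer(initial_timeout=1.0)
timer.on_timeout()
timer.on_timeout()
timer.on_ack(0.3, was_retransmitted=True)    # ignored
timer.on_ack(0.2, was_retransmitted=False)   # recorded
print(timer.timeout, timer.samples)
```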
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
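The EstimatedRTT, DevRTT and TimeoutInterval formulas can be combined into one small estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is hypothetical, DevRTT starts at 0, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate. The slides do not fix either choice.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, using the slide's formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0          # assumption: no deviation known at start

    def update(self, sample_rtt):
        # Assumption: deviation is computed against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)     # milliseconds
timeout = estimator.update(120.0)
print(estimator.estimated_rtt, estimator.dev_rtt, timeout)
```

With a first sample of 100 ms and a second of 120 ms, the estimate moves to 102.5 ms, the deviation to 5 ms, and the timeout to 122.5 ms.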
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
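The simplified sender can be turned into a small runnable model. This is a sketch, not a TCP implementation: the class `SimplifiedTcpSender` and its `sent` log (standing in for "pass segment to IP") are hypothetical, and the timer is reduced to a boolean flag instead of a real clock.

```python
class SimplifiedTcpSender:
    """Event handlers mirroring the simplified TCP sender pseudocode."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq -> data, not yet acknowledged
        self.timer_running = False
        self.sent = []               # log of (seq, data) handed to IP

    def on_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        # Retransmit only the oldest not-yet-acknowledged segment.
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: everything ending at or before y is acknowledged.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data(b"x" * 8)      # Seq=92, 8 bytes
sender.on_data(b"y" * 20)     # Seq=100, 20 bytes
sender.on_timeout()           # resends Seq=92 only
sender.on_ack(120)            # cumulative ACK covers both segments
print(sender.send_base, sender.sent)
```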
66
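TCP retransmission scenarios
Lost ACK scenario (Host A sends to Host B, time runs downward):
  A -> B: Seq=92, 8 bytes data
  B -> A: ACK=100, lost (X)
  Seq=92 timeout
  A -> B: Seq=92, 8 bytes data (retransmission)
  B -> A: ACK=100; SendBase = 100
Premature timeout scenario (Host A sends to Host B, time runs downward):
  A -> B: Seq=92, 8 bytes data
  A -> B: Seq=100, 20 bytes data
  B -> A: ACK=100, then ACK=120
  Seq=92 timeout fires before the ACKs arrive
  A -> B: Seq=92, 8 bytes data (retransmission)
  ACKs arrive: SendBase = 100, then SendBase = 120
  B -> A: ACK=120; SendBase = 120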
67
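TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends to Host B, time runs downward):
  A -> B: Seq=92, 8 bytes data
  A -> B: Seq=100, 20 bytes data
  B -> A: ACK=100, lost (X)
  B -> A: ACK=120, arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too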
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
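The four rows of the table can be read as a decision function. The sketch below is illustrative only: the function name `receiver_ack_action` and its boolean parameters are hypothetical, and the last branch (a segment below the expected sequence number) is not in the table and is an assumption.

```python
def receiver_ack_action(seq, expected, ack_pending=False, gap_buffered=False):
    """Return the receiver action for a segment starting at byte `seq`.

    expected      next in-order byte the receiver is waiting for
    ack_pending   an earlier in-order segment is still waiting for its ACK
    gap_buffered  out-of-order data above `expected` is already buffered
    """
    if seq > expected:
        return "immediate duplicate ACK"       # gap detected
    if seq == expected:
        if gap_buffered:
            return "immediate ACK"             # segment starts at lower end of gap
        if ack_pending:
            return "immediate cumulative ACK"  # ACKs both in-order segments
        return "delayed ACK"                   # wait up to 500 ms
    # Assumption (not in the table): old data is re-ACKed straight away.
    return "immediate duplicate ACK"


print(receiver_ack_action(100, 100))
print(receiver_ack_action(100, 100, ack_pending=True))
print(receiver_ack_action(120, 100))
print(receiver_ack_action(100, 100, gap_buffered=True))
```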
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: the 2nd segment is lost (X) and Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
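The duplicate-ACK counting can be sketched in Python as follows. The class `FastRetransmitSender` is a hypothetical name; it only records which sequence numbers would be resent, and resetting the counter when new data is acknowledged is an assumption consistent with counting duplicates "for y".

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a resend on the third one."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers that were resent

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance and start counting afresh.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit


sender = FastRetransmitSender(send_base=100)
for _ in range(4):
    sender.on_ack(100)       # four duplicate ACKs for byte 100
print(sender.retransmitted)
```

Only the third duplicate triggers the resend; the fourth does not resend again.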
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
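The "fill the segment, but give up when a timer expires" rule can be sketched as below. This follows the slide's description rather than the exact rule in Nagle's original proposal; the class `SegmentFiller`, its parameters and the millisecond clock values passed in by the caller are all hypothetical.

```python
class SegmentFiller:
    """Holds back small writes until a full segment exists or a timer expires."""

    def __init__(self, mss, max_wait):
        self.mss = mss
        self.max_wait = max_wait
        self.buffer = b""
        self.timer_start = None      # when the current partial segment began

    def write(self, data, now):
        """Buffer `data`; return any full segments that are ready to send."""
        if not self.buffer:
            self.timer_start = now
        self.buffer += data
        segments = []
        while len(self.buffer) >= self.mss:
            segments.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
            self.timer_start = now   # leftover bytes start a fresh wait
        return segments

    def tick(self, now):
        """Send the unfilled segment if it has waited long enough."""
        if self.buffer and now - self.timer_start >= self.max_wait:
            segment, self.buffer = self.buffer, b""
            return [segment]
        return []


filler = SegmentFiller(mss=4, max_wait=200)    # times in milliseconds
first = filler.write(b"ab", now=0)             # too small, held back
second = filler.write(b"cdef", now=100)        # completes one full segment
early = filler.tick(now=200)                   # only 100 ms of waiting so far
late = filler.tick(now=350)                    # timer expired, flush "ef"
print(first, second, early, late)
```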
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
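The window arithmetic above is small enough to check directly. In this sketch the function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are illustrative assumptions; only the RcvWindow formula comes from the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(n_bytes, last_byte_sent, last_byte_acked, window):
    """Sender-side check: unACKed data must stay within the advertised window."""
    return (last_byte_sent + n_bytes) - last_byte_acked <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                   # 2096 bytes of spare room
print(may_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window))
print(may_send(1097, last_byte_sent=5000, last_byte_acked=4000, window=window))
```

With 1000 bytes already in flight, another 1000 bytes fit in the 2096-byte window, but 1097 would exceed it by one byte.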
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
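The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short model. It is a sketch only: segments are plain dictionaries, the function name is hypothetical, and initial sequence numbers are drawn from a seeded random generator for repeatability. It relies on the fact that a SYN consumes one sequence number, which is why each side acknowledges the other's initial sequence number plus one.

```python
import random

SEQ_SPACE = 2 ** 32    # TCP sequence numbers are 32 bits wide


def three_way_handshake(rng):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    client_isn = rng.randrange(SEQ_SPACE)
    server_isn = rng.randrange(SEQ_SPACE)
    # Step 1: client sends SYN with its initial sequence number, no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK, acknowledging the client's SYN.
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (syn["seq"] + 1) % SEQ_SPACE}
    # Step 3: client acknowledges the server's SYN.
    ack = {"flags": {"ACK"}, "seq": synack["ack"],
           "ack": (synack["seq"] + 1) % SEQ_SPACE}
    return syn, synack, ack


syn, synack, ack = three_way_handshake(random.Random(0))
for segment in (syn, synack, ack):
    print(segment)
```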
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. client: close, sends FIN; server: replies ACK, close, sends FIN; client: replies ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. client: closing, sends FIN; server: replies ACK, closing, sends FIN; client: replies ACK, timed wait, closed; server: closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends $\lambda_{in}$ (original data) into a router with unlimited shared output link buffers; Host B receives $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes running up to R/2; the reduced throughput limits marked are R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) to routers with finite shared output link buffers; Host B receives $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT, spread over each received ACK ($W \leftarrow W + 1/W$), until loss detected
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, y-axis marked at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
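The saw tooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: the window is tracked once per RTT rather than per ACK, the rounds in which a loss occurs are supplied by the caller, slow start is not modelled, and the function name `aimd_trace` is hypothetical.

```python
def aimd_trace(rounds, loss_rounds, start=1.0, mss=1.0):
    """Return the congestion window (in MSS units) at the start of each RTT."""
    window = start
    trace = []
    for rtt in range(rounds):
        trace.append(window)
        if rtt in loss_rounds:
            window = max(window / 2, mss)   # multiplicative decrease
        else:
            window += mss                   # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=6, loss_rounds={3}, start=8.0)
print(trace)
```

Starting from 8 MSS with a loss in the fourth round, the window climbs 8, 9, 10, 11, is halved to 5.5, and then starts climbing again.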
89
SLOW START IS NOT SHOWN!
28
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK garbled
• sender adds sequence number to each packet
• receiver discards (doesn't deliver) duplicate packet
stop and wait: Sender sends one packet, then waits for receiver response
29
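rdt2.1: sender, handles garbled ACK/NAKs
Sender FSM, written as state: event / action -> next state (Λ = no action)
• IDLE (sequence 0 next): rdt_send(data) / sequence=0; udt_send(packet) -> Waiting for reply (sequence 0)
• Waiting for reply (sequence 0): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (sequence 0)
• Waiting for reply (sequence 0): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (sequence 1 next)
• IDLE (sequence 1 next): rdt_send(data) / sequence=1; udt_send(packet) -> Waiting for reply (sequence 1)
• Waiting for reply (sequence 1): udt_rcv(reply) && (corrupt(reply) || isNAK(reply)) / udt_send(packet) -> Waiting for reply (sequence 1)
• Waiting for reply (sequence 1): udt_rcv(reply) && notcorrupt(reply) && isACK(reply) / Λ -> IDLE (sequence 0 next)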
30
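rdt2.1: receiver, handles garbled ACK/NAKs
Receiver FSM, written as state: event / action -> next state
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK); rdt_rcv(data) -> Wait for 1 from below
• Wait for 0 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 0 from below
• Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK); rdt_rcv(data) -> Wait for 0 from below
• Wait for 1 from below: udt_rcv(packet) && corrupt(packet) / udt_send(NAK) -> Wait for 1 from below
• Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK) -> Wait for 1 from below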
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (state: event / action -> next state; Λ = no action):
• Wait for call 0 from above: rdt_send(data) / sequence=0; udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && (corrupt(reply) || isACK1(reply)) / udt_send(packet) -> Wait for ACK 0
• Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ -> leaves the fragment
receiver FSM fragment:
• entering the fragment: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK1); rdt_rcv(data) -> Wait for 0 from below
• Wait for 0 from below: udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)) / udt_send(ACK1) -> Wait for 0 from below
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
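The timeout-and-retransmit approach can be exercised with a deterministic simulation. This is a sketch under stated assumptions: loss is scripted by giving the indices of transmission attempts whose data packet or ACK is dropped, every drop is assumed to end in a sender timeout, corruption is not modelled, and the function name `stop_and_wait` is hypothetical.

```python
def stop_and_wait(messages, drop_data=frozenset(), drop_ack=frozenset()):
    """Deliver `messages` over a lossy channel using a 1-bit sequence number.

    drop_data / drop_ack hold 0-based transmission attempt numbers whose
    data packet or ACK is lost. Returns (delivered messages, attempts used).
    """
    delivered = []
    expected = 0          # receiver: sequence bit it is waiting for
    attempt = 0
    for index, message in enumerate(messages):
        seq = index % 2   # sender alternates between 0 and 1
        while True:
            current, attempt = attempt, attempt + 1
            if current in drop_data:
                continue              # data lost: timeout, retransmit
            if seq == expected:
                delivered.append(message)
                expected ^= 1         # new packet: deliver and flip the bit
            # A duplicate is discarded but still ACKed with its seq #.
            if current in drop_ack:
                continue              # ACK lost: timeout, retransmit
            break                     # ACK for `seq` arrived: next message
    return delivered, attempt


delivered, attempts = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={3})
print(delivered, attempts)
```

Every message is delivered exactly once even though one data packet and one ACK are lost; the two losses cost two extra transmissions.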
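35
rdt3.0 sender
Sender FSM, written as state: event / action -> next state (Λ = no action)
• IDLE state 0: rdt_send(data) / sequence=0; udt_send(packet) -> Wait for ACK0
• IDLE state 0: udt_rcv(reply) / Λ -> IDLE state 0
• Wait for ACK0: udt_rcv(reply) && (corrupt(reply) || isACK(reply, 1)) / Λ -> Wait for ACK0
• Wait for ACK0: timeout / udt_send(packet) -> Wait for ACK0
• Wait for ACK0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 0) / Λ -> IDLE state 1
• IDLE state 1: rdt_send(data) / sequence=1; udt_send(packet) -> Wait for ACK1
• IDLE state 1: udt_rcv(reply) / Λ -> IDLE state 1
• Wait for ACK1: udt_rcv(reply) && (corrupt(reply) || isACK(reply, 0)) / Λ -> Wait for ACK1
• Wait for ACK1: timeout / udt_send(packet) -> Wait for ACK1
• Wait for ACK1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply, 1) / Λ -> IDLE state 0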
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U sender: utilization, the fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
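The numbers in this example can be checked with a short calculation. The sketch assumes the usual definition of sender utilization as transmission time divided by the time for one full cycle, RTT + L/R, with RTT taken as twice the 15 ms propagation delay; the function name and the `window` parameter (packets in flight per cycle) are illustrative additions.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending.

    Assumes each cycle lasts RTT + L/R and `window` packets are sent per cycle.
    """
    d_trans = packet_bits / link_bps
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L = 8000          # packet size in bits
R = 1e9           # link rate in bits per second
RTT = 2 * 0.015   # round trip time: twice the 15 ms propagation delay

d_trans = L / R
utilization = sender_utilization(L, R, RTT)
throughput_bytes = (L / 8) / (RTT + d_trans)

print(f"d_trans     = {d_trans * 1e6:.0f} microseconds")
print(f"utilization = {utilization:.6f}")
print(f"throughput  = {throughput_bytes / 1000:.1f} kB/s")
```

The utilization comes out at roughly 0.00027, and the throughput at about 33 kB/s, in line with the figure quoted above. Passing `window=3` triples the utilization.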
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline, with RTT marked]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline, with RTT marked]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack'ed packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unack'ed packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state "Wait", written as event / action (Λ = no action)
• initially: base=1; nextseqnum=1
• rdt_send(data) /
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    }
    else
      refuse_data(data)   /* Block? */
• timeout /
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply) / Λ
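The sender FSM maps naturally onto a small class. This is a sketch, not a full protocol: the class `GbnSender` and its `wire` log (standing in for udt_send) are hypothetical, timers are left to the caller, and ignoring ACKs below `base` is an added guard that the FSM itself does not show.

```python
class GbnSender:
    """Go-Back-N sender: window of N packets, cumulative ACKs."""

    def __init__(self, window):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}     # seq -> data kept for retransmission
        self.wire = []        # sequence numbers handed to the channel

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.wire.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False          # window full: refuse_data(data)

    def timeout(self):
        # Go back N: resend every packet that is still unacknowledged.
        for seq in range(self.base, self.nextseqnum):
            self.wire.append(seq)

    def on_ack(self, acknum):
        # Assumption: stale ACKs (below base) are ignored.
        if acknum >= self.base:
            self.base = acknum + 1


sender = GbnSender(window=3)
accepted = [sender.rdt_send(d) for d in "abcd"]   # fourth send is refused
sender.on_ack(1)                                  # window slides by one
sender.rdt_send("d")
sender.timeout()                                  # resend packets 2, 3, 4
print(accepted, sender.wire)
```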
46
GBN: receiver extended FSM
Single state "Wait", written as event / action
• initially: expectedseqnum=1
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data); udt_send(ACK); expectedseqnum++
• default (any other packet) / udt_send(reply)
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering!
  - Re-ACK packet with highest in-order seq #
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
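The receiver rules can be sketched as a class that buffers out-of-order packets and delivers in order. Assumptions in this sketch: sequence numbers are unbounded integers starting at 0 (no wrap-around), and the class name `SrReceiver` and its `delivered` list (standing in for the upper layer) are hypothetical.

```python
class SrReceiver:
    """Selective-repeat receiver with a window of N packets."""

    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}       # out-of-order packets awaiting delivery
        self.delivered = []    # data passed to the upper layer, in order

    def on_packet(self, n, data):
        """Handle packet n; return the ACK number to send, or None to ignore."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order prefix and advance the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n           # already delivered: re-ACK, its ACK may be lost
        return None            # otherwise: ignore


receiver = SrReceiver(window=4)
acks = [
    receiver.on_packet(1, "b"),   # out of order: buffered, ACK 1
    receiver.on_packet(0, "a"),   # fills the gap: delivers "a" then "b"
    receiver.on_packet(0, "a"),   # duplicate below the window: re-ACK 0
    receiver.on_packet(9, "z"),   # outside both ranges: ignored
]
print(acks, receiver.delivered, receiver.rcvbase)
```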
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
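The dilemma can be checked mechanically: after the receiver accepts a full window and advances, its new window must not contain any sequence number from the old one, otherwise a retransmission of an old packet is indistinguishable from new data. The helper name `windows_overlap` below is hypothetical.

```python
def windows_overlap(seq_space, window):
    """True if consecutive receiver windows share a sequence number."""
    old_window = {i % seq_space for i in range(window)}
    new_window = {i % seq_space for i in range(window, 2 * window)}
    return bool(old_window & new_window)


print(windows_overlap(seq_space=4, window=3))   # the slide's example
print(windows_overlap(seq_space=4, window=2))   # window = half the seq # space
```

With four sequence numbers, a window of 3 gives the windows {0, 1, 2} and {3, 0, 1}, which overlap; a window of 2 gives {0, 1} and {2, 3}, which do not.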
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection establishment (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
Sequence number, acknowledgement number: counting by bytes of data (not segments)
Checksum: Internet checksum (as in UDP)
29
rdt21 sender handles garbled ACKNAKs
IDLE
sequence=0udt_send(packet)
rdt_send(data)
WaitingFor reply udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
udt_send(packet)
udt_rcv(reply) ampamp ( corrupt(reply) ||isNAK(reply) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply)
IDLEWaitingfor reply
LL
30
rdt2.1: receiver, handles garbled ACK/NAKs
Receiver FSM, one line per transition (event / action):
• Wait for 0 from below -> Wait for 1 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / rdt_rcv(data), udt_send(ACK)
• Wait for 0 from below, stay: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 0 from below, stay: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / udt_send(ACK)
• Wait for 1 from below -> Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK)
• Wait for 1 from below, stay: udt_rcv(packet) && corrupt(packet) / udt_send(NAK)
• Wait for 1 from below, stay: udt_rcv(packet) && notcorrupt(packet) && has_seq0(packet) / udt_send(ACK)
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #'s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "current" pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
Sender FSM fragment (event / action, Λ = no action):
• Wait for call 0 from above -> Wait for ACK 0: rdt_send(data) / sequence=0, udt_send(packet)
• Wait for ACK 0, stay: udt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ) / udt_send(packet)
• leaving Wait for ACK 0: udt_rcv(reply) && notcorrupt(reply) && isACK0(reply) / Λ
Receiver FSM fragment (event / action):
• entering Wait for 0 from below: udt_rcv(packet) && notcorrupt(packet) && has_seq1(packet) / rdt_rcv(data), udt_send(ACK1)
• Wait for 0 from below, stay: udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) ) / udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  - checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq #'s already handles this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
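35
rdt3.0 sender
Sender FSM, one line per transition (event / action, Λ = no action):
• IDLE state 0 -> Wait for ACK0: rdt_send(data) / sequence=0, udt_send(packet)
• IDLE state 0, stay: udt_rcv(reply) / Λ
• Wait for ACK0, stay: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
• Wait for ACK0, stay: timeout / udt_send(packet)
• Wait for ACK0 -> IDLE state 1: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ
• IDLE state 1 -> Wait for ACK1: rdt_send(data) / sequence=1, udt_send(packet)
• IDLE state 1, stay: udt_rcv(reply) / Λ
• Wait for ACK1, stay: udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ
• Wait for ACK1, stay: timeout / udt_send(packet)
• Wait for ACK1 -> IDLE state 0: udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ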
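The timeout-and-retransmit behaviour of the rdt3.0 sender can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the function name `stop_and_wait` and the `lost_data` / `lost_acks` sets (indices of the transmissions whose packet or ACK is dropped) are hypothetical, corruption is not modelled, and a loss is treated as an immediate timeout.

```python
def stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Alternating-bit sender and receiver over a lossy channel (sketch).

    lost_data / lost_acks hold the indices (0-based, counted over all
    transmissions) whose data packet or returning ACK is lost.
    Returns (messages delivered to the upper layer, number of transmissions).
    """
    delivered = []
    transmissions = 0
    seq = 0        # sender's current sequence bit
    expected = 0   # receiver's expected sequence bit
    for msg in messages:
        while True:
            attempt = transmissions
            transmissions += 1
            if attempt in lost_data:
                continue              # no ACK arrives: timer expires, resend
            if seq == expected:       # new packet: deliver it
                delivered.append(msg)
                expected ^= 1
            # otherwise it is a duplicate: re-ACK without delivering again
            if attempt in lost_acks:
                continue              # ACK lost: timer expires, resend
            break                     # ACK received, move to next message
        seq ^= 1
    return delivered, transmissions
```

With `stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3})` every message is still delivered exactly once, at the cost of two extra transmissions; the one-bit sequence number is what lets the receiver discard the duplicate caused by the lost ACK.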
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
• U sender: utilization - fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources
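The numbers on this slide can be reproduced with a short calculation. The sketch below assumes RTT = 2 x 15 ms = 30 ms and that the sender is busy for L/R out of every RTT + L/R seconds; the function names and the `window` parameter (number of packets sent per round trip) are assumptions added for illustration.

```python
def sender_utilization(L_bits, R_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when it sends `window` packets per RTT."""
    d_trans = L_bits / R_bps                      # transmission delay L/R
    return min(1.0, window * d_trans / (rtt_s + d_trans))


def throughput_bytes_per_s(L_bits, R_bps, rtt_s, window=1):
    """Effective throughput in bytes per second."""
    return sender_utilization(L_bits, R_bps, rtt_s, window) * R_bps / 8


u = sender_utilization(8000, 1e9, 0.030)
print(f"utilization = {u:.5f}")
print(f"throughput  = {throughput_bytes_per_s(8000, 1e9, 0.030) / 1000:.1f} kB/s")
```

Setting `window=3` triples both figures, which is the effect pipelining aims for.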
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline spanning one RTT]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline spanning one RTT, three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only that unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only that unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Single state: Wait. Initially base=1, nextseqnum=1.
Transitions (event / action, Λ = no action):
• rdt_send(data) /
    if (nextseqnum < base+N) {
        udt_send(packet[nextseqnum])
        nextseqnum++
    } else
        refuse_data(data)    (Block)
• timeout / udt_send(packet[base]), udt_send(packet[base+1]), …, udt_send(packet[nextseqnum-1])
• udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
• udt_rcv(reply) && corrupt(reply) / Λ
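The Go-Back-N sender's three events translate fairly directly into code. The following is a minimal sketch, not the slide's FSM verbatim: the class name `GBNSender`, the `send` callback standing in for `udt_send`, and the guard that ignores stale ACKs are assumptions; timers and corruption checks are left to the caller.

```python
class GBNSender:
    """Go-Back-N sender sketch: window of N packets, cumulative ACKs."""

    def __init__(self, N, send):
        self.N = N
        self.base = 1            # oldest unacknowledged sequence number
        self.nextseqnum = 1      # next sequence number to use
        self.packets = {}        # seq -> data, kept until acknowledged
        self.send = send         # hypothetical stand-in for udt_send

    def rdt_send(self, data):
        """Called by the application; returns False if the window is full."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.send(self.nextseqnum, data)
            self.nextseqnum += 1
            return True
        return False             # refuse_data: caller must retry later

    def on_ack(self, acknum):
        """Cumulative ACK: everything up to and including acknum is acked."""
        if acknum >= self.base:  # assumption: ignore ACKs that carry no news
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1

    def on_timeout(self):
        """Resend every packet that is still unacknowledged."""
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.packets[seq])
```

A single timeout resends the whole outstanding window, which is the cost Go-Back-N pays for keeping the receiver simple.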
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering
  - Re-ACK packet with highest in-order seq #
Single state: Wait. Initially expectedseqnum=1.
Transitions (event / action):
• udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum) / rdt_rcv(data), udt_send(ACK), expectedseqnum++
• any other arrival / udt_send(reply)
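The receiver side is even smaller, since it keeps only `expectedseqnum`. A minimal sketch, assuming sequence numbers start at 1 and that an ACK value of 0 means nothing has been received in order yet (the class name and the `corrupt` flag are hypothetical):

```python
class GBNReceiver:
    """Go-Back-N receiver sketch: no buffering, cumulative ACKs only."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, seq, data, corrupt=False):
        """Process one arriving packet and return the ACK number to send."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # In every other case the packet is discarded and the ACK for the
        # highest in-order packet is repeated (a duplicate ACK).
        return self.expectedseqnum - 1
```

Feeding it packets 1, 3, 2, 3 yields ACKs 1, 1, 2, 3: the early copy of packet 3 is thrown away and only produces a duplicate ACK.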
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
• data from above:
  - if next available seq # in window, send pkt
• timeout(n):
  - resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
• otherwise:
  - ignore
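The receiver rules above can be sketched as follows. This is an illustration only: the class name `SRReceiver` is hypothetical, and sequence numbers are assumed to grow without wrapping, so the sequence-space issue discussed next does not arise here.

```python
class SRReceiver:
    """Selective-repeat receiver sketch with unbounded sequence numbers."""

    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}         # out-of-order packets awaiting delivery
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver the in-order run starting at rcvbase, if any.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so sender can advance
        return None              # outside both ranges: ignore
```

The second branch matters: without re-ACKing old packets, a sender whose ACK was lost would never be able to move its window forward.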
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
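The window-size rule can be checked mechanically. The sketch below assumes the worst case in which the receiver has accepted a full window and advanced, while every ACK was lost so the sender still retransmits the old window; the function name `windows_overlap` is hypothetical.

```python
def windows_overlap(seq_space, window):
    """True if a retransmitted old packet could be mistaken for a new one."""
    # Sequence numbers the sender may still retransmit (its old window).
    old = {i % seq_space for i in range(window)}
    # Sequence numbers the receiver now accepts as new data.
    new = {(window + i) % seq_space for i in range(window)}
    return bool(old & new)


print(windows_overlap(4, 3))   # the slide's example: ambiguous
print(windows_overlap(4, 2))   # window = half the sequence space: safe
```

The two sets are disjoint exactly when twice the window fits inside the sequence space, which is the bound stated above.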
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Header rows, each 32 bits wide:
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | Receive window
• checksum | Urg data pointer
• Options (variable length)
• application data (variable length)
Notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
  - This has led to a world of hurt…
Simple telnet scenario (Host A to Host B, time runs downward):
• User types 'C'. Host A -> Host B: Seq=42, ACK=79, data = 'C'
• Host B ACKs receipt of 'C', echoes back 'C'. Host B -> Host A: Seq=79, ACK=43, data = 'C'
• Host A ACKs receipt of echoed 'C'. Host A -> Host B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  - this gets through an intrusion-detection system
  - then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
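Unrolling the recurrence makes the "exponentially fast" claim explicit. Writing E_n for EstimatedRTT after the n-th sample and S_n for the n-th SampleRTT (this shorthand, and the initial estimate E_0, are notation introduced here for illustration):

```latex
\begin{align*}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha \sum_{k=0}^{n-1} (1-\alpha)^{k}\, S_{n-k} \;+\; (1-\alpha)^{n}\, E_0
\end{align*}
```

A sample that is k measurements old therefore carries weight α(1-α)^k; with α = 0.125 a sample eight measurements old contributes roughly 0.043, about a third of the weight of the newest sample.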
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) against time (seconds); series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)·DevRTT + β·|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4·DevRTT
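The estimator and the timeout rule fit in a few lines. The sketch below is an illustration under stated assumptions: the class name `RttEstimator` is hypothetical, the deviation is updated before the estimate (using the old estimate), and the initial deviation is set to half the first sample, as RFC 6298 suggests; the slides themselves do not fix these details.

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based timeout (sketch)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption, following RFC 6298

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never one measured on a retransmission)."""
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)       # milliseconds
est.update(200.0)                            # one unusually slow sample
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single outlier moves the estimate only slightly but widens the safety margin considerably, which is the intended behaviour when RTT is variable.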
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
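The event loop above can be turned into a small runnable model. This is a sketch only: the class name `SimplifiedTcpSender`, the `send` callback standing in for "pass segment to IP", and the boolean `timer_running` (a real timer is left out) are assumptions made for illustration.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: cumulative ACKs, one retransmission timer."""

    def __init__(self, initial_seq, send):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq of first byte -> segment data
        self.timer_running = False   # stands in for a real timer
        self.send = send             # hypothetical "pass segment to IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.send(seq, data)
        self.next_seq += len(data)   # sequence numbers count bytes

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # oldest not-yet-acknowledged segment
            self.send(seq, self.unacked[seq])
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:       # ACK covers new data
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in fully_acked:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100 and then receiving only ACK=120 clears both segments at once, because the ACK is cumulative.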
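66
TCP: retransmission scenarios
Lost ACK scenario (Host A sends, Host B receives):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• Seq=92 timeout at A
• A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100
• SendBase = 100
Premature timeout scenario (Host A sends, Host B receives):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120, both still in flight when the Seq=92 timeout fires
• A -> B: Seq=92, 8 bytes data (retransmission)
• SendBase = 100, then SendBase = 120 as the ACKs arrive
• B -> A: ACK=120 for the duplicate; SendBase stays at 120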
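67
TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends, Host B receives):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arrives before the timeout
• SendBase = 120, no retransmission needed
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACK's 100 too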
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action:
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A/Host B timeline: the 2nd segment is lost (X) and Host A resends it before its timeout expires]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    (a duplicate ACK for already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    (fast retransmit)
        }
    }
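The duplicate-ACK counting in this algorithm can be sketched in isolation. The class name `DupAckCounter` is hypothetical, and the method returns the sequence number to resend rather than sending anything itself; resetting the count when new data is acknowledged is an assumption the slide leaves implicit.

```python
class DupAckCounter:
    """Tracks duplicate ACKs and signals when fast retransmit should fire."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the sequence number to retransmit, or None."""
        if y > self.send_base:       # new data acknowledged
            self.send_base = y
            self.dup_acks = 0        # assumption: start counting afresh
            return None
        self.dup_acks += 1           # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return y                 # fast retransmit: resend segment y
        return None


counter = DupAckCounter(send_base=92)
print([counter.on_ack(100) for _ in range(4)])
```

The first ACK=100 advances `send_base`; only the third duplicate after it triggers the resend, well before a typical timeout would.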
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will
This means small MSS values may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
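Both sides of the rule are simple arithmetic. In the sketch below the function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are assumptions added for illustration; the receiver-side formula is the one given above.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000 of them.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                        # spare room
print(may_send(1500, 3000, 2000, window))            # 1000 in flight + 1500
```

As the application drains the buffer, `last_byte_read` grows and the advertised window opens again.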
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
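From the application's point of view the handshake is hidden inside the socket calls. The following sketch uses Python's standard `socket` module on the loopback interface; the function name `loopback_transfer` is hypothetical, and the single-threaded ordering relies on the operating system completing the three-way handshake for a listening socket before `accept()` is called.

```python
import socket


def loopback_transfer(payload: bytes) -> bytes:
    """Open a TCP connection to ourselves and move one small payload across."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcome:
        welcome.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        welcome.listen(1)
        port = welcome.getsockname()[1]

        # connect() returns once SYN, SYNACK and ACK have been exchanged.
        with socket.create_connection(("127.0.0.1", port)) as client:
            connection, _peer = welcome.accept()
            with connection:
                client.sendall(payload)
                # A small payload normally arrives in one recv(); larger
                # transfers would need a loop, since TCP is a byte stream.
                return connection.recv(len(payload))


print(loopback_transfer(b"hello"))
```

Leaving the `with` blocks closes both sockets, which starts the FIN/ACK exchange.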
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client close: FIN; server: ACK, then close: FIN; client: ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client closing: FIN; server: ACK, then closing: FIN; client: ACK, timed wait, closed; server: closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B each send λin (original data) into unlimited shared output link buffers; λout is the delivered rate]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send into finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered rate]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a, b, c) of λout against the input rate, both axes running to R/2; levels R/3 and R/4 are marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B send into finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered rate]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion: when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B and the resulting λout]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, W <- W + 1/W
  - multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window size against time, marked at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
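The saw tooth follows directly from the two rules. The sketch below works in units of MSS per RTT round; the function name `aimd_trace`, the `loss_rounds` set and the floor of one MSS are assumptions for illustration, and slow start is left out, as on the slide.

```python
def aimd_trace(rounds, loss_rounds, start=1.0, mss=1.0):
    """Congestion window (in MSS) at the start of each RTT round."""
    window = start
    trace = []
    for r in range(rounds):
        trace.append(window)
        if r in loss_rounds:
            window = max(mss, window / 2)   # multiplicative decrease
        else:
            window += mss                   # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(6, loss_rounds={3}, start=8.0))
```

Each loss halves the window and the slow linear climb resumes, so the window oscillates just below the point at which the path starts dropping packets.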
udt_rcv(packet) ampamp corrupt(packet)
30
rdt21 receiver handles garbled ACKNAKs
Wait for 0 from below
udt_send(NAK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
udt_send(ACK)rdt_rcv(data)
Wait for 1 from below
udt_rcv(packet) ampamp not corrupt(packet) ampamp has_seq0(packet)
udt_send(ACK)rdt_rcv(data)
udt_send(ACK)
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet)
receive(packet) ampamp corrupt(packet)
udt_send(ACK)
udt_send(NAK)
31
rdt21 discussionSenderbull seq added to pktbull two seq rsquos (01) will suffice Whybull must check if received ACKNAK corrupted bull twice as many states
ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has a0 or 1 sequence number
Receiverbull must check if received packet is duplicate
ndash state indicates whether 0 or 1 is expected pkt seq bull note receiver can not know if its last ACKNAK received OK at
sender
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
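The three events of the simplified sender can be sketched as a small class. This is an illustration only: the timer is reduced to a boolean, "pass segment to IP" is modelled as appending to a list, sequence-number wraparound is ignored, and all names are hypothetical.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> payload not yet acknowledged
        self.timer_running = False
        self.sent = []              # log of (seq #, payload) passed to "IP"

    def on_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # oldest not-yet-acknowledged segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # A cumulative ACK covers every segment that ends at or before y.
            covered = [s for s in self.unacked if s + len(self.unacked[s]) <= y]
            for seq in covered:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Two segments are sent; a single ACK=120 acknowledges both.
sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_data(b"A" * 8)    # Seq=92, 8 bytes
sender.on_data(b"B" * 20)   # Seq=100, 20 bytes
sender.on_ack(120)
print(sender.send_base, sender.timer_running)
```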
66
TCP retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); after the timeout Host A resends Seq=92, 8 bytes data; Host B replies ACK=100; SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout fires first, so Host A resends Seq=92, 8 bytes data; Host B replies ACK=120; SendBase = 100, then SendBase = 120]
67
TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X); ACK=120 arrives before the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK’s 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A's 2nd segment is lost (X); Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
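A short sketch of the ACK handler above, keeping one duplicate counter per ACK value as the pseudocode does. The class and attribute names are hypothetical, and the retransmission is modelled by recording the sequence number rather than sending anything.

```python
from collections import Counter


class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()   # ACK value y -> duplicate ACKs seen for y
        self.resent = []            # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y      # new data acknowledged
        else:
            self.dup_acks[y] += 1   # duplicate ACK for already ACKed data
            if self.dup_acks[y] == 3:
                self.resent.append(y)  # resend segment with seq # y


demo = FastRetransmitSender(send_base=100)
for _ in range(4):
    demo.on_ack(100)                # four duplicate ACKs for byte 100
print(demo.resent)
```

Because the comparison is `== 3`, a fourth duplicate does not trigger a second retransmission of the same segment.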
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don’t wait indefinitely
Nagle’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
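The window arithmetic can be checked with a few lines of code. This is a sketch: `rcv_window` follows the formula above, while the sender-side counters `last_byte_sent` and `last_byte_acked` are hypothetical names that do not appear on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """True if sending nbytes more keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                # spare room in bytes
print(can_send(5000, 4000, 1000, window))    # fits within the window
print(can_send(5000, 4000, 1200, window))    # would overflow the buffer
```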
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
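The sequence and acknowledgement numbers exchanged in the three steps can be traced with a small simulation. No sockets are opened; segments are plain dictionaries, and the function name and initial sequence numbers are made up for the example. The only fact used beyond the slide is that a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```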
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. client: close, sends FIN; server: sends ACK, close, sends FIN; client: sends ACK, timed wait, closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK
  – enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline. client: closing, sends FIN; server: sends ACK, closing, sends FIN; client: sends ACK, timed wait, closed; server: closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A, Host B; λ_in: original data; λ_out; unlimited shared output link buffers]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A, Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out; finite shared output link buffers]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• “perfect” retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against offered load, both axes up to R/2; levels R/3 and R/4 are marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A, Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out; finite shared output link buffers]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet dropped, any “upstream” transmission capacity used for that packet was wasted
[Figure: Host A, Host B; λ_out]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; applied per received ACK this is W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with levels 8 Kbytes, 16 Kbytes, 24 Kbytes. Saw tooth behavior: probing for bandwidth]
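The saw tooth can be reproduced with a toy model. This sketch counts the window in units of MSS, advances one RTT per step, and takes the RTTs in which a loss occurs as an input; slow start is not modelled, and the function names are hypothetical. The second function applies the per-ACK rule W ← W + 1/W to show that one window's worth of ACKs adds roughly one MSS.

```python
def aimd_trace(initial_window, rounds, loss_rounds, mss=1):
    """Return the window at the start of each RTT under AIMD."""
    window = initial_window
    trace = []
    for rtt in range(rounds):
        trace.append(window)
        if rtt in loss_rounds:
            window = max(mss, window / 2)   # multiplicative decrease
        else:
            window += mss                   # additive increase: +1 MSS per RTT
    return trace


def per_ack_increase(window, acks):
    """Apply W <- W + 1/W once per received ACK."""
    for _ in range(acks):
        window += 1 / window
    return window


print(aimd_trace(initial_window=8, rounds=6, loss_rounds={2}))
print(per_ack_increase(10.0, acks=10))
```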
89
SLOW START IS NOT SHOWN
31
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  – state must “remember” whether “current” pkt has a 0 or 1 sequence number
Receiver:
• must check if received packet is duplicate
  – state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
32
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  – receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
33
rdt2.2: sender, receiver fragments
sender FSM fragment (Λ = no action):
  state “Wait for call 0 from above”
    rdt_send(data)
      / sequence=0; udt_send(packet)   -> “Wait for ACK 0”
  state “Wait for ACK 0”
    rdt_rcv(reply) && ( corrupt(reply) || isACK1(reply) )
      / udt_send(packet)
    udt_rcv(reply) && not corrupt(reply) && isACK0(reply)
      / Λ
receiver FSM fragment:
  transition into “Wait for 0 from below”
    receive(packet) && not corrupt(packet) && has_seq1(packet)
      / send(ACK1); rdt_rcv(data)
  state “Wait for 0 from below”
    udt_rcv(packet) && ( corrupt(packet) || has_seq1(packet) )
      / udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #’s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
FSM transitions (event / action; Λ = no action):
  state “IDLE state 0”
    rdt_send(data) / sequence=0; udt_send(packet)   -> “Wait for ACK0”
    udt_rcv(reply) / Λ
  state “Wait for ACK0”
    udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
    timeout / udt_send(packet)
    udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ   -> “IDLE state 1”
  state “IDLE state 1”
    rdt_send(data) / sequence=1; udt_send(packet)   -> “Wait for ACK1”
    udt_rcv(reply) / Λ
  state “Wait for ACK1”
    udt_rcv(packet) && ( corrupt(packet) || isACK(reply,0) ) / Λ
    timeout / udt_send(packet)
    udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ   -> “IDLE state 0”
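The sender's behaviour under loss can be illustrated with a deterministic stop-and-wait simulation. This is a sketch rather than the FSM itself: corruption is not modelled, the countdown timer is reduced to "retry when the data or its ACK was lost", and the channel's losses are scripted through the hypothetical `drop_data` and `drop_acks` arguments (indices of data transmissions, counted from 0).

```python
def stop_and_wait(messages, drop_data=(), drop_acks=()):
    """Alternating-bit transfer; returns (delivered messages, transmissions)."""
    delivered = []
    expected = 0          # receiver: seq # it expects next
    seq = 0               # sender: seq # of the current packet
    transmissions = 0
    for data in messages:
        while True:
            attempt = transmissions
            transmissions += 1
            if attempt in drop_data:
                continue  # data lost: sender times out and retransmits
            if seq == expected:
                delivered.append(data)   # new packet: deliver to the app
                expected ^= 1
            # A duplicate is not delivered again, but is still ACKed.
            if attempt in drop_acks:
                continue  # ACK lost: sender times out and retransmits
            break         # ACK for the current seq # arrived
        seq ^= 1
    return delivered, transmissions


result = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_acks={3})
print(result)
```

In this run the second transmission (data) and the ACK for the fourth transmission are lost, so five transmissions are needed and the duplicate of "c" is discarded by the receiver.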
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = 8000 bits / 10^9 bps = 8 microseconds
• U_sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources
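The numbers in this example can be reproduced directly. The sketch below assumes RTT = 2 × 15 ms = 30 ms (propagation delay in each direction, ignoring ACK transmission time) and uses the usual stop-and-wait utilization U = (L/R) / (RTT + L/R); the variable names are chosen for this illustration.

```python
def stop_and_wait_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy sending under stop-and-wait."""
    d_trans = packet_bits / rate_bps
    return d_trans / (rtt_s + d_trans)


L_BITS = 8000      # packet length in bits
R_BPS = 1e9        # link rate: 1 Gbps
RTT_S = 0.030      # assumption: 2 x 15 ms propagation delay

d_trans = L_BITS / R_BPS
utilization = stop_and_wait_utilization(L_BITS, R_BPS, RTT_S)
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + d_trans)

print(f"d_trans     = {d_trans * 1e6:.1f} microseconds")
print(f"utilization = {utilization:.6f}")
print(f"throughput  = {throughput_bytes_per_s / 1000:.1f} kB/sec")
```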
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3
42
Pipelining Protocols
Go-back-N: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – doesn’t ack packet if there’s a gap
• Sender has timer for oldest unacked packet
  – if timer expires, retransmit all unacked packets
Selective Repeat: big picture:
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – when timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – when timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n – “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN sender: extended FSM (Λ = no action)
initially:
  base=1
  nextseqnum=1
state “Wait”:
  rdt_send(data)
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    }
    else
      refuse_data(data)   /* Block */
  timeout
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply)
    base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply)
    Λ
46
GBN receiver: extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don’t buffer) -> no receiver buffering
  – re-ACK packet with highest in-order seq #
initially:
  expectedseqnum=1
state “Wait”:
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum)
    rdt_rcv(data)
    udt_send(ACK)
    expectedseqnum++
  default
    udt_send(reply)
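The sender and receiver FSMs can be exercised together in a few lines. This is a sketch with hypothetical class names: packets are `(seq, data)` tuples, corruption is not modelled, and the channel is whatever the caller chooses to deliver. One small deviation from the FSM is marked in a comment: the sender never moves `base` backwards on a stale ACK.

```python
class GbnSender:
    def __init__(self, window_size):
        self.n = window_size
        self.base = 1
        self.next_seq_num = 1
        self.packets = {}   # seq # -> data, kept until cumulatively ACKed

    def rdt_send(self, data):
        """Return the packet to transmit, or None if the window is full."""
        if self.next_seq_num < self.base + self.n:
            pkt = (self.next_seq_num, data)
            self.packets[self.next_seq_num] = data
            self.next_seq_num += 1
            return pkt
        return None         # refuse_data: caller has to try again later

    def on_ack(self, ack_num):
        for seq in range(self.base, ack_num + 1):
            self.packets.pop(seq, None)
        # Deviation from the slide: max() guards against stale ACKs.
        self.base = max(self.base, ack_num + 1)

    def on_timeout(self):
        """Go back N: resend every unACKed packet in the window."""
        return [(s, self.packets[s]) for s in range(self.base, self.next_seq_num)]


class GbnReceiver:
    def __init__(self):
        self.expected_seq_num = 1
        self.delivered = []

    def on_packet(self, pkt):
        seq, data = pkt
        if seq == self.expected_seq_num:
            self.delivered.append(data)
            self.expected_seq_num += 1
        # Out-of-order packets are discarded; always ACK highest in-order seq #.
        return self.expected_seq_num - 1


sender, receiver = GbnSender(window_size=3), GbnReceiver()
p1, p2, p3 = (sender.rdt_send(d) for d in "abc")
refused = sender.rdt_send("d")             # window full
sender.on_ack(receiver.on_packet(p1))      # packet 2 is lost in transit
sender.on_ack(receiver.on_packet(p3))      # out of order: duplicate ACK 1
resent = sender.on_timeout()
for pkt in resent:
    sender.on_ack(receiver.on_packet(pkt))
print(receiver.delivered, sender.base)
```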
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]:
• ACK(n)
otherwise:
• ignore
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
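The dilemma can be checked mechanically. The sketch below considers the worst case in which the receiver has accepted a full window but every ACK was lost: the sender may still retransmit the old window while the receiver has already advanced to the next one. If the two windows share a sequence number, a retransmission is indistinguishable from new data. The function names are hypothetical.

```python
def window(base, size, seq_space):
    """Set of sequence numbers covered by a window starting at base."""
    return {(base + i) % seq_space for i in range(size)}


def is_ambiguous(window_size, seq_space):
    """True if an old retransmission can land inside the receiver's new window."""
    old = window(0, window_size, seq_space)            # sender may still resend
    new = window(window_size, window_size, seq_space)  # receiver expects next
    return bool(old & new)


print(is_ambiguous(window_size=3, seq_space=4))  # the slide's example
print(is_ambiguous(window_size=2, seq_space=4))  # window is half the seq # space
```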
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581, …
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments)
• head len, not used, flag bits U A P R S F
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pnter
• Options (variable length)
• application data (variable length)
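The fixed 20-byte part of the header maps directly onto a `struct` format string. The sketch below parses the fields listed above from raw bytes; options are skipped, the checksum is not verified, and the sample segment (ports, window size) is made up, reusing Seq=42, ACK=79, data ‘C’ as illustrative values.

```python
import struct

# Flag bit masks in the low six bits of the flags byte.
FLAG_BITS = [(0x20, "URG"), (0x10, "ACK"), (0x08, "PSH"),
             (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]


def parse_tcp_header(segment):
    """Split a raw TCP segment into header fields and payload."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # head len counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for bit, name in FLAG_BITS if flag_byte & bit},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
        "payload": segment[header_len:],
    }


# Hypothetical segment: head len 5 words, PSH+ACK set, one byte of data.
sample = struct.pack("!HHIIBBHHH", 12345, 23, 42, 79, 5 << 4, 0x18, 4096, 0, 0) + b"C"
fields = parse_tcp_header(sample)
print(fields["seq"], fields["ack"], sorted(fields["flags"]), fields["payload"])
```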
56
TCP seq. #’s and ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn’t say, up to implementor
    This has led to a world of hurt…
[Figure: simple telnet scenario between Host A and Host B]
• User types ‘C’: Host A sends Seq=42, ACK=79, data = ‘C’
• host ACKs receipt of ‘C’, echoes back ‘C’: Host B sends Seq=79, ACK=43, data = ‘C’
• host ACKs receipt of echoed ‘C’: Host A sends Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with “password-file”
A dumb example
Neither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary; want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
32
rdt22 a NAK-free protocol
bull same functionality as rdt21 using ACKs onlybull instead of NAK receiver sends ACK for last pkt received OK
ndash receiver must explicitly include seq of pkt being ACKed bull duplicate ACK at sender results in same action as NAK
retransmit current pkt
33
rdt22 sender receiver fragments
Wait for call 0 from above
sequence=0udt_send(packet)
rdt_send(data)
udt_send(packet)
rdt_rcv(reply) ampamp ( corrupt(reply) || isACK1(reply) )
udt_rcv(reply) ampamp not corrupt(reply) ampamp isACK0(reply)
Wait for ACK
0
sender FSMfragment
Wait for 0 from below
receive(packet) ampamp not corrupt(packet) ampamp has_seq1(packet) send(ACK1)rdt_rcv(data)
udt_rcv(packet) ampamp (corrupt(packet) || has_seq1(packet))
udt_send(ACK1)receiver FSM
fragment
L
34
rdt30 channels with errors and loss
New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of help but not
enough
Approach sender waits ldquoreasonablerdquo amount of time for ACK
bull retransmits if no ACK received in this time
bull if pkt (or ACK) just delayed (not lost)ndash retransmission will be
duplicate but use of seq rsquos already handles this
ndash receiver must specify seq of pkt being ACKed
bull requires countdown timer
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
- may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
State: Wait
  initially (Λ):
    base=1; nextseqnum=1
  rdt_send(data):
    if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ }
    else refuse_data(data)   (Block?)
  timeout:
    udt_send(packet[base]); udt_send(packet[base+1]); ... udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply):
    Λ (no action)
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
- may generate duplicate ACKs
- need only remember expectedseqnum
• out-of-order packet:
- discard (don't buffer) -> no receiver buffering!
- Re-ACK packet with highest in-order seq #
State: Wait
  initially (Λ):
    expectedseqnum=1
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
    rdt_rcv(data); udt_send(ACK); expectedseqnum++
  any other event (Λ):
    udt_send(reply)
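The two GBN state machines can be sketched together in a few lines. This is a minimal illustration, not a full protocol: timers, corruption checks and sequence-number wraparound are left out, the `send` callback is a hypothetical stand-in for udt_send, and the guard against stale ACKs in `on_ack` is an added assumption (the FSM above sets base unconditionally).

```python
class GBNSender:
    def __init__(self, window, send):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}        # seq # -> data, kept until cumulatively ACKed
        self.send = send         # hypothetical stand-in for udt_send

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.send(self.nextseqnum, data)
            self.nextseqnum += 1
            return True
        return False             # window full: refuse_data

    def on_ack(self, acknum):
        # Cumulative ACK: everything up to and including acknum has arrived.
        if acknum >= self.base:  # assumption: ignore stale ACKs
            for seq in range(self.base, acknum + 1):
                self.packets.pop(seq, None)
            self.base = acknum + 1

    def on_timeout(self):
        # Go back N: resend every packet that is still unacknowledged.
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.packets[seq])


class GBNReceiver:
    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def on_packet(self, seq, data):
        if seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # In every case (re-)ACK the highest in-order seq # seen so far.
        return self.expectedseqnum - 1


wire = []
sender = GBNSender(window=3, send=lambda seq, data: wire.append((seq, data)))
receiver = GBNReceiver()
for ch in "abcd":
    sender.rdt_send(ch)          # "d" is refused: the window of 3 is full

# Packet 2 is lost: the receiver sees packet 1, then packet 3.
acks = [receiver.on_packet(*wire[0]), receiver.on_packet(*wire[2])]
for a in acks:
    sender.on_ack(a)

wire.clear()
sender.on_timeout()              # resends packets 2 and 3
for seq, data in wire:
    sender.on_ack(receiver.on_packet(seq, data))
```

In the run above the receiver discards packet 3 the first time and re-ACKs packet 1, so the timeout makes the sender resend both 2 and 3 even though 3 had already arrived once.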
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
- buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
- sender timer for each unACKed pkt
• sender window
- N consecutive seq #'s
- again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
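The receiver rules above translate fairly directly into code. The sketch below is a simplified illustration that assumes unbounded sequence numbers (no wraparound) and returns the seq # to acknowledge instead of sending anything; the class and method names are made up for the example.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}       # out-of-order packets waiting for the gap to fill
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase, advancing the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n           # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore
```

Re-ACKing packets below rcvbase matters: if the original ACK was lost, the sender's window would otherwise never move forward.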
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
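The bound can be checked by brute force. The sketch below models the worst case, assuming the receiver has accepted a full window while every ACK was lost: the sender may then retransmit the old window, and any sequence number that also lies in the receiver's new window would be mistaken for new data. The function name is illustrative only.

```python
def window_overlap(seq_space, window):
    """Seq #s that are both retransmittable (old window) and acceptable as new."""
    old = {i % seq_space for i in range(0, window)}            # sender may resend these
    new = {i % seq_space for i in range(window, 2 * window)}   # receiver now expects these
    return sorted(old & new)


print(window_overlap(4, 3))   # seq #'s 0..3, window size 3: ambiguous numbers exist
print(window_overlap(4, 2))   # window size 2: no overlap
```

With four sequence numbers a window of 3 leaves 0 and 1 ambiguous, while a window of 2 (half the sequence space) leaves none.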
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high Bandwidth-Delay product
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• full duplex data:
- bi-directional data flow in same connection
- MSS: maximum segment size
• connection-oriented:
- handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
- sender will not overwhelm receiver
• point-to-point:
- one sender, one receiver
• reliable, in-order byte stream:
- no "message boundaries"
• pipelined:
- TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer; application reads data through the socket door]
55
TCP segment structure
[Figure: segment layout, 32 bits wide, top to bottom]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length)
application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
- byte stream "number" of first byte in segment's data
ACKs:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say - up to implementor
This has led to a world of hurt...
simple telnet scenario (Host A, Host B, time running downward):
1. User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
- Filling buffers.
• Critical buffers...
• Send a legitimate request
GET index.html
this gets through an intrusion-detection system, then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
- but RTT varies
• too short: premature timeout
- unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
- average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm - Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Plot: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
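The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) and the Karn/Partridge rule fit into one small estimator. This is a sketch under stated assumptions: the class name is made up, DevRTT is updated before EstimatedRTT (the order RFC 6298 uses; the slides do not fix an order), and the initial deviation of half the first sample also follows RFC 6298.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: RFC 6298 initial value

    def update(self, sample_rtt, retransmitted=False):
        if retransmitted:
            return      # Karn/Partridge: the ACK is ambiguous, so skip the sample
        # Deviation first, using the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)      # milliseconds
est.update(200.0)                           # one slow sample
est.update(1000.0, retransmitted=True)      # ignored
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single slow sample moves EstimatedRTT only an eighth of the way towards it, but widens the safety margin considerably through DevRTT.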
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
- timeout events
- duplicate acks
• Initially consider simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
- update what is known to be acked
- start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
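The event loop above can be turned into a small class with one method per event. This is a hedged sketch rather than a real TCP: the timer is reduced to a boolean flag, `send` is a hypothetical stand-in for "pass segment to IP", and ACKs are assumed to fall on segment boundaries.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq, send):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}             # seq # of first byte -> segment payload
        self.timer_running = False
        self.send = send              # hypothetical stand-in for the IP layer

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True           # start timer
        self.send(self.next_seq, data)
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)             # oldest unacknowledged segment
            self.send(seq, self.unacked[seq])
        self.timer_running = bool(self.unacked) # restart timer if data is outstanding

    def on_ack(self, y):
        if y > self.send_base:                  # new data is acknowledged
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sent = []
tcp = SimpleTcpSender(92, lambda seq, data: sent.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)    # Seq=92, 8 bytes of data
tcp.on_timeout()             # the ACK was lost, so Seq=92 is resent
tcp.on_ack(100)              # ACK=100 finally arrives
```

The usage lines replay a lost-ACK case: one retransmission of Seq=92, after which SendBase moves to 100 and the timer stops.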
66
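TCP: retransmission scenarios
[Figure: two timing diagrams between Host A and Host B]
lost ACK scenario: A sends Seq=92, 8 bytes data; B replies ACK=100, which is lost (X); the Seq=92 timer times out and A resends Seq=92, 8 bytes data; B replies ACK=100 again; SendBase = 100
premature timeout: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the Seq=92 timer times out before the ACKs arrive, so A resends Seq=92, 8 bytes data; SendBase = 100, then SendBase = 120; B answers the duplicate with ACK=120; SendBase = 120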
67
TCP retransmission scenarios (more)
[Timing diagram, Host A and Host B: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X); B's ACK=120 arrives before the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
   -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
2. Arrival of in-order segment with expected seq #. One other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
   -> Immediately send duplicate ACK, indicating seq. # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
- long delay before resending lost packet
• Detect lost segments via duplicate ACKs
- Sender often sends many segments back-to-back
- If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
- fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Timing diagram, Host A and Host B: a segment is lost (X) and A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
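The ACK handler with duplicate-ACK counting might look like the sketch below. Timers are omitted, the class name and the `resend` callback are hypothetical, and clearing the counters when new data is acknowledged is an assumption the pseudocode leaves implicit.

```python
from collections import Counter


class FastRetransmitSender:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base, unacked, resend):
        self.send_base = send_base
        self.unacked = dict(unacked)    # seq # of first byte -> payload
        self.dup_acks = Counter()       # ACK value y -> duplicates seen
        self.resend = resend            # hypothetical retransmission callback

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide SendBase and forget old duplicates.
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items() if s >= y}
            self.dup_acks.clear()
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == self.DUP_ACK_THRESHOLD and y in self.unacked:
                self.resend(y, self.unacked[y])     # fast retransmit
```

The retransmission fires exactly once, on the third duplicate; further duplicates for the same y do not trigger another resend.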
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will.
This means small MSS values may persist - indefinitely.
Solution:
Wait to fill each segment, but don't wait indefinitely.
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires - send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days?)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
- guarantees receive buffer doesn't overflow
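The window arithmetic is small enough to write out directly. In the sketch below the receiver side follows the RcvWindow formula above; the sender-side helper and its parameter names (`last_byte_sent`, `last_byte_acked`) are assumptions added for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(rcv_window_bytes, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked    # unACKed data
    return max(0, rcv_window_bytes - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                # spare room advertised to the sender
print(sender_may_send(window, 5000, 4000))   # room left after 1000 bytes in flight
```

When the application stops reading, LastByteRead stalls, RcvWindow shrinks towards zero, and the sender is throttled accordingly.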
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
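The sequence and acknowledgement numbers exchanged in the three steps can be traced with a toy model. The `Segment` type and function name are invented for this sketch; the one fact it relies on beyond the slide is that a SYN consumes one sequence number (RFC 793), which is why each side acknowledges the other's initial seq # plus one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    acknum: int = 0


def three_way_handshake(client_isn, server_isn):
    # Step 1: client sends SYN carrying its initial seq #, no data.
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # Step 2: server replies with SYNACK: its own initial seq #, ACKing the SYN.
    synack = Segment(syn=True, ack=True, seq=server_isn, acknum=syn.seq + 1)
    # Step 3: client ACKs the SYNACK (this segment may already carry data).
    ack = Segment(syn=False, ack=True, seq=client_isn + 1, acknum=synack.seq + 1)
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

After the third segment both sides know the other's next expected byte, so data transfer can start with consistent state.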
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Timing diagram, client and server: client close, FIN; server ACK; server close, FIN; client ACK; timed wait; closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
- Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Timing diagram, client and server: client closing, FIN; server ACK; server closing, FIN; client ACK; timed wait; closed on both sides]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
- lost packets (buffer overflow at routers)
- long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B each send $\lambda_{in}$ (original data) through one router with unlimited shared output link buffers; $\lambda_{out}$ is the delivered throughput]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through one router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered throughput]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Plots a, b, c: $\lambda_{out}$ against offered load, both axes marked at R/2; levels R/3 and R/4 are also marked]
85
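Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: delivered throughput]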
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; plot of $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, $W \leftarrow W + 1/W$)
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Plot: congestion window size against time; y-axis marked at 8 Kbytes, 16 Kbytes, 24 Kbytes]
Saw tooth behavior: probing for bandwidth
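The saw tooth can be reproduced with a toy simulation. The sketch below works at RTT granularity, measures the window in units of MSS, and takes the rounds in which a loss is detected as an input; the function name and parameters are illustrative, and slow start is deliberately left out.

```python
def aimd(rounds, loss_rounds, mss=1, initial_cwnd=1):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease: W <- W/2
        else:
            cwnd = cwnd + mss           # additive increase: +1 MSS per RTT
    return trace


print(aimd(rounds=8, loss_rounds={4}))
```

The window climbs linearly, halves in the round after the loss, and starts climbing again, which is the probing behaviour the plot shows.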
89
SLOW START IS NOT SHOWN!
33
rdt2.2: sender, receiver fragments
sender FSM fragment
State: Wait for call 0 from above
  rdt_send(data):
    sequence=0; udt_send(packet)   (go to Wait for ACK 0)
State: Wait for ACK 0
  rdt_rcv(reply) && ( corrupt(reply) || isACK1(reply) ):
    udt_send(packet)
  udt_rcv(reply) && not corrupt(reply) && isACK0(reply):
    Λ (no action)
receiver FSM fragment
State: Wait for 0 from below
  receive(packet) && not corrupt(packet) && has_seq1(packet)   (transition into this state):
    send(ACK1); rdt_rcv(data)
  udt_rcv(packet) && (corrupt(packet) || has_seq1(packet)):
    udt_send(ACK1)
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
- checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
- retransmission will be duplicate, but use of seq. #'s already handles this
- receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
State: IDLE state 0
  rdt_send(data):
    sequence=0; udt_send(packet)   (go to Wait for ACK0)
  udt_rcv(reply):
    Λ
State: Wait for ACK0
  udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ):
    Λ
  timeout:
    udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0):
    Λ   (go to IDLE state 1)
State: IDLE state 1
  rdt_send(data):
    sequence=1; udt_send(packet)   (go to Wait for ACK1)
  udt_rcv(reply):
    Λ
State: Wait for ACK1
  udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ):
    Λ
  timeout:
    udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1):
    Λ   (go to IDLE state 0)
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
79
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends $\lambda_{in}$ (original data) into unlimited shared output link buffers; Host B receives $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, both axes running to R/2; levels R/3 and R/4 marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; applied per received ACK this is $W \leftarrow W + 1/W$, with W counted in segments
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size (ticks at 8, 16, 24 Kbytes) against time; saw-tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
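The saw-tooth of additive increase, multiplicative decrease can be reproduced with a toy loop. This is a sketch under simplifying assumptions: time advances in whole RTT rounds, the rounds in which a loss is detected are supplied by hand, and slow start is left out, as in the figure. The function and parameter names are illustrative.

```python
def aimd(rounds, loss_rounds, mss=1, start=1):
    """Congestion window (in MSS) at the start of each RTT round."""
    cong_win = start
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win = cong_win + mss           # additive increase: +1 MSS per RTT
    return trace


trace = aimd(rounds=8, loss_rounds={4})
print(trace)
```

The window climbs linearly to 5 MSS, halves to 2.5 after the loss in round 4, then climbs again.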
34
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  – checksum, seq #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  – retransmission will be duplicate, but use of seq #'s already handles this
  – receiver must specify seq # of pkt being ACKed
• requires countdown timer
35
rdt3.0 sender
[FSM with four states; each transition is written event / action, Λ = no action]
IDLE state 0:
  rdt_send(data) / sequence=0; udt_send(packet)  -> Wait for ACK0
  udt_rcv(reply) / Λ
Wait for ACK0:
  udt_rcv(reply) && ( corrupt(reply) || isACK(reply,1) ) / Λ
  timeout / udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,0) / Λ  -> IDLE state 1
IDLE state 1:
  rdt_send(data) / sequence=1; udt_send(packet)  -> Wait for ACK1
  udt_rcv(reply) / Λ
Wait for ACK1:
  udt_rcv(reply) && ( corrupt(reply) || isACK(reply,0) ) / Λ
  timeout / udt_send(packet)
  udt_rcv(reply) && notcorrupt(reply) && isACK(reply,1) / Λ  -> IDLE state 0
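The sender FSM can be written as a small event-driven class. This is a sketch, not the course's reference code: the class name is invented, the countdown timer is left to the caller (who invokes `timeout()` when it fires), and corruption is passed in as a flag instead of being detected with a checksum.

```python
class Rdt30Sender:
    """Alternating-bit sender: one packet in flight, resend on timeout."""

    def __init__(self, udt_send):
        self.udt_send = udt_send   # callable taking (seq, data)
        self.seq = 0               # sequence bit for the next new packet
        self.pending = None        # (seq, data) awaiting its ACK, None when idle

    def rdt_send(self, data):
        if self.pending is not None:
            return False           # still waiting for an ACK: refuse new data
        self.pending = (self.seq, data)
        self.udt_send(*self.pending)
        return True

    def timeout(self):
        if self.pending is not None:
            self.udt_send(*self.pending)   # retransmit the outstanding packet

    def udt_rcv(self, ack_seq, corrupt=False):
        if self.pending is None or corrupt or ack_seq != self.pending[0]:
            return                 # idle, corrupt, or wrong ACK: do nothing
        self.pending = None        # correct ACK: back to idle
        self.seq ^= 1              # flip the sequence bit


sent = []
s = Rdt30Sender(lambda seq, data: sent.append((seq, data)))
s.rdt_send("a")    # sends (0, "a")
s.timeout()        # no ACK in time: resend (0, "a")
s.udt_rcv(1)       # ACK for the wrong sequence bit is ignored
s.udt_rcv(0)       # correct ACK
s.rdt_send("b")    # sends (1, "b")
print(sent)
```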
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \mu\text{s}$
• U sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
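Working the example through, assuming the round-trip time is just twice the 15 ms propagation delay and that ACK transmission and processing times are negligible (both are assumptions of this sketch):

```latex
\begin{align*}
d_{trans} &= \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/s}} = 8\ \mu\text{s} = 0.008\ \text{ms} \\
RTT &= 2 \times 15\ \text{ms} = 30\ \text{ms} \\
U_{sender} &= \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027 \\
\text{throughput} &= \frac{8000\ \text{bits}}{30.008\ \text{ms}} \approx 267\ \text{kbit/s} \approx 33\ \text{kB/s}
\end{align*}
```

The sender is busy less than 0.03% of the time, which is where the 33 kB/sec figure on a 1 Gbps link comes from.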
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
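The factor of 3 can be checked numerically with the link parameters from the rdt3.0 performance example (1 Gbps, 8000 bit packets, 30 ms RTT). The function below is an illustrative sketch; its name and the `window` parameter are assumptions.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets go out per round trip."""
    d_trans = packet_bits / link_bps            # L / R
    return min(1.0, window * d_trans / (rtt_s + d_trans))


u1 = sender_utilization(8000, 1e9, 0.030)            # stop-and-wait
u3 = sender_utilization(8000, 1e9, 0.030, window=3)  # three packets in flight
print(round(u1, 5), round(u3, 5), round(u3 / u1, 2))
```

Utilization grows linearly with the window until the pipe is full, at which point the `min` caps it at 1.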
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  – Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N, consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
[FSM with a single state "Wait"; each transition is written event / action, Λ = no action]
  initially / base=1; nextseqnum=1
  rdt_send(data) /
    if (nextseqnum < base+N) {
      udt_send(packet[nextseqnum])
      nextseqnum++
    } else
      refuse_data(data)   (Block)
  timeout /
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
  udt_rcv(reply) && notcorrupt(reply) / base = getacknum(reply)+1
  udt_rcv(reply) && corrupt(reply) / Λ
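A compact sketch of the Go-Back-N sender. Assumptions of this sketch: sequence numbers are unbounded integers (no k-bit wrap-around), the timer is driven by the caller, and an ACK older than `base` is ignored so that `base` never moves backwards (the FSM's plain `base = getacknum(reply)+1` does not guard against that). The class and callback names are illustrative.

```python
class GbnSender:
    def __init__(self, window, udt_send):
        self.N = window
        self.udt_send = udt_send     # callable taking (seqnum, data)
        self.base = 1                # oldest unacknowledged sequence number
        self.nextseqnum = 1
        self.packets = {}            # seqnum -> data, kept until acknowledged

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False             # window full: refuse_data
        self.packets[self.nextseqnum] = data
        self.udt_send(self.nextseqnum, data)
        self.nextseqnum += 1
        return True

    def timeout(self):
        # Go back N: resend every packet that is still unacknowledged
        for n in range(self.base, self.nextseqnum):
            self.udt_send(n, self.packets[n])

    def udt_rcv(self, acknum, corrupt=False):
        if corrupt or acknum < self.base:
            return                   # corrupt or stale ACK: nothing to do
        for n in range(self.base, acknum + 1):
            self.packets.pop(n, None)
        self.base = acknum + 1       # cumulative ACK slides the window


sent = []
s = GbnSender(window=2, udt_send=lambda n, d: sent.append(n))
results = [s.rdt_send(x) for x in "abc"]   # third call is refused
s.udt_rcv(1)                               # cumulative ACK for packet 1
s.rdt_send("c")                            # window has room again
s.timeout()                                # resends packets 2 and 3
print(results, sent)
```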
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
[FSM with a single state "Wait"; each transition is written event / action]
  initially / expectedseqnum=1
  udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt,expectedseqnum) / rdt_rcv(data); udt_send(ACK); expectedseqnum++
  otherwise / udt_send(reply)
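The receiver side is even smaller. In this sketch the class name and callbacks are invented, and ACK number 0 is used to mean "nothing received in order yet", which is an assumption that follows from expectedseqnum starting at 1.

```python
class GbnReceiver:
    def __init__(self, deliver, udt_send):
        self.deliver = deliver       # hands in-order data to the upper layer
        self.udt_send = udt_send     # sends an ACK number back to the sender
        self.expectedseqnum = 1

    def udt_rcv(self, seqnum, data, corrupt=False):
        if not corrupt and seqnum == self.expectedseqnum:
            self.deliver(data)
            self.expectedseqnum += 1
        # In every case, (re-)ACK the highest in-order sequence number so far;
        # out-of-order packets are discarded, never buffered.
        self.udt_send(self.expectedseqnum - 1)


acks, delivered = [], []
r = GbnReceiver(deliver=delivered.append, udt_send=acks.append)
for seqnum, data in [(1, "a"), (3, "c"), (2, "b")]:
    r.udt_rcv(seqnum, data)
print(acks, delivered)
```

Packet 3 arrives early, is dropped, and triggers a duplicate ACK for 1; the sender would have to resend it later.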
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
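The three receiver cases translate almost line for line into code. A sketch, assuming unbounded integer sequence numbers starting at 0 and invented class and callback names:

```python
class SrReceiver:
    def __init__(self, window, deliver, send_ack):
        self.N = window
        self.deliver = deliver       # hands in-order data to the upper layer
        self.send_ack = send_ack     # sends ACK(n) back to the sender
        self.rcvbase = 0             # lowest sequence number not yet received
        self.buffer = {}             # out-of-order packets awaiting delivery

    def receive(self, n, data):
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.send_ack(n)
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase, if any
            while self.rcvbase in self.buffer:
                self.deliver(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
        elif self.rcvbase - self.N <= n <= self.rcvbase - 1:
            self.send_ack(n)         # already delivered: re-ACK only
        # otherwise: ignore


acks, delivered = [], []
r = SrReceiver(window=3, deliver=delivered.append, send_ack=acks.append)
r.receive(1, "b")   # out of order: buffered and ACKed
r.receive(0, "a")   # fills the gap: "a" and "b" are delivered
r.receive(0, "a")   # duplicate from the previous window: re-ACKed
r.receive(5, "x")   # outside both ranges: ignored
print(acks, delivered)
```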
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
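The dilemma arises when the receiver's next window reuses sequence numbers that the sender may still retransmit from the previous window. A small check, with an invented function name, of when that overlap occurs:

```python
def windows_overlap(seq_space, window):
    """True if the receiver's next window shares numbers with the previous one.

    Worst case: the receiver got all `window` packets and advanced, but every
    ACK was lost, so the sender still retransmits from the old window.
    """
    old = {n % seq_space for n in range(0, window)}
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)


print(windows_overlap(seq_space=4, window=3))   # the slide's example
print(windows_overlap(seq_space=4, window=2))   # window = half the seq # space
```

With 4 sequence numbers, a window of 3 is ambiguous while a window of 2 is not, matching the rule that the window must be at most half the sequence number space.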
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high bandwidth-delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581, …
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[Figure: segment layout, each row 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
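The fixed 20-byte part of the header can be decoded with Python's standard `struct` module. This is an illustrative sketch: the function name, dictionary keys, and sample values are assumptions, and only the six flag bits shown on the slide are decoded.

```python
import struct

# Flag bits from least significant upwards: F S R P A U
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment):
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src, dst, seq, ack, off_byte, flag_byte,
     window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": (off_byte >> 4) * 4,   # head len counts 32-bit words
        "flags": [name for i, name in enumerate(FLAG_NAMES)
                  if flag_byte & (1 << i)],
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }


# A hand-built SYN+ACK header with made-up ports and numbers
raw = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x12, 65535, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["flags"], hdr["header_len"], hdr["window"])
```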
56
TCP seq #'s and ACKs
Seq #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say - up to implementor
  – This has led to a world of hurt…
[Figure: simple telnet scenario, Host A and Host B over time]
• User types 'C'; A -> B: Seq=42, ACK=79, data = 'C'
• host ACKs receipt of 'C', echoes back 'C'; B -> A: Seq=79, ACK=43, data = 'C'
• host ACKs receipt of echoed 'C'; A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example.
Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
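A minimal sketch of the moving average with the typical gain of 0.125. The function name and the sample values (in milliseconds) are made up for illustration.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One step of the exponential weighted moving average."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


est = 100.0
history = []
for sample in [100.0, 180.0, 100.0]:
    est = update_estimated_rtt(est, sample)
    history.append(est)
print(history)
```

The single 180 ms spike moves the estimate by only 10 ms, and its influence then decays with each further sample.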
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT (milliseconds, 100 to 350) against time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr; series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
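Putting the estimator and the safety margin together, with gains of 0.125 for EstimatedRTT and 0.25 for DevRTT. This is a sketch with an invented function name and made-up millisecond values. One labeled assumption: the deviation is measured against the EstimatedRTT from before the current sample is folded in; the slide does not fix the order of the two updates.

```python
def update_timeout(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    # Assumption: deviation uses the previous EstimatedRTT
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt


result = update_timeout(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
print(result)
```

A sample 40 ms above the estimate raises the estimate only slightly but widens the safety margin considerably.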
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
}  /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
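The three events of the simplified sender map onto three methods. A sketch only: the class and attribute names are invented, the timer is reduced to a boolean that the caller is expected to act on, and ACKs are assumed to fall on segment boundaries.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq, pass_to_ip):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.pass_to_ip = pass_to_ip      # callable taking (seq, data)
        self.unacked = {}                 # seq # of first byte -> data
        self.timer_running = False

    def on_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True     # start timer
        self.pass_to_ip(self.next_seq, data)
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # smallest unacknowledged seq #
            self.pass_to_ip(seq, self.unacked[seq])
            self.timer_running = True     # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment that ends at or before byte y
            for seq in [q for q in self.unacked
                        if q + len(self.unacked[q]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sent = []
sender = SimpleTcpSender(92, lambda seq, data: sent.append(seq))
sender.on_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()          # resends only the oldest segment, Seq=92
sender.on_ack(100)
sender.on_ack(120)
print(sent, sender.send_base, sender.timer_running)
```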
66
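TCP: retransmission scenarios
[Figure: two Host A / Host B timelines]
lost ACK scenario:
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• Seq=92 timeout: A resends Seq=92, 8 bytes data
• B replies ACK=100; SendBase = 100
premature timeout:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout fires before the ACKs arrive: A resends Seq=92, 8 bytes data
• SendBase = 100 after ACK=100, then SendBase = 120 after ACK=120
• B answers the retransmission with ACK=120; SendBase = 120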
67
TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline]
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expect seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: a segment is lost (X) and Host A resends the 2nd segment before the timeout]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   (a duplicate ACK for already ACKed segment)
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   (fast retransmit)
    }
  }
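The duplicate-ACK branch can be sketched in isolation. Assumptions of this sketch: the class name and `resend` callback are invented, timer handling is omitted, and only ACKs equal to the current SendBase are counted as duplicates (older ACKs are ignored).

```python
class FastRetransmit:
    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.resend = resend          # callable taking the seq # to resend
        self.dup_acks = 0

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1        # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resend(y)        # fast retransmit, before the timer expires


resent = []
fr = FastRetransmit(send_base=100, resend=resent.append)
for y in [100, 100, 100, 100]:        # four duplicate ACKs for byte 100
    fr.on_ack(y)
after_dups = list(resent)
fr.on_ack(120)                        # the retransmission filled the gap
print(after_dups, fr.send_base, fr.dup_acks)
```

The segment is resent exactly once, on the third duplicate; the fourth duplicate does not trigger another retransmission.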
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
udt_rcv(reply) ampamp ( corrupt(reply) ||isACK(reply1) )
35
rdt30 sender
sequence=0udt_send(packet)
rdt_send(data)
Wait for
ACK0
IDLEstate 1
sequence=1udt_send(packet)
rdt_send(data)
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply0)
udt_rcv(packet) ampamp ( corrupt(packet) ||isACK(reply0) )
udt_rcv(reply) ampamp notcorrupt(reply) ampamp isACK(reply1)
LL
udt_send(packet)timeout
udt_send(packet)timeout
udt_rcv(reply)
IDLEstate 0
Wait for
ACK1
Ludt_rcv(reply)
LL
L
36
rdt30 in action
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B’s ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timeline: a segment from A is lost (X) and A resends the 2nd segment before its timeout expires.]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
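A minimal Python sketch of the ACK handler above, counting duplicate ACKs per value of y. The class and attribute names are hypothetical, ACKs are assumed to fall on segment boundaries, and the "resend" is only recorded in a list rather than transmitted.

```python
from collections import Counter


class FastRetransmitSender:
    """Toy ACK handler: a third duplicate ACK for y triggers a resend of segment y."""

    def __init__(self, send_base, outstanding):
        self.send_base = send_base
        self.outstanding = set(outstanding)  # starting seq numbers of unACKed segments
        self.dup_acks = Counter()            # y -> number of duplicate ACKs seen for y
        self.timer_running = bool(self.outstanding)
        self.fast_retransmits = []           # seq numbers resent before any timeout

    def on_ack(self, y):
        if y > self.send_base:
            # New data is acknowledged: slide SendBase forward.
            self.send_base = y
            self.outstanding = {s for s in self.outstanding if s >= y}
            self.timer_running = bool(self.outstanding)
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3 and y in self.outstanding:
                self.fast_retransmits.append(y)


sender = FastRetransmitSender(send_base=92, outstanding=[92, 100, 120, 135])
for ack in (100, 100, 100, 100):  # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.fast_retransmits)  # [100]
```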
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist – indefinitely
Solution: wait to fill each segment, but don’t wait indefinitely
Nagle’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow down / stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
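The window arithmetic can be checked with a couple of small Python functions. The function names and the sender-side counters `last_byte_sent` and `last_byte_acked` are assumptions introduced for this sketch; the receiver-side names follow the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending n_bytes keeps unACKed data within the advertised window."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + n_bytes <= advertised_window


# A 4096-byte buffer holding 2000 bytes the application has not read yet.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                              # 2096
print(may_send(1000, 5000, 4000, window))  # True: 1000 in flight + 1000 new
print(may_send(1200, 5000, 4000, window))  # False: would exceed 2096
```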
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
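The three steps can be traced with a short Python function. The slide does not spell out the acknowledgement numbers; the sketch assumes standard TCP behaviour, where a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one. The function name and tuple layout are illustrative only.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (direction, kind, fields) tuples."""
    return [
        # Step 1: client picks its initial seq #; the SYN carries no data.
        ("client->server", "SYN", {"seq": client_isn}),
        # Step 2: server picks its own initial seq # and acknowledges the SYN.
        ("server->client", "SYNACK", {"seq": server_isn, "ack": client_isn + 1}),
        # Step 3: client acknowledges the SYNACK; this segment may carry data.
        ("client->server", "ACK", {"seq": client_isn + 1, "ack": server_isn + 1}),
    ]


for direction, kind, fields in three_way_handshake(client_isn=1000, server_isn=5000):
    print(direction, kind, fields)
```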
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client: close, sends FIN; server replies ACK, then close, sends FIN; client replies ACK and enters timed wait; closed.]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline. Client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and enters timed wait; both sides end closed.]
80
TCP Connection Management (cont)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B connected through a router with unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B connected through a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against the offered load, both axes running to R/2; levels R/3 and R/4 are also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet dropped, any “upstream” transmission capacity used for that packet was wasted
[Figure: Host A, Host B; plot of $\lambda_{out}$]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with the window axis marked at 8, 16 and 24 Kbytes. Sawtooth behavior: probing for bandwidth]
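The sawtooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: the window is measured in units of MSS, loss is supplied as a set of RTT indices rather than detected, and slow start is left out. All function names are illustrative.

```python
def aimd_step(cwnd, loss, mss=1.0):
    """Window after one RTT: halve on loss, otherwise grow by one MSS."""
    return max(mss, cwnd / 2) if loss else cwnd + mss


def aimd_trace(cwnd, loss_rtts, rtts, mss=1.0):
    """Window at the start of each RTT, given the RTT indices that see a loss."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        cwnd = aimd_step(cwnd, t in loss_rtts, mss)
    return trace


def per_ack_increase(w):
    """Per-ACK form of additive increase: W <- W + 1/W (W in MSS units)."""
    return w + 1.0 / w


print(aimd_trace(8, loss_rtts={4}, rtts=8))  # [8, 9.0, 10.0, 11.0, 12.0, 6.0, 7.0, 8.0]

# Roughly W ACKs arrive per RTT, so W per-ACK steps add close to one MSS.
w = 10.0
for _ in range(10):
    w = per_ack_increase(w)
print(round(w, 2))  # a little under 11
```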
89
SLOW START IS NOT SHOWN
36
rdt3.0 in action
37
rdt3.0 in action
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U sender: utilization – fraction of time sender busy sending
• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
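The numbers on this slide can be reproduced in Python. The sketch assumes RTT = 2 × 15 ms = 30 ms and takes sender utilization to be the transmission time divided by the time per stop-and-wait cycle, RTT + L/R; the function names are illustrative.

```python
def transmission_delay(packet_bits, link_bps):
    """d_trans = L / R, in seconds."""
    return packet_bits / link_bps


def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy sending: (L/R) / (RTT + L/R)."""
    d_trans = transmission_delay(packet_bits, link_bps)
    return d_trans / (rtt_s + d_trans)


L, R, RTT = 8000, 1e9, 0.030  # bits, bits/sec, seconds

d_trans = transmission_delay(L, R)
utilization = stop_and_wait_utilization(L, R, RTT)
throughput_bytes_per_s = (L / 8) / (RTT + d_trans)

print(f"d_trans     = {d_trans * 1e6:.0f} microseconds")              # 8
print(f"utilization = {utilization:.5f}")                             # 0.00027
print(f"throughput  = {throughput_bytes_per_s / 1000:.0f} kB/sec")    # 33
```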
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline spanning one RTT]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline spanning one RTT]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
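The factor of 3 follows directly if a window of packets is sent per cycle of RTT + L/R. A short Python check, assuming the 1 Gbps link, 8000 bit packets and 30 ms RTT used for the stop-and-wait example; the `window` parameter and the function name are assumptions of this sketch.

```python
def pipelined_utilization(packet_bits, link_bps, rtt_s, window):
    """Sender utilization with `window` packets in flight per RTT + L/R cycle."""
    d_trans = packet_bits / link_bps
    return min(1.0, window * d_trans / (rtt_s + d_trans))


one = pipelined_utilization(8000, 1e9, 0.030, window=1)
three = pipelined_utilization(8000, 1e9, 0.030, window=3)
print(f"{one:.5f} -> {three:.5f} (x{three / one:.1f})")  # 0.00027 -> 0.00080 (x3.0)
```

The `min(1.0, ...)` cap reflects that the sender cannot be busy more than all of the time, however large the window.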
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative ACKs
  – Doesn’t ack packet if there’s a gap
• Sender has timer for oldest unacked packet
  – If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  – When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N, consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n – “cumulative ACK”
  – may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN sender extended FSM
State: Wait
Initially:
  base = 1
  nextseqnum = 1
rdt_send(data):
  if (nextseqnum < base+N) {
    udt_send(packet[nextseqnum])
    nextseqnum++
  }
  else
    refuse_data(data)   (Block)
timeout:
  udt_send(packet[base])
  udt_send(packet[base+1])
  …
  udt_send(packet[nextseqnum-1])
udt_rcv(reply) && notcorrupt(reply):
  base = getacknum(reply)+1
udt_rcv(reply) && corrupt(reply):
  Λ (no action)
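A Python rendering of the sender FSM, kept as close to the transitions above as practical. The class name, the `sent` log and the boolean return value of `rdt_send` are assumptions made for illustration; no real channel or timer is modelled.

```python
class GbnSender:
    """Go-Back-N sender: window of N packets, cumulative ACKs, resend all on timeout."""

    def __init__(self, window):
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}  # seq number -> data, kept for retransmission
        self.sent = []     # log of seq numbers handed to the unreliable channel

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.sent.append(self.nextseqnum)  # stands in for udt_send(packet)
            self.nextseqnum += 1
            return True
        return False  # refuse_data: the window is full

    def timeout(self):
        # Resend every packet from base up to nextseqnum-1.
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)

    def udt_rcv(self, acknum, corrupt=False):
        if corrupt:
            return  # no action on a corrupt reply
        self.base = acknum + 1  # cumulative ACK: everything up to acknum is done


sender = GbnSender(window=4)
accepted = [sender.rdt_send(f"pkt{i}") for i in range(1, 6)]
print(accepted)     # [True, True, True, True, False]
sender.udt_rcv(2)   # packets 1 and 2 acknowledged
sender.timeout()    # go back: resend 3 and 4
print(sender.sent)  # [1, 2, 3, 4, 3, 4]
```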
46
GBN receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don’t buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
State: Wait
Initially:
  expectedseqnum = 1
udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
  rdt_rcv(data)
  udt_send(ACK)
  expectedseqnum++
default:
  udt_send(reply)
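The receiver side is even smaller. In this Python sketch the ACK is returned as an integer, the highest in-order seq # received so far; returning 0 before anything has arrived is an assumption of the sketch, as are the class and attribute names.

```python
class GbnReceiver:
    """Go-Back-N receiver: accepts only the expected packet, re-ACKs otherwise."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []  # data passed up to the application, in order

    def udt_rcv(self, seq, data, corrupt=False):
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Out-of-order or corrupt packets are discarded; either way the reply
        # acknowledges the highest in-order seq # received so far.
        return self.expectedseqnum - 1


rcvr = GbnReceiver()
acks = [rcvr.udt_rcv(seq, data) for seq, data in [(1, "a"), (3, "c"), (2, "b"), (3, "c")]]
print(acks)            # [1, 1, 2, 3]
print(rcvr.delivered)  # ['a', 'b', 'c']
```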
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #’s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
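The receiver rules can be sketched in Python as follows. To keep the example short it assumes unbounded sequence numbers (no wrap-around), starts the window at 0, and returns the ACK number or `None` instead of sending anything; the class and attribute names are illustrative.

```python
class SrReceiver:
    """Selective-repeat receiver: ACKs packets individually and buffers out-of-order ones."""

    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}     # seq number -> data waiting for in-order delivery
        self.delivered = []  # data passed to the upper layer, in order

    def on_packet(self, n, data):
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase and slide the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n  # ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n  # already delivered: re-ACK so the sender can move on
        return None   # otherwise: ignore


rcvr = SrReceiver(window=4)
print(rcvr.on_packet(1, "b"), rcvr.delivered)  # 1 []
print(rcvr.on_packet(0, "a"), rcvr.delivered)  # 0 ['a', 'b']
print(rcvr.on_packet(0, "a"), rcvr.delivered)  # 0 ['a', 'b']  (duplicate, re-ACKed)
print(rcvr.on_packet(9, "z"))                  # None
```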
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
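The bound can be checked by brute force. The Python sketch below considers the worst case where the receiver has accepted a full window but every ACK was lost: the sender may still resend its old window while the receiver already expects the next one. If the two windows share a sequence number, a retransmission looks like new data. The function name is illustrative.

```python
def sr_is_ambiguous(seq_space, window):
    """True if the sender's old window and the receiver's next window overlap mod seq_space."""
    old_window = {n % seq_space for n in range(window)}
    new_window = {n % seq_space for n in range(window, 2 * window)}
    return bool(old_window & new_window)


print(sr_is_ambiguous(seq_space=4, window=3))  # True: the slide's example
print(sr_is_ambiguous(seq_space=4, window=2))  # False: window is half the seq # space

# The safe window sizes are exactly those with window <= seq_space / 2.
print(all(sr_is_ambiguous(s, w) == (2 * w > s)
          for s in range(2, 17) for w in range(1, s + 1)))  # True
```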
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt; consider high Bandwidth/Delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: segment layout, each row 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pointer
  Options (variable length)
  application data (variable length)
Notes:
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
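The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch only: the function name and the returned dictionary keys are made up for illustration, options are skipped rather than parsed, and the sample segment is hand-built (checksum left at 0), not captured from a network.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # lowest bit first


def parse_tcp_header(segment):
    """Split a raw TCP segment into its fixed header fields and payload."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_and_flags >> 12) * 4  # head len is counted in 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if offset_and_flags & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "data": segment[header_len:],
    }


# Hand-built segment: 5-word header, PSH+ACK set, carrying the single byte 'C'.
raw = struct.pack("!HHIIHHHH", 1234, 23, 42, 79, (5 << 12) | 0x18, 65535, 0, 0) + b"C"
fields = parse_tcp_header(raw)
print(fields["seq"], fields["ack"], fields["flags"], fields["data"])
# 42 79 ['PSH', 'ACK'] b'C'
```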
56
TCP seq. #’s and ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn’t say – up to implementor
  – This has led to a world of hurt…
Simple telnet scenario (Host A, Host B, time running down the page):
• User types ‘C’: A sends Seq=42, ACK=79, data = ‘C’
• host ACKs receipt of ‘C’, echoes back ‘C’: B sends Seq=79, ACK=43, data = ‘C’
• host ACKs receipt of echoed ‘C’: A sends Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  – this gets through an intrusion-detection system
  – then send a new segment replacing bytes 4-13 with “password-file”
A dumb example: neither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
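Unrolling the recurrence shows why the influence of old samples decays exponentially. The shorthand here is introduced only for this derivation: $E_n$ is EstimatedRTT after the $n$-th sample, $S_n$ is that SampleRTT, $E_0$ is the initial estimate, and $\alpha$ is the averaging weight.

```latex
\begin{align*}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha\,S_n + \alpha(1-\alpha)\,S_{n-1} + (1-\alpha)^2\,E_{n-2} \\
    &= \alpha \sum_{k=0}^{n-1} (1-\alpha)^k\,S_{n-k} + (1-\alpha)^n\,E_0
\end{align*}
```

A sample taken $k$ measurements ago therefore carries weight $\alpha(1-\alpha)^k$. With $\alpha = 0.125$, the newest sample has weight 0.125, while a sample ten measurements old has weight $0.125 \times 0.875^{10} \approx 0.033$.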
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation:
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106); two curves: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
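The estimator and timeout formulas fit in a small Python class. Two details are not given on the slides and are assumptions here, following RFC 6298: the first sample initialises EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate. Samples from retransmitted segments should simply not be fed in, as the Karn/Partridge rule requires. The class name is illustrative.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, giving a retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumption: RFC 6298 style initial value

    def update(self, sample_rtt):
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)  # milliseconds
print(est.timeout_interval)             # 300.0
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)  # 102.5 42.5 272.5
```

Steady samples shrink DevRTT and pull the timeout towards EstimatedRTT; a burst of variation widens the margin again.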
37
rdt30 in action
38
Performance of rdt30
bull rdt30 works but performance stinksbull ex 1 Gbps link 15 ms prop delay 8000 bit packet
U sender utilization ndash fraction of time sender busy sending
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link network protocol limits use of physical resources
dsmicrosecon8bps10bits8000
9 RLdtrans
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
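The update rule and the "decreases exponentially fast" claim can be checked with a few lines. A minimal sketch; the function names are illustrative, and `sample_weight` simply unrolls the recurrence: a sample that is `age` updates old contributes with weight α(1 - α)^age.

```python
ALPHA = 0.125  # typical value from the slide


def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = ALPHA) -> float:
    """One EWMA step: blend the old estimate with the newest sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weight(age: int, alpha: float = ALPHA) -> float:
    """Weight of a sample taken `age` updates ago (0 = the newest sample)."""
    return alpha * (1 - alpha) ** age


# The newest sample counts for 12.5%; each older one counts 7/8 as much
# as the sample that followed it.
weights = [sample_weight(age) for age in range(4)]
```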
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4 * DevRTT
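A compact estimator combining the formulas above with the Karn/Partridge rule might look as follows. This is a sketch under stated assumptions: the class name `RtoEstimator` is made up, DevRTT is updated before EstimatedRTT (so the deviation is measured against the previous estimate), and the initial deviation is set to half the first sample; the slides do not specify either choice.

```python
class RtoEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample: float,
                 alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        # Assumption: start with a deviation of half the first sample.
        self.dev_rtt = first_sample / 2

    def on_sample(self, sample_rtt: float, retransmitted: bool = False) -> None:
        # Karn/Partridge: an ACK for a retransmitted segment is ambiguous,
        # so such samples are not used.
        if retransmitted:
            return
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt
```

A noisy path raises DevRTT and therefore the margin; a steady path lets the timeout settle close to EstimatedRTT.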
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
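The three events of the simplified sender translate fairly directly into runnable code. A sketch only: the class and attribute names are illustrative, the "timer" is a boolean rather than a real clock, `wire` stands in for "pass segment to IP", and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender (no dup ACKs, no flow or
    congestion control)."""

    def __init__(self, initial_seq_num: int) -> None:
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # of first byte -> segment data
        self.timer_running = False
        self.wire = []              # (seq #, data) handed to IP, in order

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            self.timer_running = False
            return
        # Retransmit only the oldest not-yet-acknowledged segment.
        seq = min(self.unacked)
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: every segment ending at or before y is done.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart the timer only if something is still outstanding.
            self.timer_running = bool(self.unacked)
```

Feeding it 8 bytes and then 20 bytes from initial sequence number 92 reproduces the Seq=92 and Seq=100 segments used in the retransmission scenarios.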
66
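TCP retransmission scenarios

Lost ACK scenario (Host A and Host B, time runs downwards):
1. A sends Seq=92, 8 bytes data
2. B replies ACK=100; the ACK is lost (X)
3. Seq=92 timeout: A resends Seq=92, 8 bytes data
4. B replies ACK=100; SendBase = 100

Premature timeout scenario:
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100, then ACK=120
3. Seq=92 timeout fires before the ACKs arrive: A resends Seq=92, 8 bytes data
4. The ACKs arrive: SendBase = 100, then SendBase = 120
5. B replies ACK=120 to the resent segment; SendBase = 120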
67
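TCP retransmission scenarios (more)

Cumulative ACK scenario (Host A and Host B, time runs downwards):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies ACK=100; the ACK is lost (X)
3. B replies ACK=120, which arrives before the timeout; SendBase = 120

Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too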
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment, higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Timeline between Host A and Host B: the 2nd segment is lost (X) and Host A resends the 2nd segment before its timeout expires.]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {    /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y    /* fast retransmit */
    }
  }
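The duplicate-ACK branch can be isolated as a small counter. A sketch with an illustrative class name; it treats only ACKs equal to the current SendBase as duplicates and resets the count whenever new data is acknowledged, which is an assumption the pseudocode leaves implicit.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base: int) -> None:
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y: int) -> bool:
        """Return True when the segment starting at y should be resent now."""
        if y > self.send_base:
            # New data acknowledged: advance and start counting afresh.
            self.send_base = y
            self.dup_acks = 0
            return False
        if y == self.send_base:
            # A duplicate ACK for already ACKed data.
            self.dup_acks += 1
            return self.dup_acks == 3
        return False  # stale ACK below SendBase: ignored in this sketch
```

The detector fires exactly once, on the third duplicate; further duplicates for the same y do not trigger another resend.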
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
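Both sides of the window arithmetic fit in two functions. A minimal sketch: `rcv_window` follows the formula on the slide, while `sendable_bytes` and its `last_byte_sent` / `last_byte_acked` parameters are assumed names for the sender's bookkeeping, which the slide does not spell out.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises in its segments."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(advertised_window: int,
                   last_byte_sent: int, last_byte_acked: int) -> int:
    """How much more the sender may transmit without overrunning the receiver."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)
```

With a 4096-byte buffer, 3000 bytes received and 1000 read by the application, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may send 1096 more.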
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
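The sequence and acknowledgement numbers exchanged in the three steps can be written out explicitly. A sketch that models segments as plain dictionaries (an illustrative representation, not a real socket API); it relies on the standard rule that a SYN consumes one sequence number, which the slide does not state.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dictionaries."""
    # Step 1: client sends SYN carrying its initial seq #, no data.
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK: its own initial seq #, and an
    # ACK for the next byte it expects from the client.
    synack = {"from": "server", "flags": {"SYN", "ACK"},
              "seq": server_isn, "ack": client_isn + 1}
    # Step 3: client ACKs the server's SYN; this segment may carry data.
    ack = {"from": "client", "flags": {"ACK"},
           "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]
```

With initial sequence numbers 41 and 78, the first data bytes then travel as Seq=42 and Seq=79.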
78
TCP Connection Management (cont)
Closing a connection
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client close: FIN to server; server replies ACK; server close: FIN to client; client replies ACK; client in timed wait, then closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client closing: FIN to server; server replies ACK; server closing: FIN to client; client replies ACK; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send λ_in (original data) through one router with unlimited shared output link buffers; output rate λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B; λ_in: original data; λ'_in: original data plus retransmitted data; finite shared output link buffers; output rate λ_out]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against offered load, axes marked R/2; the plots also mark R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B; λ_in: original data; λ'_in: original data plus retransmitted data; finite shared output link buffers; output rate λ_out]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A and Host B, output rate λ_out]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W <- W + 1/W, with W counted in MSS)
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window size against time, y-axis marked 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
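The saw tooth can be reproduced with a round-by-round simulation. A sketch under simplifying assumptions: the window is counted in MSS, loss is injected at chosen rounds rather than produced by a network model, the window never drops below 1 MSS, and slow start is left out. The function names are illustrative.

```python
def aimd_trace(cwnd_mss: float, rounds: int, loss_rounds: set) -> list:
    """Congestion window at the start of each RTT round."""
    trace = []
    for r in range(rounds):
        trace.append(cwnd_mss)
        if r in loss_rounds:
            cwnd_mss = max(1.0, cwnd_mss / 2)   # multiplicative decrease
        else:
            cwnd_mss = cwnd_mss + 1.0           # additive increase per RTT
    return trace


def on_ack(cwnd_mss: float) -> float:
    """Per-ACK form of additive increase: W <- W + 1/W."""
    return cwnd_mss + 1.0 / cwnd_mss
```

Applying `on_ack` once for each of the roughly W ACKs that arrive in one RTT grows the window by about 1 MSS, which is why the two formulations agree.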
89
SLOW START IS NOT SHOWN
38
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  d_trans = L / R = (8000 bits) / (10^9 bps) = 8 microseconds
• U_sender: utilization, the fraction of time sender busy sending
• 1 KB pkt every 30 msec -> 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources
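The numbers on this slide can be reproduced directly. A sketch assuming RTT = 30 ms (twice the 15 ms propagation delay) and the usual stop-and-wait utilization U = (L/R) / (RTT + L/R), which the surviving slide text names but does not write out. The `window` parameter is an assumption of this sketch that generalises to several packets in flight.

```python
def transmission_delay(packet_bits: float, rate_bps: float) -> float:
    """d_trans = L / R, in seconds."""
    return packet_bits / rate_bps


def sender_utilization(packet_bits: float, rate_bps: float,
                       rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy sending."""
    d_trans = transmission_delay(packet_bits, rate_bps)
    return min(1.0, window * d_trans / (rtt_s + d_trans))


def throughput_bytes_per_s(packet_bits: float, rate_bps: float,
                           rtt_s: float) -> float:
    """Stop-and-wait throughput: one packet per (RTT + L/R)."""
    d_trans = transmission_delay(packet_bits, rate_bps)
    return packet_bits / 8 / (rtt_s + d_trans)


# Slide example: L = 8000 bits, R = 1 Gbps, RTT = 30 ms.
u = sender_utilization(8000, 1e9, 0.030)          # about 0.00027
thr = throughput_bytes_per_s(8000, 1e9, 0.030)    # about 33 kB/sec
```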
39
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
  - Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
  - If timer expires, retransmit all unacked packets
Selective Repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
43
Selective repeat: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
  - When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN: sender extended FSM
Initial state: base=1, nextseqnum=1; single state "Wait"

rdt_send(data):
  if (nextseqnum < base+N) {
    udt_send(packet[nextseqnum])
    nextseqnum++
  }
  else
    refuse_data(data)    /* Block */

timeout:
  udt_send(packet[base])
  udt_send(packet[base+1])
  …
  udt_send(packet[nextseqnum-1])

udt_rcv(reply) && notcorrupt(reply):
  base = getacknum(reply)+1

udt_rcv(reply) && corrupt(reply):
  Λ (no action)
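The sender FSM maps onto a small class. A sketch only: `sent` is a stand-in for udt_send that records sequence numbers, corruption checking is left to the caller, and the guard against ACKs below `base` is an addition of this sketch so that a late duplicate ACK cannot move the window backwards.

```python
class GbnSender:
    """Go-Back-N sender with a window of N packets."""

    def __init__(self, window_size: int) -> None:
        self.N = window_size
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}   # seq # -> data, kept until cumulatively ACKed
        self.sent = []      # seq #s handed to the channel, in order

    def rdt_send(self, data) -> bool:
        """Accept data if the window has room; False means refuse_data."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False

    def on_timeout(self) -> None:
        # Go back N: resend every packet from base up to nextseqnum-1.
        for seq in range(self.base, self.nextseqnum):
            self.sent.append(seq)

    def on_ack(self, acknum: int) -> None:
        """Handle an uncorrupted cumulative ACK for all pkts up to acknum."""
        if acknum < self.base:
            return          # duplicate ACK for already-ACKed data
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        self.base = acknum + 1
```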
46
GBN: receiver extended FSM
ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering
  - Re-ACK packet with highest in-order seq #

Initial state: expectedseqnum=1; single state "Wait"

udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
  rdt_rcv(data)
  udt_send(ACK)
  expectedseqnum++

Λ (any other event):
  udt_send(reply)
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
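The receiver rules above can be sketched as follows. Assumptions: sequence numbers are unbounded integers starting at 0 (no wraparound), `delivered` stands in for the upper layer, and the return value says whether ACK(n) is sent. The class name is illustrative.

```python
class SrReceiver:
    """Selective-repeat receiver with a window of N packets."""

    def __init__(self, window_size: int) -> None:
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}      # out-of-order pkts waiting for the gap to fill
        self.delivered = []   # data passed up, in order

    def on_packet(self, n: int, data) -> bool:
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            # In window: buffer it, then deliver any in-order run.
            self.buffer.setdefault(n, data)
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return True       # send ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            # Already delivered: re-ACK so the sender can advance.
            return True
        return False          # otherwise: ignore
```

Re-ACKing packets below the window matters: if the original ACK was lost, the sender's window would otherwise never move.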
51
Selective repeat in action
52
Selective repeat dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
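The dilemma can be checked mechanically. A sketch with illustrative function names: it compares the sequence numbers the sender might still retransmit (the old window) with the numbers the receiver accepts as new once it has received a full window; any overlap means a retransmission can be mistaken for new data.

```python
def window_seqs(base: int, window: int, seq_space: int) -> set:
    """Sequence numbers covered by a window starting at `base` (mod seq_space)."""
    return {(base + i) % seq_space for i in range(window)}


def is_ambiguous(window: int, seq_space: int) -> bool:
    """True if an old retransmission can look like new data to the receiver."""
    # Sender's view if every ACK was lost: it may resend the old window.
    old = window_seqs(0, window, seq_space)
    # Receiver's view after receiving the whole window: it has advanced.
    new = window_seqs(window % seq_space, window, seq_space)
    return bool(old & new)
```

For the slide's example (seq #'s 0 to 3, window size 3) the old window {0, 1, 2} overlaps the new window {3, 0, 1}; with window size 2 the windows {0, 1} and {2, 3} are disjoint.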
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
39
rdt30 stop-and-wait operation
first packet bit transmitted t = 0
sender receiver
RTT
last packet bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
40
Pipelined (Packet-Window) protocols
Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash range of sequence numbers must be increasedndash buffering at sender andor receiver
bull Two generic forms of pipelined protocols go-Back-N selective repeat
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high bandwidth-delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
• full duplex data
– bi-directional data flow in same connection
– MSS: maximum segment size
• connection-oriented
– handshaking (exchange of control msgs) initialises sender and receiver state before data exchange
• flow controlled
– sender will not overwhelm receiver
• point-to-point
– one sender, one receiver
• reliable, in-order byte stream
– no "message boundaries"
• pipelined
– TCP congestion and flow control set window size
• send & receive buffers
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads through its socket door]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, rows from top to bottom]
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | urg data pointer
options (variable length)
application data (variable length)
Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• checksum: Internet checksum (as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how does the receiver handle out-of-order segments?
– A: TCP spec doesn't say; up to implementor
This has led to a world of hurt…
Simple telnet scenario (Host A, Host B; time runs downward):
• User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
• Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
• Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
– Filling buffers
• Critical buffers…
• Send a legitimate request
GET index.html
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
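As a quick illustration, the moving average is a one-line update. The function name and the sample values in the usage note are made up for the example; only the formula and the typical weight 0.125 come from the slide.

```python
def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = 0.125) -> float:
    """One EWMA step: blend the newest sample into the running estimate."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def smooth(samples, initial_estimate, alpha=0.125):
    """Apply the update to a sequence of SampleRTT values (illustrative helper)."""
    estimate = initial_estimate
    history = []
    for sample in samples:
        estimate = update_estimated_rtt(estimate, sample, alpha)
        history.append(estimate)
    return history
```

For instance, an estimate of 100 ms followed by a single 200 ms sample moves only to 112.5 ms, so one outlier shifts the estimate by an eighth of its distance.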
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: SampleRTT and EstimatedRTT, gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)
• then set timeout interval:
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
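A small sketch of the two formulas follows. The function names are assumptions for the example; the slide does not say whether DevRTT is updated before or after EstimatedRTT, so the helper simply takes whichever EstimatedRTT value the caller passes in.

```python
def update_dev_rtt(dev_rtt: float, estimated_rtt: float, sample_rtt: float,
                   beta: float = 0.25) -> float:
    """EWMA of how far SampleRTT strays from EstimatedRTT."""
    return (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)


def timeout_interval(estimated_rtt: float, dev_rtt: float) -> float:
    """EstimatedRTT plus a safety margin of four deviations."""
    return estimated_rtt + 4 * dev_rtt
```

With DevRTT = 10 ms, EstimatedRTT = 100 ms and a 140 ms sample, DevRTT becomes 17.5 ms; with EstimatedRTT then at 105 ms the timeout is 175 ms.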
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
– update what is known to be acked
– start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
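The three events can be exercised with a small Python sketch. It is an illustration under assumptions, not a real TCP: the class name, the unacked dictionary, the sent log standing in for "pass segment to IP", and the boolean timer_running standing in for a real timer are all hypothetical.

```python
class SimplifiedTcpSender:
    """Simplified sender: no duplicate-ACK handling, no flow or congestion control."""

    def __init__(self, initial_seq_num: int = 0):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # of first byte -> payload
        self.timer_running = False   # stands in for the single retransmission timer
        self.sent = []               # (seq #, payload) pairs handed to IP

    def on_data(self, data: bytes) -> None:
        """Event: data received from application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.timer_running = True    # start timer if not already running
        self.sent.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        """Event: timer timeout. Resend the oldest unACKed segment only."""
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y: int) -> None:
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100, then receiving ACK=120, leaves SendBase at 120 with nothing outstanding.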
66
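TCP retransmission scenarios
Lost ACK scenario (Host A, Host B; time runs downward):
• A sends Seq=92, 8 bytes data
• B replies ACK=100; the ACK is lost (X)
• Seq=92 timeout at A: A resends Seq=92, 8 bytes data
• B replies ACK=100 again; SendBase = 100
Premature timeout scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, then ACK=120
• Seq=92 timeout at A before the ACKs arrive: A resends Seq=92, 8 bytes data
• the ACKs arrive: SendBase = 100, then SendBase = 120
• B answers the retransmission with ACK=120; SendBase = 120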
67
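TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A, Host B; time runs downward):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too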
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A's 2nd segment is lost (X); Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }
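The duplicate-ACK counting can be sketched as a function over a stream of ACK values. The function name, the single counter, and the choice to ignore ACKs below SendBase are assumptions of this sketch rather than part of the slide.

```python
def fast_retransmit_events(acks, send_base):
    """Return (final SendBase, list of seq #'s resent by fast retransmit)."""
    dup_count = 0
    resent = []
    for y in acks:
        if y > send_base:
            send_base = y          # new data acknowledged
            dup_count = 0          # duplicates are counted per ACK value
        elif y == send_base:
            dup_count += 1         # duplicate ACK for already ACKed data
            if dup_count == 3:
                resent.append(y)   # resend segment starting at y before the timer expires
        # ACKs below SendBase are stale and ignored in this sketch.
    return send_base, resent
```

A run of one new ACK for 100 followed by three duplicates triggers exactly one early retransmission of the segment starting at byte 100.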
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist, indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days?)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
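A worked example of the window arithmetic, as a hedged sketch: rcv_window follows the formula on the slide, while may_send and its last_byte_sent / last_byte_acked parameters are assumed names for the sender-side check and do not appear in the slides.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(n_bytes: int, last_byte_sent: int, last_byte_acked: int,
             advertised_window: int) -> bool:
    """Sender-side check: unACKed data must stay within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= advertised_window
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may send 1000 more but not 1100.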
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
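The numbers carried by the three segments can be written out explicitly. The slide does not list the acknowledgement values; the sketch below uses the standard TCP rule that a SYN consumes one sequence number, and the dictionary layout and function name are hypothetical.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the SYN, SYNACK and ACK segments as small dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    synack = {"flags": "SYN+ACK", "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

With initial sequence numbers 100 (client) and 300 (server), the SYNACK acknowledges 101 and the final ACK acknowledges 301.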
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client close: FIN to server; server replies ACK; server close: FIN to client; client replies ACK and enters timed wait; closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client closing: FIN to server; server replies ACK; server closing: FIN to client; client replies ACK and enters timed wait; both sides closed]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B each send $\lambda_{in}$ (original data) through a router with unlimited shared output link buffers; $\lambda_{out}$ is the rate delivered]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, axes running to R/2, with levels R/3 and R/4 marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; $\lambda_{out}$]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
– additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W <- W + 1/W)
– multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window size against time, with levels at 8, 16 and 24 Kbytes; saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
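The saw tooth can be reproduced with a per-RTT sketch of AIMD. Slow start is left out, as on the slide. The function name, the choice to measure the window in units of MSS, and the set of RTTs at which loss occurs are assumptions of the example.

```python
def aimd_trace(rtts: int, loss_rtts, cong_win: float = 1.0, mss: float = 1.0):
    """Return the congestion window after each RTT."""
    trace = []
    for t in range(rtts):
        if t in loss_rtts:
            cong_win = cong_win / 2   # multiplicative decrease after loss
        else:
            cong_win += mss           # additive increase: 1 MSS per RTT
        trace.append(cong_win)
    return trace
```

Starting from a window of 8 MSS with a loss in the fourth RTT, the window climbs 9, 10, 11, drops to 5.5, and then climbs again.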
40
Pipelined (Packet-Window) protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
41
Pipelining: increased utilization
[Figure: sender/receiver timeline]
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
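The factor of 3 follows from the timeline: the sender is busy for n·L/R out of every RTT + L/R seconds. A minimal sketch, where the function name is hypothetical and the values in the usage note (L = 8000 bits, R = 1 Gbit/s, RTT = 30 ms) are illustrative assumptions, not figures from the slide:

```python
def utilization(n_packets: int, packet_bits: float, rate_bps: float,
                rtt_s: float) -> float:
    """Fraction of time the sender is transmitting with n packets in flight."""
    transmit_time = packet_bits / rate_bps           # L / R for one packet
    busy = n_packets * transmit_time                 # time spent sending per cycle
    cycle = rtt_s + transmit_time                    # first ACK returns at RTT + L/R
    return min(1.0, busy / cycle)                    # a link cannot exceed 100% use
```

With these assumed values, utilization(1, 8000, 1e9, 0.030) is tiny, and utilization(3, 8000, 1e9, 0.030) is three times larger.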
42
Pipelining Protocols
Go-back-N: big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr only sends cumulative acks
– Doesn't ack packet if there's a gap
• Sender has timer for oldest unacked packet
– If timer expires, retransmit all unacked packets
Selective Repeat: big pic
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only unacked packet
43
Selective repeat big picture
• Sender can have up to N unacked packets in pipeline
• Rcvr acks individual packets
• Sender maintains timer for each unacked packet
– When timer expires, retransmit only unacked packet
44
Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unack'ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
– may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
45
GBN sender extended FSM
FSM (single state: Wait)
initially: base=1, nextseqnum=1
rdt_send(data): if (nextseqnum < base+N) { udt_send(packet[nextseqnum]); nextseqnum++ } else refuse_data(data) (Block)
timeout: udt_send(packet[base]); udt_send(packet[base+1]); …; udt_send(packet[nextseqnum-1])
udt_rcv(reply) && notcorrupt(reply): base = getacknum(reply)+1
udt_rcv(reply) && corrupt(reply): Λ (no action)
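The sender FSM translates almost line for line into Python. This is a sketch under assumptions: the class name, the wire list that logs every seq # passed to udt_send, and the use of max() so that a stale duplicate ACK cannot move base backwards are additions for the example and are not in the FSM.

```python
class GbnSender:
    """Go-Back-N sender sketch with a window of N packets."""

    def __init__(self, window_size: int):
        self.N = window_size
        self.base = 1          # oldest unACKed seq #
        self.nextseqnum = 1    # next seq # to use
        self.packets = {}      # seq # -> data, kept for retransmission
        self.wire = []         # log of seq #'s sent (stands in for udt_send)

    def rdt_send(self, data) -> bool:
        """Send if the window has room; otherwise refuse the data."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.wire.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False           # refuse_data(data): caller must retry later

    def timeout(self) -> None:
        """Go back N: resend every packet from base to nextseqnum-1."""
        for seq in range(self.base, self.nextseqnum):
            self.wire.append(seq)

    def udt_rcv(self, acknum: int, corrupt: bool = False) -> None:
        """Cumulative ACK slides the window; corrupt replies are ignored."""
        if not corrupt:
            self.base = max(self.base, acknum + 1)
```

With N = 3, a fourth packet is refused until ACK 1 arrives; a later timeout resends packets 2, 3 and 4.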
41
Pipelining increased utilization
first packet bit transmitted t = 0
sender receiver
RTT
last bit transmitted t = L R
first packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
42
Pipelining ProtocolsGo-back-N big picturebull Sender can have up to N unacked packets in pipelinebull Rcvr only sends cumulative acks
ndash Doesnrsquot ack packet if therersquos a gapbull Sender has timer for oldest unacked packet
ndash If timer expires retransmit all unacked packets
Selective Repeat big picbull Sender can have up to N unacked packets in pipelinebull Rcvr acks individual packetsbull Sender maintains timer for each unacked packet
ndash When timer expires retransmit only unack packet
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN: receiver extended FSM
• ACK-only: always send an ACK for correctly-received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don't buffer) -> no receiver buffering!
  – Re-ACK packet with highest in-order seq #
Initial transition (Λ): expectedseqnum = 1; enter Wait
Wait, on udt_rcv(packet) && notcorrupt(packet) && hasseqnum(rcvpkt, expectedseqnum):
    rdt_rcv(data)
    udt_send(ACK)
    expectedseqnum++
Wait, default (any other event):
    udt_send(reply)
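The receiver side is simpler still, since it only remembers expectedseqnum. The sketch below is an illustration under assumptions: the method signature, the `delivered` list and the convention of returning the ACK number (rather than sending a packet) are all hypothetical.

```python
class GBNReceiver:
    """Go-Back-N receiver: no buffering, cumulative ACKs only."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []           # data handed to the upper layer

    def udt_rcv(self, seqnum, data, corrupt=False):
        """Process one packet and return the ACK number to send back.

        In-order, uncorrupted packets are delivered; anything else is
        discarded and the highest in-order sequence number is re-ACKed.
        """
        if not corrupt and seqnum == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        return self.expectedseqnum - 1


receiver = GBNReceiver()
acks = [receiver.udt_rcv(seq, data)
        for seq, data in [(1, "a"), (3, "c"), (2, "b"), (3, "c")]]
print(acks, receiver.delivered)
```

Packet 3 arriving early is dropped and answered with a duplicate ACK for 1; it is only accepted once packet 2 has filled the gap.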
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #'s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
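The three receiver cases can be sketched as follows. This is a simplified illustration: sequence numbers are assumed to be unbounded integers (no wrap-around), and the class name, `buffer` dictionary and return convention are hypothetical.

```python
class SRReceiver:
    """Selective-repeat receiver with a window of N sequence numbers."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}              # out-of-order packets awaiting delivery
        self.delivered = []           # data handed to the upper layer, in order

    def receive(self, n, data):
        """Return n if ACK(n) should be sent, or None to ignore the packet."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcvbase, advancing the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                  # already delivered: re-ACK only
        return None                   # otherwise: ignore


rcvr = SRReceiver(window_size=4)
results = [rcvr.receive(1, "b"), rcvr.receive(0, "a"),
           rcvr.receive(0, "a"), rcvr.receive(9, "x")]
print(results, rcvr.delivered)
```

The re-ACK in the second case matters: without it a sender whose ACK was lost would never advance its own window.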
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
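The dilemma can be checked mechanically. The sketch below assumes the worst case implied by the slide: the receiver has accepted a full window and advanced, every ACK was lost, and the sender retransmits the old window. The function name `is_ambiguous` is hypothetical.

```python
def is_ambiguous(seq_space, window):
    """True if a retransmitted old packet can look like a new one.

    The sender may resend sequence numbers 0..window-1 while the receiver
    is already expecting window..2*window-1; both are taken modulo the
    sequence-number space. Any overlap means the receiver cannot tell
    old data from new.
    """
    old = {n % seq_space for n in range(0, window)}
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)


# The slide's example: seq #'s 0..3 with window size 3 is ambiguous,
# while window size 2 (half the sequence space) is safe.
print(is_ambiguous(4, 3), is_ambiguous(4, 2))
```

Under this model the overlap disappears exactly when the window is at most half the sequence-number space, matching the rule on the slide.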
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth-Delay product)
Now let's move from the generic to the specific...
TCP: arguably the most successful protocol in the Internet...
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, ...)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits URG, ACK, PSH, RST, SYN, FIN
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pnter
• Options (variable length)
• application data (variable length)
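The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch for illustration: the function name and the dictionary keys are hypothetical, options are not parsed, and the sample header is built by hand rather than captured from a network.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    flag_bits = offset_flags & 0x3F           # low six bits: U A P R S F
    names = (("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
             ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20))
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,  # head len is in 32-bit words
        "flags": {name: bool(flag_bits & bit) for name, bit in names},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Hand-built SYN+ACK header: ports 1234 -> 80, Seq=42, ACK=79, no options.
sample = struct.pack("!HHIIHHHH", 1234, 80, 42, 79, (5 << 12) | 0x12, 65535, 0, 0)
header = parse_tcp_header(sample)
print(header)
```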
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn't say - up to implementor
Simple telnet scenario (Host A, Host B; time runs downward):
• User types 'C': Host A -> Host B: Seq=42, ACK=79, data = 'C'
• Host B ACKs receipt of 'C', echoes back 'C': Host B -> Host A: Seq=79, ACK=43, data = 'C'
• Host A ACKs receipt of echoed 'C': Host A -> Host B: Seq=43, ACK=80
This has led to a world of hurt...
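The numbers in the telnet scenario follow from one rule: the ACK field is the peer's sequence number plus the number of data bytes received. A small sketch, with hypothetical helper names and segments modelled as plain dictionaries:

```python
def make_segment(seq, ack, data=b""):
    """A segment reduced to the three fields the scenario uses."""
    return {"seq": seq, "ack": ack, "data": data}


def reply_to(segment, my_seq, data=b""):
    """Build a reply whose ACK is the next byte expected from the peer."""
    next_expected = segment["seq"] + len(segment["data"])
    return make_segment(my_seq, next_expected, data)


# Host A sends 'C'; Host B echoes it; Host A acknowledges the echo.
a_to_b = make_segment(seq=42, ack=79, data=b"C")
b_to_a = reply_to(a_to_b, my_seq=a_to_b["ack"], data=b"C")
a_ack = reply_to(b_to_a, my_seq=b_to_a["ack"])
print(b_to_a, a_ack)
```

Each side's own sequence number is simply the value the other side last asked for, which is why the pairs (42, 79), (79, 43), (43, 80) interlock.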
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits...
  – Filling buffers
• Critical buffers...
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example.
Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) × EstimatedRTT + α × SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and Estimated RTT plotted as RTT (milliseconds, axis 100 to 350) against time (seconds, axis 1 to 106)]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β) × DevRTT + β × |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4 × DevRTT
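The estimator on these slides fits in a short class. This is a sketch under stated assumptions: the class and attribute names are hypothetical; initialising DevRTT to half the first sample and updating DevRTT before EstimatedRTT follow RFC 6298 rather than anything stated on the slides; and the `retransmitted` flag stands in for the Karn/Partridge rule of ignoring retransmitted segments.

```python
class RttEstimator:
    """EWMA round-trip estimator with the slides' typical gains."""

    ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
    BETA = 0.25     # weight of the newest deviation in DevRTT

    def __init__(self, first_sample):
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: RFC 6298-style start

    def update(self, sample_rtt, retransmitted=False):
        """Fold one SampleRTT into the estimates."""
        if retransmitted:
            return                           # Karn/Partridge: ambiguous sample
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.BETA) * self.dev_rtt + self.BETA * deviation
        self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                              + self.ALPHA * sample_rtt)

    @property
    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)       # milliseconds, assumed values
est.update(200.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single late sample moves the estimate only an eighth of the way towards it, but widens the timeout considerably through DevRTT.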
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
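The three events of the simplified sender translate into Python roughly as below. This is a sketch: the `send_to_ip` callback and the `unacked` dictionary are hypothetical, the timer is reduced to a boolean that the caller would drive, and duplicate ACKs, flow control and congestion control are ignored as on the slide.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender."""

    def __init__(self, initial_seq_num, send_to_ip):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.send_to_ip = send_to_ip      # hypothetical callback (seq, data)
        self.unacked = {}                 # seq of first byte -> segment data
        self.timer_running = False

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True     # start timer
        self.send_to_ip(seq, data)
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # smallest not-yet-acked seq number
            self.send_to_ip(seq, self.unacked[seq])
            self.timer_running = True     # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose last byte lies below y.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sent = []
tcp = SimplifiedTcpSender(92, lambda seq, data: sent.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)     # Seq=92, 8 bytes data
tcp.on_app_data(b"y" * 20)    # Seq=100, 20 bytes data
tcp.on_timeout()              # retransmits only Seq=92
tcp.on_ack(120)               # cumulative ACK covers both segments
print(sent, tcp.send_base)
```

Unlike Go-Back-N, the timeout resends only the oldest segment, and a single ACK=120 clears both outstanding segments.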
66
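TCP: retransmission scenarios
[Figure: two Host A / Host B timing diagrams]
Lost ACK scenario:
• Host A -> Host B: Seq=92, 8 bytes data
• Host B -> Host A: ACK=100 (lost, X)
• Seq=92 timeout; Host A -> Host B: Seq=92, 8 bytes data
• Host B -> Host A: ACK=100; SendBase = 100
Premature timeout scenario:
• Host A -> Host B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• Host B -> Host A: ACK=100, then ACK=120
• Seq=92 timeout fires before the ACKs arrive; Host A -> Host B: Seq=92, 8 bytes data
• SendBase = 100, then SendBase = 120 as the ACKs arrive
• Host B -> Host A: ACK=120; SendBase = 120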
67
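TCP retransmission scenarios (more)
[Figure: Host A / Host B timing diagram]
• Host A -> Host B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• Host B -> Host A: ACK=100 (lost, X)
• Host B -> Host A: ACK=120, arriving before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too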
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A / Host B timing diagram: a segment is lost (X) and Host A resends the 2nd segment before the timeout]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y
        }
    }
• else branch: a duplicate ACK for already ACKed segment
• resend: fast retransmit
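The duplicate-ACK branch can be sketched on its own. Assumptions in this sketch: the `resend` callback and class name are hypothetical, the timer is omitted, and clearing the duplicate counts when SendBase advances is a choice made here that the slide does not spell out.

```python
from collections import Counter


class FastRetransmitSender:
    """ACK handler that triggers a resend on the third duplicate ACK."""

    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.resend = resend              # hypothetical callback taking a seq number
        self.dup_acks = Counter()         # ACK value -> duplicates seen

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y            # new data acked
            self.dup_acks.clear()
        else:
            self.dup_acks[y] += 1         # duplicate ACK for already ACKed data
            if self.dup_acks[y] == 3:
                self.resend(y)            # fast retransmit, before any timeout


resent = []
fr = FastRetransmitSender(send_base=100, resend=resent.append)
for _ in range(4):
    fr.on_ack(100)                        # four duplicates: one resend only
fr.on_ack(120)
print(resent, fr.send_base)
```

Testing for exactly three, rather than three or more, keeps later duplicates from triggering repeated resends of the same segment.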
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop...
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
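Both halves of the rule reduce to simple arithmetic. In this sketch the receiver-side names follow the slide; the sender-side names `last_byte_sent` and `last_byte_acked` and the function `sender_may_send` are assumptions introduced for illustration, and the byte counts are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, nbytes, window):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= window


# Assumed example: 4096-byte buffer, 3000 bytes received, 1000 read by the app.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window,
      sender_may_send(5000, 4000, 1000, window),
      sender_may_send(5000, 4000, 1100, window))
```

As the application reads from the buffer, LastByteRead grows, the advertised window reopens, and the sender may transmit again.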
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
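The handshake itself is carried out by the operating system when the client connects and the server accepts. A minimal loopback sketch with Python's standard `socket` module, mirroring the two Java calls on the slide; the one-shot echo server, the 5-second timeouts and the use of an ephemeral port on 127.0.0.1 are choices made for this illustration.

```python
import socket
import threading


def echo_once(welcome_socket):
    """Accept one connection (handshake completes here) and echo its data."""
    connection_socket, _client_addr = welcome_socket.accept()
    with connection_socket:
        connection_socket.sendall(connection_socket.recv(1024))


# Server side: the welcoming socket listens on an OS-chosen loopback port.
welcome_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome_socket.settimeout(5)
welcome_socket.bind(("127.0.0.1", 0))
welcome_socket.listen(1)
port = welcome_socket.getsockname()[1]
server = threading.Thread(target=echo_once, args=(welcome_socket,))
server.start()

# Client side: create_connection() triggers SYN, SYNACK, ACK.
client_socket = socket.create_connection(("127.0.0.1", port), timeout=5)
client_socket.sendall(b"C")
echoed = client_socket.recv(1024)
client_socket.close()          # starts the FIN exchange
server.join()
welcome_socket.close()
print(echoed)
```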
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timing diagram: client close, FIN to server; ACK to client; server close, FIN to client; ACK to server; client timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timing diagram: client closing, FIN to server; ACK to client; server closing, FIN to client; ACK to server; client timed wait; both closed]
80
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B; λin: original data; unlimited shared output link buffers; λout]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
[Figure: three plots (a), (b), (c) of λout against λin, both axes running to R/2; levels R/3 and R/4 are also marked]
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λout]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W <- W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[Figure: congestion window size against time, with axis marks at 8, 16 and 24 Kbytes: saw tooth behavior, probing for bandwidth]
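The saw tooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: the window is tracked in units of MSS and updated once per RTT rather than per ACK, losses are supplied as a set of RTT indices, slow start is not modelled, and the floor of one MSS and the function name `aimd` are choices made here.

```python
def aimd(rounds, loss_rounds, initial_window=1.0):
    """Return the congestion window (in MSS) at the start of each RTT.

    Additive increase: +1 MSS per loss-free RTT.
    Multiplicative decrease: halve the window in an RTT that sees a loss.
    """
    window = initial_window
    trace = []
    for rtt in range(rounds):
        trace.append(window)
        if rtt in loss_rounds:
            window = max(1.0, window / 2)   # W <- W/2, never below 1 MSS
        else:
            window += 1.0                   # W acks, each adding 1/W, add about 1 MSS
    return trace


trace = aimd(rounds=12, loss_rounds={5, 9})
print(trace)
```

Plotting `trace` against the RTT index gives the rising ramps and sudden halvings of the figure.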
89
SLOW START IS NOT SHOWN
43
Selective repeat big picture
bull Sender can have up to N unacked packets in pipeline
bull Rcvr acks individual packetsbull Sender maintains timer for each unacked
packetndash When timer expires retransmit only unack packet
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN: sender extended FSM
State: Wait
Initial transition (Λ): base=1, nextseqnum=1
Event rdt_send(data):
    if (nextseqnum < base+N) {
        udt_send(packet[nextseqnum])
        nextseqnum++
    } else
        refuse_data(data)   (block)
Event timeout:
    udt_send(packet[base])
    udt_send(packet[base+1])
    …
    udt_send(packet[nextseqnum-1])
Event udt_rcv(reply) && notcorrupt(reply):
    base = getacknum(reply)+1
Event udt_rcv(reply) && corrupt(reply):
    Λ (no action)
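The sender FSM above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name GBNSender, the udt_send callback and the packets dictionary are assumptions introduced here, and timers are left to the caller, who invokes timeout() when the timer fires.

```python
class GBNSender:
    """Go-Back-N sender sketch (sequence numbers start at 1, as in the FSM)."""

    def __init__(self, window_size, udt_send):
        self.N = window_size
        self.udt_send = udt_send   # hypothetical callback: transmit one packet
        self.base = 1
        self.nextseqnum = 1
        self.packets = {}          # seq -> payload, kept until ACKed

    def rdt_send(self, data):
        """Send data if the window has room; return False to refuse it."""
        if self.nextseqnum < self.base + self.N:
            self.packets[self.nextseqnum] = data
            self.udt_send((self.nextseqnum, data))
            self.nextseqnum += 1
            return True
        return False

    def timeout(self):
        """Go back N: resend every packet in [base, nextseqnum - 1]."""
        for seq in range(self.base, self.nextseqnum):
            self.udt_send((seq, self.packets[seq]))

    def ack_received(self, acknum, corrupt=False):
        """Cumulative ACK: slide base past acknum; ignore corrupt replies."""
        if corrupt:
            return
        for seq in range(self.base, acknum + 1):
            self.packets.pop(seq, None)
        # max() guards against a stale ACK moving the window backwards
        # (an assumption; the FSM simply sets base = acknum + 1).
        self.base = max(self.base, acknum + 1)
```

With a window of 2, a third rdt_send() is refused until an ACK advances base, and a timeout resends only the packets that are still unACKed.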
46
GBN: receiver extended FSM
• ACK-only: always send an ACK for the correctly received packet with the highest in-order seq #
  – may generate duplicate ACKs
  – need only remember expectedseqnum
• out-of-order packet:
  – discard (don’t buffer) -> no receiver buffering!
  – re-ACK packet with highest in-order seq #
State: Wait
Initial transition (Λ): expectedseqnum=1
Event udt_rcv(packet) && notcorrupt(packet) && hasseqnum(packet, expectedseqnum):
    rdt_rcv(data)
    udt_send(ACK)
    expectedseqnum++
Any other event (Λ):
    udt_send(reply)
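A small Python sketch of the receiver side, under the same assumptions as the FSM (sequence numbers start at 1, out-of-order packets are discarded). The class name GBNReceiver and the convention of returning the ACK number instead of sending it are illustrative choices, not part of the slides.

```python
class GBNReceiver:
    """Go-Back-N receiver sketch: only expectedseqnum is remembered."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []        # data passed up to the application

    def udt_rcv(self, seq, data, corrupt=False):
        """Process one packet and return the ACK number to send back.

        A return value of 0 means nothing has been received in order yet.
        """
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # In every case, (re-)ACK the highest in-order sequence number.
        return self.expectedseqnum - 1
```

An out-of-order packet leaves the state unchanged and produces a duplicate ACK, which is what lets the sender detect the gap.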
47
GBN in action
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  – buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  – sender timer for each unACKed pkt
• sender window
  – N consecutive seq #s
  – again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n is the smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
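The receiver rules above can be sketched as follows. This is a simplified illustration that assumes unbounded sequence numbers (no wraparound); the class name SRReceiver and its fields are hypothetical.

```python
class SRReceiver:
    """Selective-repeat receiver sketch (no sequence-number wraparound)."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}           # out-of-order packets awaiting delivery
        self.delivered = []        # data passed up, in order

    def receive(self, n, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver any run of in-order packets and advance the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n               # already delivered: re-ACK only
        return None                # otherwise: ignore
```

Re-ACKing packets below rcvbase matters: if the original ACK was lost, the sender can only advance its window once it sees that ACK again.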
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
A: window size ≤ ½ of seq # size
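The bound can be checked mechanically. The sketch below assumes the worst case from the example: the receiver has accepted a full window and advanced, while every ACK was lost, so the sender retransmits its old window. The function name sr_is_ambiguous is made up for this illustration.

```python
def sr_is_ambiguous(seq_space, window):
    """True if retransmitted old packets can be mistaken for new ones.

    The sender's old window covers sequence numbers 0 .. window-1; the
    receiver's advanced window covers window .. 2*window-1, taken modulo
    the size of the sequence-number space.
    """
    old = {i % seq_space for i in range(0, window)}
    new = {i % seq_space for i in range(window, 2 * window)}
    return bool(old & new)
```

For the example on the slide (4 sequence numbers, window 3) the two windows overlap, while a window of 2 is safe.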
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider a high bandwidth-delay product)
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
Each row is 32 bits wide:
    source port #                | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | receive window
    checksum                     | urg data pointer
    options (variable length)
    application data (variable length)
Notes:
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
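As an illustration of the layout, the fixed 20-byte part of the header can be packed and parsed with Python's struct module. This is a sketch only: options are omitted, the checksum is left at zero rather than computed, and the helper names pack_header and parse_header are assumptions. The flag bit values follow the RFC 793 layout.

```python
import struct

# Network byte order: two 16-bit ports, 32-bit seq and ack, one byte holding
# the header length (upper 4 bits), one byte of flags, then 16-bit window,
# checksum and urgent pointer. Total: 20 bytes.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
         "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def pack_header(src, dst, seq, ack, flags, window, checksum=0, urg_ptr=0):
    """Build a 20-byte TCP header without options."""
    offset_byte = 5 << 4           # header length: 5 words of 32 bits
    flag_bits = 0
    for name in flags:
        flag_bits |= FLAGS[name]
    return TCP_HEADER.pack(src, dst, seq, ack, offset_byte,
                           flag_bits, window, checksum, urg_ptr)


def parse_header(raw):
    """Decode the first 20 bytes of a TCP segment into a dictionary."""
    (src, dst, seq, ack, offset_byte,
     flag_bits, window, checksum, urg_ptr) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len_bytes": (offset_byte >> 4) * 4,
        "flags": {name for name, bit in FLAGS.items() if flag_bits & bit},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }
```

The header length is expressed in 32-bit words precisely because the options field makes the header variable in length.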
56
TCP seq. #s and ACKs
Seq. #s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: the TCP spec doesn’t say; it is up to the implementor
  – This has led to a world of hurt…
Simple telnet scenario (Host A and Host B, in time order):
• User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
• Host B ACKs receipt of ‘C’, echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
• Host A ACKs receipt of echoed ‘C’. A -> B: Seq=43, ACK=80
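The numbers in the telnet scenario follow directly from the two rules above: the sequence number names the first byte carried, and the ACK names the next byte expected. A minimal sketch, in which the function name and the dictionary representation of a segment are illustrative assumptions:

```python
def telnet_exchange(a_seq, b_seq, payload=b"C"):
    """Return the three segments of a one-character echo exchange."""
    # Host A sends the typed character.
    seg1 = {"seq": a_seq, "ack": b_seq, "data": payload}
    # Host B ACKs the byte(s) it received and echoes the character back.
    seg2 = {"seq": b_seq,
            "ack": seg1["seq"] + len(seg1["data"]),
            "data": payload}
    # Host A ACKs the echo; this segment carries no data.
    seg3 = {"seq": seg2["ack"],
            "ack": seg2["seq"] + len(seg2["data"]),
            "data": b""}
    return [seg1, seg2, seg3]
```

Calling telnet_exchange(42, 79) reproduces the Seq/ACK pairs of the scenario.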
57
TCP out-of-order attack
• ARQ with SACK means the recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system;
  then send a new segment replacing bytes 4-13 with “password-file”
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary; want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
60
Some RTT estimates are never good
Associating the ACK with (a) the original transmission versus (b) a retransmission
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase the timeout; this makes retransmissions decreasingly aggressive)
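The two halves of the Karn/Partridge rule can be sketched as below. The slide only says the timeout is increased; doubling it on every expiry (exponential backoff) is an assumption made here because it is the common choice. The class name KarnTimer and its fields are hypothetical.

```python
class KarnTimer:
    """Sketch of Karn/Partridge: skip ambiguous samples, back off on timeout."""

    def __init__(self, timeout):
        self.timeout = timeout     # current retransmission timeout, in seconds
        self.samples = []          # RTT samples considered trustworthy

    def on_ack(self, rtt, was_retransmitted):
        """Record an RTT sample unless the segment was retransmitted."""
        if was_retransmitted:
            return                 # cannot tell which transmission was ACKed
        self.samples.append(rtt)

    def on_timeout(self):
        """Increase the timeout so repeated retransmissions slow down."""
        self.timeout *= 2          # assumption: exponential backoff
```

Only the unambiguous samples would then be fed into the RTT estimate.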
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
    DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
Then set the timeout interval:
    TimeoutInterval = EstimatedRTT + 4*DevRTT
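The estimator and the timeout rule fit in a few lines of Python. Two details are assumptions not stated on the slides and borrowed from RFC 6298: the first sample initialises EstimatedRTT directly with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT. The class name RttEstimator is made up for this sketch.

```python
class RttEstimator:
    """EWMA-based RTT estimator with a deviation-based safety margin."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption (RFC 6298 style)

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation first, measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, a second sample of 120 ms moves EstimatedRTT only to 102.5 ms, which shows how strongly the average damps individual measurements.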
63
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate ACKs
• Initially consider simplified TCP sender:
  – ignore duplicate ACKs
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• If it acknowledges previously unACKed segments
  – update what is known to be ACKed
  – start timer if there are outstanding segments
65
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
          start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

      event: ACK received, with ACK field value of y
        if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
            start timer
        }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is ACKed
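A runnable Python rendering of the simplified sender may help make the three events concrete. It is a sketch under stated assumptions: the timer is reduced to a boolean flag, segments are handed to a hypothetical send_to_ip callback, and the class and attribute names are invented here.

```python
class SimpleTcpSender:
    """Simplified TCP sender: no duplicate-ACK, flow or congestion handling."""

    def __init__(self, initial_seq, send_to_ip):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.send_to_ip = send_to_ip   # hypothetical callback (seq, data)
        self.unacked = {}              # seq of first byte -> segment data
        self.timer_running = False

    def data_from_app(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True  # start timer
        self.send_to_ip(seq, data)
        self.next_seq_num += len(data)

    def timeout(self):
        if self.unacked:
            seq = min(self.unacked)    # oldest not-yet-acknowledged segment
            self.send_to_ip(seq, self.unacked[seq])
            self.timer_running = True  # restart timer

    def ack_received(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment fully covered by the cumulative ACK.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Because ACKs are cumulative, a single ACK can clear several segments at once, and an ACK below SendBase changes nothing.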
66
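TCP: retransmission scenarios
Lost ACK scenario (Host A sends, Host B receives):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100, lost (X)
• Seq=92 timeout at A; A -> B: Seq=92, 8 bytes data (retransmission)
• B -> A: ACK=100; SendBase = 100
Premature timeout scenario:
• A -> B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B -> A: ACK=100, then ACK=120
• Seq=92 timeout at A before ACK=100 arrives; A -> B: Seq=92, 8 bytes data (retransmission)
• ACKs arrive at A: SendBase = 100, then SendBase = 120
• B -> A: ACK=120 for the retransmitted segment; SendBase = 120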
67
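TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends, Host B receives):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100, lost (X)
• B -> A: ACK=120, arriving before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too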
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver, followed by the TCP receiver action:
• Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  Action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
• Event: arrival of in-order segment with expected seq #; one other segment has ACK pending
  Action: immediately send single cumulative ACK, ACKing both in-order segments
• Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected
  Action: immediately send duplicate ACK, indicating seq # of next expected byte
• Event: arrival of segment that partially or completely fills gap
  Action: immediately send ACK, provided that segment starts at lower end of gap
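The table can be read as a small decision function. The sketch below is only an illustration of the four rows: the function name, the two boolean inputs and the returned strings are assumptions, and cases the table does not cover (such as an old segment below the expected sequence number) return None.

```python
def ack_action(seq, expected_seq, ack_pending, gap_buffered):
    """Pick the receiver action for an arriving segment.

    seq          -- sequence number of the arriving segment
    expected_seq -- next in-order byte the receiver expects
    ack_pending  -- True if an earlier in-order segment awaits a delayed ACK
    gap_buffered -- True if out-of-order data beyond a gap is already held
    """
    if seq > expected_seq:
        return "send duplicate ACK immediately"
    if seq == expected_seq:
        if gap_buffered:
            # Segment starts at the lower end of the gap.
            return "send ACK immediately"
        if ack_pending:
            return "send cumulative ACK immediately"
        return "delay ACK up to 500 ms"
    return None                    # not covered by the table
```

The duplicate-ACK row is the one the sender relies on to detect loss early.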
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A’s 2nd segment to Host B is lost (X); Host A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
      else {   /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
          resend segment with sequence number y   /* fast retransmit */
        }
      }
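The duplicate-ACK branch translates almost line for line into Python. In this sketch the timer handling is omitted, resend is a hypothetical callback, and clearing the counters whenever new data is ACKed is an implementation choice rather than something the pseudocode states.

```python
from collections import Counter


class FastRetransmit:
    """ACK handler that resends a segment on its third duplicate ACK."""

    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.resend = resend           # hypothetical callback: resend(seq)
        self.dup_acks = Counter()      # ACK value -> duplicates seen

    def ack_received(self, y):
        if y > self.send_base:
            self.send_base = y         # new data ACKed
            self.dup_acks.clear()
        else:
            self.dup_acks[y] += 1      # duplicate ACK for already ACKed data
            if self.dup_acks[y] == 3:
                self.resend(y)         # fast retransmit
```

Testing for exactly three duplicates means a fourth or fifth duplicate ACK does not trigger another resend of the same segment.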
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don’t wait indefinitely
Nagle’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
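Both sides of the rule are simple arithmetic, shown here as two small functions. The receiver formula is the one above; the sender-side names last_byte_sent and last_byte_acked do not appear in the slides and are assumptions used to express “unACKed data”.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes more keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window
```

With a 4096-byte buffer holding 2000 unread bytes, the advertised window is 2096 bytes, so a sender with 1000 bytes in flight may send 1000 more but not 1100.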
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
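The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short sketch. It models segments as dictionaries and relies on the fact that a SYN consumes one sequence number; the function name and field names are illustrative assumptions, and no real sockets are involved.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    # Step 1: client picks its initial sequence number; no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server picks its own initial sequence number and ACKs the SYN.
    synack = {"flags": {"SYN", "ACK"},
              "seq": server_isn,
              "ack": syn["seq"] + 1}
    # Step 3: client ACKs the server's SYN; this segment may carry data.
    ack = {"flags": {"ACK"},
           "seq": synack["ack"],
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

After the third segment both sides know the other's initial sequence number, which is the state the handshake exists to establish.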
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client (close) sends FIN; server replies with ACK, then (close) sends FIN; client replies with ACK, enters timed wait, then closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client (closing) sends FIN; server replies with ACK, then (closing) sends FIN; client replies with ACK, timed wait, closed; server closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send through one router with unlimited shared output link buffers; λin: original data; λout]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through one router with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• “perfect” retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against λ'in, axes running to R/2, with levels R/3 and R/4 marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when a packet is dropped, any “upstream” transmission capacity used for that packet was wasted!
[Figure: Host A, Host B; λout]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  – additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
  – multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with ticks at 8, 16 and 24 Kbytes; saw-tooth behavior: probing for bandwidth]
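The saw-tooth can be reproduced with a tiny simulation. This sketch works at the granularity of one round trip, measures the window in units of MSS, and, like the figure, leaves out slow start; the event labels "rtt" and "loss", the function name and the floor of one MSS are assumptions made for illustration.

```python
def aimd(cong_win, events, mss=1.0):
    """Trace the congestion window over a sequence of round-trip events.

    cong_win -- starting window, in the same units as mss
    events   -- iterable of "rtt" (loss-free round trip) or "loss"
    """
    trace = []
    for event in events:
        if event == "loss":
            # Multiplicative decrease: halve, but never below one MSS.
            cong_win = max(mss, cong_win / 2)
        else:
            # Additive increase: one MSS per loss-free round trip.
            cong_win += mss
        trace.append(cong_win)
    return trace
```

Starting from a window of 8, two loss-free round trips followed by a loss and one more round trip give the windows 9, 10, 5, 6: a slow linear climb and a sharp drop.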
89
SLOW START IS NOT SHOWN
44
Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)
timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
79
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too much data too
fast for network to handlerdquobull different from flow controlbull manifestations
ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)
bull a top-10 problem
82
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
83
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
84
Causescosts of congestion scenario 2 bull always (goodput)
bull ldquoperfectrdquo retransmission only when loss
bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same
lin
lout=
lin
loutgt
linlout
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt
R2
R2lin
l out
b
R2
R2lin
l out
a
R2
R2lin
l out
c
R4
R3
85
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
86
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Host A
Host B
lo
u
t
Congestion Collapse example Cocktail party effect
87
Approaches towards congestion control
End-end congestion controlbull no explicit feedback from
networkbull congestion inferred from end-
system observed loss delaybull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad approaches towards congestion control
88
TCP congestion control additive increase multiplicative decrease
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1 MSS every RTT for
each received ACK until loss detected (W W + 1W)
multiplicative decrease cut CongWin in half after loss
(W W2)
time
cong
estio
n w
indo
w si
zeSaw toothbehavior probing
for bandwidth
89SLOW START IS NOT SHOWN
45
GBN sender extended FSM
Wait udt_send(packet[base])udt_send(packet[base+1])hellipudt_send(packet[nextseqnum-1])
timeout
rdt_send(data)
if (nextseqnum lt base+N) udt_send(packet[nextseqnum]) nextseqnum++ else refuse_data(data) Block
base = getacknum(reply)+1
udt_rcv(reply) ampamp notcorrupt(reply)
base=1nextseqnum=1
udt_rcv(reply) ampamp corrupt(reply)
L
L
46
GBN receiver extended FSM
ACK-only always send an ACK for correctly-received packet with the highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum
bull out-of-order packet ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK packet with highest in-order seq
Wait
udt_send(reply)L
udt_rcv(packet) ampamp notcurrupt(packet) ampamp hasseqnum(rcvpktexpectedseqnum)
rdt_rcv(data)udt_send(ACK)expectedseqnum++
expectedseqnum=1
L
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
sender
• data from above:
  – if next available seq # in window, send pkt
• timeout(n):
  – resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N]:
  – mark pkt n as received
  – if n is the smallest unACKed pkt, advance window base to next unACKed seq #
receiver
• pkt n in [rcvbase, rcvbase+N-1]:
  – send ACK(n)
  – out-of-order: buffer
  – in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  – ACK(n)
• otherwise:
  – ignore
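The receiver rules above could look roughly like this in Python. It is a sketch only: `SRReceiver` and its attribute names are made up for illustration, and sequence numbers are assumed to grow without wrapping, which a real protocol with a finite sequence space cannot assume.

```python
class SRReceiver:
    """Selective-repeat receiver; sequence numbers do not wrap in this sketch."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0      # lowest sequence number not yet delivered
        self.buffer = {}      # out-of-order packets: seq -> data
        self.delivered = []   # data handed to the upper layer, in order

    def receive(self, n, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the longest in-order run that starts at rcvbase.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n          # already delivered: re-ACK, the first ACK may be lost
        return None           # otherwise: ignore
```

Packet 1 arriving before packet 0 is buffered and ACKed; both are delivered once packet 0 arrives.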
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
A: window size ≤ ½ of seq # size
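One way to convince yourself of the ½ rule is to check whether the sender's old window and the receiver's advanced window share a sequence number. The helper `ambiguous` below is a hypothetical illustration that covers only the worst case, where every ACK of the first window is lost.

```python
def ambiguous(seq_space, window):
    """True if a retransmitted old packet can land inside the receiver's new window."""
    # Sender still holds [0, window-1] unACKed (all ACKs were lost), while the
    # receiver has already advanced to [window, 2*window-1], modulo seq_space.
    old = {s % seq_space for s in range(0, window)}
    new = {s % seq_space for s in range(window, 2 * window)}
    return bool(old & new)
```

For the slide's example, `ambiguous(4, 3)` is true, whereas `ambiguous(4, 2)` is false because the two windows no longer overlap.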
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider a high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
• full duplex data
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented
  – handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled
  – sender will not overwhelm receiver
• point-to-point
  – one sender, one receiver
• reliable, in-order byte stream
  – no "message boundaries"
• pipelined
  – TCP congestion and flow control set window size
• send & receive buffers
[figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]
55
TCP segment structure
[figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flags U A P R S F
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pointer
• Options (variable length)
• application data (variable length)
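To make the layout concrete, here is a small sketch that decodes the fixed 20-byte part of a header with Python's standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices, not part of the slides; options are skipped, and only the six flag bits named on the slide are decoded.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src, dst, seq, ack, off_flags, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])           # network byte order, 20 bytes
    header_len = (off_flags >> 12) * 4       # head len counts 32-bit words
    flags = off_flags & 0x3F                 # low six bits: URG ACK PSH RST SYN FIN
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 upward
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "data": segment[header_len:],        # payload starts after any options
    }
```

Sequence and acknowledgement numbers come out as plain integers that count bytes, matching the slide's "not segments" remark.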
56
TCP seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn't say - up to implementor
Simple telnet scenario (in time order):
  1. User types 'C'. Host A -> Host B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. Host B -> Host A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. Host A -> Host B: Seq=43, ACK=80
This has led to a world of hurt…
57
TCP out-of-order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) × EstimatedRTT + α × SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
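As a quick worked example of the moving average, the one-line helper below applies a single update. The function name `update_estimated_rtt` is an illustrative choice, and times are assumed to be in milliseconds.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One exponential-weighted-moving-average step for the RTT estimate."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt
```

Starting from an estimate of 100 ms, a single 200 ms sample moves the estimate only to 112.5 ms, which is why the estimated curve is smoother than the samples.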
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[figure: RTT (milliseconds, axis 100 to 350) against time (seconds, 1 to 106); two curves, SampleRTT and EstimatedRTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
    DevRTT = (1 - β) × DevRTT + β × |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
• then set timeout interval:
    TimeoutInterval = EstimatedRTT + 4 × DevRTT
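The timeout calculation, together with the Karn/Partridge rule from the earlier slide, might be combined as in the sketch below. Several details are assumptions rather than anything the slides state: the class name `RttEstimator`, seeding DevRTT with half the first sample (the convention in RFC 6298), updating DevRTT before EstimatedRTT, and doubling the timeout on each expiry as the concrete form of "increase timeout".

```python
class RttEstimator:
    """EstimatedRTT / DevRTT tracking with Karn/Partridge sample filtering."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation
        self.backoff = 1                  # multiplier grown on each timeout

    def on_sample(self, sample_rtt, retransmitted=False):
        if retransmitted:
            return                        # ambiguous sample: ignore it
        # DevRTT uses the estimate from before this sample (assumed ordering).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.backoff = 1                  # a clean sample ends the backoff

    def on_timeout(self):
        self.backoff *= 2                 # assumed policy: double the timeout

    @property
    def timeout_interval(self):
        return self.backoff * (self.estimated_rtt + 4 * self.dev_rtt)
```

A steady stream of identical samples shrinks DevRTT, so the timeout converges towards EstimatedRTT itself.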
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
          start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

  event: ACK received, with ACK field value of y
      if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
              start timer
      }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
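The event loop above translates fairly directly into an event-handler class. In this sketch the names `SimplifiedTcpSender`, `send_to_ip` and `unacked` are hypothetical, and the timer is modelled as a boolean flag rather than a real clock.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: no duplicate-ACK handling, no flow or congestion control."""

    def __init__(self, initial_seq, send_to_ip):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.send_to_ip = send_to_ip   # hypothetical callable(seq, data)
        self.unacked = {}              # seq of first byte -> segment data
        self.timer_running = False     # stands in for the single retransmission timer

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.send_to_ip(self.next_seq, data)
        self.next_seq += len(data)     # sequence numbers count bytes

    def on_timeout(self):
        if self.unacked:
            oldest = min(self.unacked) # smallest not-yet-acknowledged seq
            self.send_to_ip(oldest, self.unacked[oldest])
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:         # y acknowledges new data
            self.send_base = y
            done = [s for s in self.unacked if s + len(self.unacked[s]) <= y]
            for seq in done:
                del self.unacked[seq]
            # Restart the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Feeding it 8 bytes at Seq=92 and 20 bytes at Seq=100, then ACK=120, leaves `send_base` at 120 with nothing outstanding.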
66
TCP: retransmission scenarios
[figure: two Host A / Host B timelines]
Lost ACK scenario:
  1. A -> B: Seq=92, 8 bytes data
  2. B -> A: ACK=100 (lost, X)
  3. Seq=92 timeout; A -> B: Seq=92, 8 bytes data
  4. B -> A: ACK=100; SendBase = 100
Premature timeout:
  1. A -> B: Seq=92, 8 bytes data; then Seq=100, 20 bytes data
  2. B -> A: ACK=100, then ACK=120
  3. Seq=92 timeout fires first; A -> B: Seq=92, 8 bytes data
  4. the ACKs arrive: SendBase = 100, then SendBase = 120
  5. B -> A: ACK=120; SendBase = 120
67
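TCP retransmission scenarios (more)
[figure: Host A / Host B timeline]
Cumulative ACK scenario:
  1. A -> B: Seq=92, 8 bytes data
  2. A -> B: Seq=100, 20 bytes data
  3. B -> A: ACK=100 (lost, X)
  4. B -> A: ACK=120, arriving before the timeout; SendBase = 120
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too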
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[figure: Host A / Host B timeline; the 2nd segment is lost (X) and is resent before its timeout expires]
Figure 3.37 Resending a segment after triple duplicate ACK
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {      /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y      /* fast retransmit */
        }
    }
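A minimal sketch of the duplicate-ACK counter follows. The class name `FastRetransmitSender` and the `resend(seq)` callback are hypothetical, and unlike the pseudocode this version counts only ACKs equal to SendBase, so ACKs older than SendBase are silently ignored (an assumption made to keep the example small).

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a fast retransmit on the third one."""

    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.resend = resend       # hypothetical callable(seq)
        self.dup_acks = 0

    def on_ack(self, y):
        if y > self.send_base:     # new data acknowledged
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:  # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.resend(y)     # fast retransmit, before the timer expires
```

A fourth duplicate ACK does not trigger another resend; the counter resets only when new data is acknowledged.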
72
Silly Window Syndrome
The advertised receive window tells the sender how much the receiver can accept
If a transmitter has something to send – it will
This means small segments (far smaller than the MSS) may persist - indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
NAGLE's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires – send the (unfilled) segment
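For comparison, Nagle's rule is commonly described (RFC 896) in an ACK-clocked form, in which the arrival of an ACK plays the role of the timer mentioned above. The sketch below shows that formulation; whether it is the variant the slide has in mind is an assumption, and the function name `nagle_should_send` is illustrative.

```python
def nagle_should_send(buffered_bytes, mss, unacked_in_flight):
    """Decide whether buffered data may be transmitted now."""
    if buffered_bytes >= mss:
        return True                 # a full segment is ready: send it
    if not unacked_in_flight:
        return buffered_bytes > 0   # idle connection: send small data at once
    return False                    # data in flight: wait for an ACK or more data
```

Interactive traffic still flows, because an idle connection sends its first small segment immediately.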
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
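The window arithmetic can be checked with two small helpers. The function names and the sender-side variables `last_byte_sent` and `last_byte_acked` are illustrative assumptions; the receiver-side quantities are the ones named on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, advertised, nbytes):
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may send at most 1096 more.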
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
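The three steps can be traced with a toy model of the segments exchanged. Nothing here touches a real socket: `three_way_handshake` and the dictionary layout are hypothetical, and the only protocol fact relied on is that a SYN consumes one sequence number.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    # Step 1: client sends SYN carrying its initial sequence number, no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK: its own ISN, and an ACK of the SYN.
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client ACKs the server's SYN; this segment may already carry data.
    ack = {"flags": {"ACK"}, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

With initial sequence numbers 42 and 79, the final ACK carries seq 43 and ack 80.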
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[figure: client / server timeline: client close -> FIN; server -> ACK; server close -> FIN; client -> ACK; client in timed wait, then closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[figure: client / server timeline: client closing -> FIN; server -> ACK; server closing -> FIN; client -> ACK; client in timed wait, then closed; server closed]
80
TCP Connection Management (cont)
[figure: TCP client lifecycle]
[figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; Host B; output rate λout]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; output rate λout]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[figure: three panels (a), (b), (c) plotting λout against λin, both axes running to R/2; further marks at R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[figure: Host A, Host B, λout]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT, i.e. by 1/W for each received ACK, until loss detected (W <- W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W <- W/2)
[figure: congestion window size against time, window axis marked at 8, 16 and 24 Kbytes; saw tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
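The saw tooth can be reproduced with a few lines of simulation. This is a sketch under simplifying assumptions: the window is measured in MSS and updated once per RTT, losses are supplied by the caller as a set of RTT indices, the window never drops below 1 MSS, and, like the figure, slow start is left out. The function name `aimd` is illustrative.

```python
def aimd(initial_window, rtts, loss_rtts):
    """Congestion window (in MSS) at the start of each RTT."""
    w = float(initial_window)
    trace = []
    for rtt in range(rtts):
        trace.append(w)
        if rtt in loss_rtts:
            w = max(1.0, w / 2)   # multiplicative decrease: halve after a loss
        else:
            w += 1.0              # additive increase: 1 MSS per RTT
    return trace
```

A single loss in the fourth RTT turns the steady climb 8, 9, 10, 11 into a drop to 5.5 followed by a renewed climb.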
47
GBN inaction
48
Selective Repeat
bull receiver individually acknowledges all correctly received pktsndash buffers pkts as needed for eventual in-order delivery to upper
layerbull sender only resends pkts for which ACK not received
ndash sender timer for each unACKed pktbull sender window
ndash N consecutive seq rsquosndash again limits seq s of sent unACKed pkts
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; up to implementor
  - This has led to a world of hurt…
Simple telnet scenario (Host A and Host B, in time order):
  1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  - this gets through an intrusion-detection system
  - then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
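A minimal sketch of the moving average above, assuming RTTs are kept as floating-point milliseconds; the function name `update_estimated_rtt` is hypothetical.

```python
def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = 0.125) -> float:
    """One EWMA step: the new sample gets weight alpha, the history 1 - alpha."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


if __name__ == "__main__":
    estimate = 100.0
    # A single 180 ms outlier followed by samples back at 100 ms.
    for sample in [180.0, 100.0, 100.0, 100.0]:
        estimate = update_estimated_rtt(estimate, sample)
        print(f"sample={sample:6.1f}  estimate={estimate:7.3f}")
```

The outlier moves the estimate by only one eighth of its distance, and its influence shrinks by a factor of 0.875 with every later sample.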
60
Some RTT estimates are never good
Associating the ACK with (a) the original transmission versus (b) the retransmission is ambiguous
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase the timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
• then set timeout interval:
  $TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
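The timeout computation can be sketched as a small estimator object. The class name `RttEstimator`, the zero starting deviation, and the update order are assumptions for illustration: the slides do not say which is updated first, so this sketch follows RFC 6298 and updates the deviation using the estimate from before the new sample.

```python
from dataclasses import dataclass


@dataclass
class RttEstimator:
    estimated_rtt: float      # smoothed RTT, e.g. in milliseconds
    dev_rtt: float = 0.0      # smoothed deviation; zero start is an assumption
    alpha: float = 0.125
    beta: float = 0.25

    def on_sample(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RttEstimator(estimated_rtt=100.0)
    for sample in [180.0, 100.0, 100.0]:
        print(f"sample={sample:6.1f}  timeout={est.on_sample(sample):8.3f}")
```

A single late sample widens the safety margin sharply, and the margin then decays as later samples agree with the estimate.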
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
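The simplified sender can be turned into a small runnable model. This is a sketch under stated assumptions, not TCP as deployed: the class name `SimplifiedTcpSender` and its attributes are hypothetical, the timer is reduced to a boolean flag, "pass segment to IP" is modelled by appending to a list, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq of first byte -> payload
        self.timer_running = False
        self.sent = []               # log of (seq, payload) handed to IP

    def on_app_data(self, data: bytes) -> None:
        """Event: data received from the application above."""
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        """Event: timer timeout; resend the oldest unacknowledged segment."""
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart the timer

    def on_ack(self, y: int) -> None:
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in fully_acked:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


if __name__ == "__main__":
    sender = SimplifiedTcpSender(initial_seq=92)
    sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
    sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
    sender.on_timeout()              # premature timeout: Seq=92 is resent
    sender.on_ack(100)
    sender.on_ack(120)
    print(sender.send_base, [seq for seq, _ in sender.sent])
```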
66
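TCP retransmission scenarios
Lost ACK scenario (Host A sends to Host B):
  1. A sends Seq=92, 8 bytes data
  2. B replies ACK=100; the ACK is lost (X)
  3. Seq=92 timeout at A; A resends Seq=92, 8 bytes data
  4. B replies ACK=100; SendBase = 100
Premature timeout scenario (Host A sends to Host B):
  1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B replies ACK=100 and ACK=120
  3. Seq=92 timeout at A fires before the ACKs arrive; A resends Seq=92, 8 bytes data
  4. ACK=100 arrives (SendBase = 100), then ACK=120 (SendBase = 120)
  5. B replies ACK=120 to the retransmission; SendBase = 120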
67
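TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A sends to Host B):
  1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B's ACK=100 is lost (X); B's ACK=120 arrives before the timeout
  3. SendBase = 120; nothing is retransmitted
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too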
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends to Host B; the 2nd segment is lost (X); A resends the 2nd segment before its timeout expires]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
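The duplicate-ACK rule can be sketched as a small counter. The class name `DupAckCounter` is hypothetical, and one detail goes beyond the slide: ACKs below SendBase are treated as stale and ignored here, whereas the slide's `else` branch does not distinguish them.

```python
class DupAckCounter:
    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y: int) -> bool:
        """Return True when the segment starting at y should be resent now."""
        if y > self.send_base:
            # New data acknowledged: advance and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
            return False
        if y == self.send_base:
            self.dup_acks += 1
            return self.dup_acks == 3   # fire once, on the third duplicate
        return False                    # stale ACK: ignored (assumption)


if __name__ == "__main__":
    counter = DupAckCounter(send_base=92)
    for ack in [100, 100, 100, 100, 100]:
        print(ack, counter.on_ack(ack))
```

The first ACK=100 acknowledges new data; the next three are duplicates, and only the third triggers a retransmission.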
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control (bad old days)
In-Line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
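The window arithmetic can be written out directly. In this sketch the function names are hypothetical, and `last_byte_sent` and `last_byte_acked` are illustrative sender-side counters that do not appear on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    advertised_window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print("advertised window:", window)
    print("may send 1000 more:", sender_may_send(5000, 4000, window, 1000))
    print("may send 1100 more:", sender_may_send(5000, 4000, window, 1100))
```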
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client closes and sends FIN; server replies with ACK, closes and sends FIN; client replies with ACK and enters timed wait; then closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client (closing) sends FIN; server replies with ACK, then (closing) sends FIN; client replies with ACK and enters timed wait; server closed; client closed after timed wait]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; output rate $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send into a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: output rate]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against the input rate, input axis up to R/2; output-axis marks R/2, R/3 and R/4]
85
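Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: output rate]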
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: $\lambda_{out}$ for traffic between Host A and Host B]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, $W \leftarrow W + 1/W$
  - multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, with marks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
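The saw tooth can be reproduced with a few lines of simulation. This is a toy model under stated assumptions: the window is measured in MSS units, loss rounds are supplied by hand, slow start is not modelled, the floor of one MSS is an illustrative choice, and the function names are hypothetical.

```python
def aimd_trace(rounds: int, loss_rounds: set, cwnd: float = 1.0) -> list:
    """Window (in MSS) after each RTT: +1 per RTT, halved on a loss round."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease
        else:
            cwnd += 1.0                 # additive increase
        trace.append(cwnd)
    return trace


def one_rtt_of_acks(w: float) -> float:
    """Apply W <- W + 1/W once per ACK for a window's worth of ACKs."""
    for _ in range(int(w)):
        w += 1.0 / w
    return w


if __name__ == "__main__":
    print(aimd_trace(rounds=12, loss_rounds={5, 10}, cwnd=4.0))
    print(one_rtt_of_acks(10.0))
```

The second function shows why the per-ACK update amounts to roughly one MSS per RTT: a window of 10 grows to just under 11 after ten ACKs.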
89
SLOW START IS NOT SHOWN
48
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts
49
Selective repeat: sender, receiver windows
50
Selective repeat
sender:
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
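The receiver rules above can be sketched as a small class. The name `SrReceiver` and its attributes are hypothetical, and the sketch assumes unbounded sequence numbers, so it sidesteps the wraparound ambiguity that a finite sequence space introduces.

```python
class SrReceiver:
    def __init__(self, window: int):
        self.n = window
        self.rcv_base = 0
        self.buffer = {}       # seq -> data received out of order
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, seq: int, data) -> bool:
        """Process one packet; return True if ACK(seq) is sent."""
        if self.rcv_base <= seq < self.rcv_base + self.n:
            self.buffer[seq] = data
            # Deliver any in-order run starting at rcv_base.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return True
        if self.rcv_base - self.n <= seq < self.rcv_base:
            return True        # already delivered: re-ACK so the sender advances
        return False           # otherwise: ignore


if __name__ == "__main__":
    rcv = SrReceiver(window=4)
    for seq, data in [(1, "b"), (0, "a"), (0, "a"), (9, "x")]:
        print(seq, rcv.on_packet(seq, data), rcv.delivered)
```

Packet 1 is buffered until packet 0 arrives, the repeated packet 0 is re-ACKed without being delivered twice, and packet 9 falls outside both ranges and is ignored.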
51
Selective repeat in action
49
Selective repeat sender receiver windows
50
Selective repeat
data from above bull if next available seq in window send pkt
timeout(n)bull resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next unACKed seq
senderpkt n in [rcvbase rcvbase+N-1] send ACK(n) out-of-order buffer in-order deliver (also deliver
buffered in-order pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase-Nrcvbase-1] ACK(n)
otherwise ignore
receiver
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
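The simplified sender can be sketched in Python as follows. It is an illustration, not a real TCP: `wire` is a hypothetical list standing in for "pass segment to IP", the timer is a boolean rather than a clock, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender: app data, timeout, ACK."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # -> payload not yet acknowledged
        self.timer_running = False
        self.wire = []               # hypothetical stand-in for the IP layer

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))            # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)              # smallest unacked seq #
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True            # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: everything that ends at or before y is acked.
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in done:
                del self.unacked[seq]
            # Timer keeps running only while segments are outstanding.
            self.timer_running = bool(self.unacked)
```

Feeding it 8 bytes at Seq=92 and 20 bytes at Seq=100, then a single ACK=120, clears both segments at once, which is the cumulative-ACK behaviour described in the comment above.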
66
TCP retransmission scenarios
[Figure: two timing diagrams between Host A and Host B.
Lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); the timeout for Seq=92 fires and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.
Premature timeout: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the timeout for Seq=92 fires and A resends Seq=92, 8 bytes data; B replies ACK=120; Sendbase = 100, then SendBase = 120, and it stays at 120.]
67
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario between Host A and Host B. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X); ACK=120 arrives within the timeout; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Event: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
   Action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.
2. Event: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
   Action: Immediately send single cumulative ACK, ACKing both in-order segments.
3. Event: Arrival of out-of-order segment, higher-than-expected seq #. Gap detected.
   Action: Immediately send duplicate ACK, indicating seq # of next expected byte.
4. Event: Arrival of segment that partially or completely fills gap.
   Action: Immediately send ACK, provided that segment starts at lower end of gap.
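The gap-related rows of the table can be sketched as a receiver that always answers with the next expected byte. Several simplifications are assumed here: the delayed-ACK timer of rows 1 and 2 is left out, segments never partially overlap, and the class name `CumulativeAckReceiver` is hypothetical.

```python
class CumulativeAckReceiver:
    """Returns the cumulative ACK number for each arriving segment."""

    def __init__(self, initial_seq):
        self.expected = initial_seq   # seq # of next expected byte
        self.buffered = {}            # out-of-order segments: seq # -> length

    def on_segment(self, seq, length):
        if seq == self.expected:
            self.expected += length
            # A filled gap may make buffered segments deliverable too.
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            # Gap detected: keep the segment; the ACK below is a duplicate.
            self.buffered[seq] = length
        # seq < expected: old data, simply re-ACK.
        return self.expected


rcvr = CumulativeAckReceiver(92)
print(rcvr.on_segment(92, 8))     # 100
print(rcvr.on_segment(120, 10))   # 100 (duplicate ACK, gap at 100..119)
print(rcvr.on_segment(100, 20))   # 130 (gap filled)
```

The duplicate ACK in the second call is what a fast-retransmit sender counts.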
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends several segments to Host B; the 2nd segment is lost (X); A resends the 2nd segment before its timeout expires.]
71
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
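The ACK handler above can be sketched in Python. This is a minimal illustration: `resent` is a hypothetical log of retransmitted sequence numbers, and only ACKs equal to SendBase are counted as duplicates.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers one fast retransmit at three."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.resent = []   # hypothetical log of fast-retransmitted seq #s

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.resent.append(y)   # resend segment starting at y


sender = FastRetransmitSender(send_base=100)
for ack in [120, 120, 120, 120]:
    sender.on_ack(ack)
print(sender.resent)   # [120]
```

The first ACK=120 acknowledges new data; the next three are duplicates, and the third of them triggers the retransmission.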
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send - it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires, send the (unfilled) segment
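The timer-based rule described on this slide can be sketched as a sender that holds back small writes until a full segment is available or a deadline passes. This follows the slide's wording only; the class name `CoalescingSender`, the `max_wait` parameter, and the explicit `now` arguments are assumptions made so the example runs without a real clock.

```python
class CoalescingSender:
    """Sends full segments at once and unfilled ones when the timer expires."""

    def __init__(self, mss, max_wait):
        self.mss = mss
        self.max_wait = max_wait
        self.pending = b""
        self.deadline = None

    def write(self, data, now):
        if not self.pending:
            self.deadline = now + self.max_wait   # timer starts with first byte
        self.pending += data
        return self._drain(now)

    def tick(self, now):
        return self._drain(now)

    def _drain(self, now):
        out = []
        while len(self.pending) >= self.mss:
            out.append(self.pending[:self.mss])
            self.pending = self.pending[self.mss:]
            self.deadline = now + self.max_wait   # restart timer for remainder
        if self.pending and now >= self.deadline:
            out.append(self.pending)              # timer expired: send unfilled
            self.pending = b""
        return out


s = CoalescingSender(mss=4, max_wait=1.0)
print(s.write(b"ab", now=0.0))     # []
print(s.write(b"cdef", now=0.5))   # [b'abcd']
print(s.tick(now=1.5))             # [b'ef']
```

A larger `max_wait` fills segments better but delays interactive traffic, which is the trade-off stated above.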
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  $\mathrm{RcvWindow} = \mathrm{RcvBuffer} - [\mathrm{LastByteRcvd} - \mathrm{LastByteRead}]$
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
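The window arithmetic can be sketched as two small functions. The receiver-side names mirror the slide; the sender-side names `last_byte_sent` and `last_byte_acked` are hypothetical bookkeeping variables introduced for this example.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(advertised_window, last_byte_sent, last_byte_acked):
    """How much more the sender may transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, advertised_window - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                              # 2096
print(sendable_bytes(window, 5000, 4000))  # 1096
print(sendable_bytes(500, 5000, 4000))     # 0
```

When the application stops reading, `last_byte_read` stalls, the advertised window shrinks toward zero, and the sender is forced to pause.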
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
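The three steps can be sketched by building the three segments as plain dictionaries. The function name and the dictionary layout are hypothetical. The `+ 1` reflects standard TCP behaviour that the slide does not spell out: a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"},
              "seq": server_isn,
              "ack": syn["seq"] + 1}        # SYN occupies one seq #
    ack = {"flags": {"ACK"},
           "seq": client_isn + 1,
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment["flags"]), segment["seq"], segment.get("ack"))
```

The initial sequence numbers 1000 and 5000 are arbitrary example values.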
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client "close" sends FIN; server replies ACK, then server "close" sends FIN; client replies ACK and enters timed wait; closed.]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Client (closing) sends FIN; server replies ACK, then server (closing) sends FIN; client replies ACK, enters timed wait, then closed; server closed.]
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
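The four closing steps can be sketched as two transition tables, one per side. The state names are the standard TCP state names rather than labels taken from these slides, and the event strings are hypothetical labels chosen for this example.

```python
# (state, event) -> (next state, segment to send or None)
CLIENT = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "ACK"),   # re-ACK repeated FINs
    ("TIME_WAIT", "timer expires"): ("CLOSED", None),
}
SERVER = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(table, state, events):
    """Apply events in order; return the final state and segments sent."""
    sent = []
    for event in events:
        state, segment = table[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


print(run(CLIENT, "ESTABLISHED",
          ["close", "recv ACK", "recv FIN", "timer expires"]))
# ('CLOSED', ['FIN', 'ACK'])
print(run(SERVER, "ESTABLISHED", ["recv FIN", "close", "recv ACK"]))
# ('CLOSED', ['ACK', 'FIN'])
```

The extra `TIME_WAIT` row is what lets the client answer a retransmitted FIN if its final ACK was lost.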
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B; $\lambda_{in}$: original data; unlimited shared output link buffers; $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$, with R/2 marked on both axes and R/3 and R/4 also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any upstream transmission capacity used for that packet was wasted
[Figure: Host A and Host B; $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, $W \leftarrow W + 1/W$)
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size (8, 16, 24 Kbytes) against time. Saw tooth behavior: probing for bandwidth]
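The saw tooth can be reproduced with a few lines of Python. This is a sketch with the window measured in MSS units; the function names, the `loss_rounds` parameter, and the floor of one MSS are assumptions made for illustration, and slow start is not modelled.

```python
def aimd_trace(rounds, loss_rounds, initial_window=1.0):
    """Window size at the start of each RTT, given the RTTs that see loss."""
    w = initial_window
    trace = []
    for r in range(rounds):
        trace.append(w)
        if r in loss_rounds:
            w = max(1.0, w / 2)   # multiplicative decrease
        else:
            w += 1.0              # additive increase: +1 MSS per RTT
    return trace


def one_rtt_of_acks(w):
    """Apply the per-ACK rule W <- W + 1/W once for each segment in the window."""
    for _ in range(int(w)):
        w += 1.0 / w
    return w


print(aimd_trace(6, loss_rounds={3}))   # [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]
print(one_rtt_of_acks(10.0))            # slightly below 11
```

The second function shows why the per-ACK increment is 1/W: a window of W segments produces about W ACKs per RTT, so the increments add up to roughly one MSS per RTT.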
89
SLOW START IS NOT SHOWN
50
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
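The receiver rules can be sketched as follows. To keep the example short it assumes unbounded sequence numbers (no wraparound), and the `delivered` list is a hypothetical stand-in for the upper layer.

```python
class SelectiveRepeatReceiver:
    """Buffers out-of-order packets and delivers them in order."""

    def __init__(self, window_size):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = {}       # seq # -> data waiting for in-order delivery
        self.delivered = []    # hypothetical stand-in for the application

    def on_packet(self, seq, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq <= self.rcv_base + self.n - 1:
            self.buffer[seq] = data
            # Deliver the in-order run and advance the window base.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq <= self.rcv_base - 1:
            return seq         # already delivered: ACK again
        return None            # otherwise: ignore


rcvr = SelectiveRepeatReceiver(window_size=4)
print(rcvr.on_packet(1, "b"), rcvr.delivered)   # 1 []
print(rcvr.on_packet(0, "a"), rcvr.delivered)   # 0 ['a', 'b']
print(rcvr.on_packet(0, "a"), rcvr.delivered)   # 0 ['a', 'b']
```

The third call shows the re-ACK case: the packet is below the window, so it is acknowledged but not delivered a second time.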
51
Selective repeat in action
52
Selective repeat: dilemma
Example:
• seq #s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
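The bound can be checked mechanically. The sketch below models one worst case: the sender's first window is fully received but every ACK is lost, so the sender retransmits old packets while the receiver has already moved its window forward. The function names are hypothetical.

```python
def receiver_window(rcv_base, window, seq_space):
    """Sequence numbers the receiver currently accepts as new."""
    return {(rcv_base + i) % seq_space for i in range(window)}


def old_packet_looks_new(seq_space, window):
    """True if a retransmission from the first window would be taken as new data."""
    first_window = set(range(window))                  # sent and received
    advanced = receiver_window(window % seq_space, window, seq_space)
    return bool(first_window & advanced)


print(old_packet_looks_new(seq_space=4, window=3))   # True  (the slide's example)
print(old_packet_looks_new(seq_space=4, window=2))   # False (window <= half)
```

With four sequence numbers and a window of 3, the receiver's new window is {3, 0, 1}, which overlaps the retransmitted 0 and 1; with a window of 2 the two sets are disjoint.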
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt (consider high Bandwidth/Delay product)
Now let's move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it's an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer, from which the application reads data through its socket door]
55
TCP segment structure
[Figure: segment layout, 32 bits wide, top to bottom]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)
Annotations:
• sequence number, acknowledgement number: counting by bytes of data (not segments)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
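The fixed 20-byte part of this layout can be decoded with Python's `struct` module. This is a minimal sketch: the function name and the returned dictionary keys are hypothetical, options are skipped rather than parsed, and the checksum is not verified.

```python
import struct

# Flag bit positions within the flags byte, lowest bit first.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment):
    """Decode the fixed TCP header fields from raw segment bytes."""
    (src_port, dst_port, seq, ack, offset_byte, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # head len is counted in 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "payload": segment[header_len:],
    }


# Build an example segment: ports 12345 -> 80, Seq=42, ACK=79, PSH+ACK, data 'C'.
raw = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x18, 65535, 0, 0) + b"C"
header = parse_tcp_header(raw)
print(header["seq"], header["ack"], sorted(header["flags"]), header["payload"])
# 42 79 ['ACK', 'PSH'] b'C'
```

The `!` in the format string selects network (big-endian) byte order, which is how the fields appear on the wire.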
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how receiver handles out-of-order segments
  - A: TCP spec doesn't say - up to implementor
    This has led to a world of hurt…
[Figure: simple telnet scenario between Host A and Host B.
  User types 'C': A sends Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C': A sends Seq=43, ACK=80]
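The numbers in the telnet scenario follow from two rules: the reply's seq # is the byte the other side said it expects next, and the reply's ACK is the received seq # plus the number of data bytes received. A small sketch, assuming neither host has other data in flight (the function name `respond` is hypothetical):

```python
def respond(segment, payload=b""):
    """Build the reply to a received segment in a lock-step exchange."""
    return {
        "seq": segment["ack"],                          # what the peer expects next
        "ack": segment["seq"] + len(segment["data"]),   # next byte we expect
        "data": payload,
    }


typed = {"seq": 42, "ack": 79, "data": b"C"}   # Host A: user types 'C'
echo = respond(typed, b"C")                    # Host B: ACKs and echoes 'C'
final = respond(echo)                          # Host A: ACKs the echo

print(echo["seq"], echo["ack"])     # 79 43
print(final["seq"], final["ack"])   # 43 80
```

Each one-byte payload advances the corresponding sequence number by exactly one, because TCP counts bytes rather than segments.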
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\,\mathrm{EstimatedRTT} + \alpha\,\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
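A short sketch makes the "decreases exponentially fast" remark concrete. The function names are hypothetical, and seeding the estimate with the first sample is an assumption.

```python
def ewma(samples, alpha=0.125):
    """Run the EstimatedRTT update over a list of SampleRTT values."""
    estimate = samples[0]              # assumed: first sample seeds the estimate
    history = [estimate]
    for sample in samples[1:]:
        estimate = (1 - alpha) * estimate + alpha * sample
        history.append(estimate)
    return history


def sample_weight(age, alpha=0.125):
    """Weight of a sample that is `age` updates old in the current estimate."""
    return alpha * (1 - alpha) ** age


print(ewma([100.0, 200.0]))                  # [100.0, 112.5]
print([sample_weight(k) for k in range(3)])  # [0.125, 0.109375, 0.095703125]
```

A single 200 ms spike moves a 100 ms estimate only to 112.5 ms, and each older sample counts for 7/8 as much as the one after it.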
51
Selective repeat in action
52
Selective repeat dilemma
Example bull seq rsquos 0 1 2 3bull window size=3
bull receiver sees no difference in two scenarios
bull incorrectly passes duplicate data as new in (a)
Q what relationship between seq size and window size
window size le (frac12 of seq size)
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start adaptconsider high BandwidthDelay product
Now lets move fromthe generic to thespecifichellip
TCP arguably the most successful protocol in the Internethellip
its an ARQ protocol
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
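The three steps can be sketched as the sequence and acknowledgement numbers each segment carries. The `Segment` type and the function name are hypothetical; the only fact relied on beyond the slide is that a SYN consumes one sequence number, so each side acknowledges the peer's initial sequence number plus one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments for the given initial seq numbers."""
    syn = Segment(syn=True, seq=client_isn)                      # step 1: no data
    synack = Segment(syn=True, ack=True, seq=server_isn,
                     ack_num=syn.seq + 1)                        # step 2
    final_ack = Segment(ack=True, seq=synack.ack_num,
                        ack_num=synack.seq + 1)                  # step 3
    return [syn, synack, final_ack]
```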
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: close timeline between client and server: FIN, ACK, FIN, ACK; labels “close”, “timed wait”, “closed”]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: close timeline between client and server: FIN, ACK, FIN, ACK; labels “closing”, “timed wait”, “closed”]
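A hedged sketch of the client side of the four closing steps as a small table-driven state machine. The state names follow RFC 793 (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT); the event labels and the `run` helper are hypothetical, and simultaneous FINs are not modelled.

```python
# (state, event) -> (next state, segment to send or None)
CLIENT_CLOSE = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),   # step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),  # server ACKed our FIN
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),  # step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "ACK"),   # re-ACK a repeated FIN
    ("TIME_WAIT", "timeout"): ("CLOSED", None),
}


def run(events, state="ESTABLISHED"):
    """Feed events to the client and collect the segments it sends."""
    sent = []
    for event in events:
        state, segment = CLIENT_CLOSE[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent
```

The timed wait exists so that a retransmitted FIN (sent because the final ACK was lost) still gets an ACK.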
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λin into unlimited shared output link buffers; delivered rate λout]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share finite output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout: delivered data]
84
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• “perfect” retransmission only when loss: λ'in > λout
• retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against offered load, axes up to R/2, with R/3 and R/4 marked on the throughput axis]
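A small arithmetic sketch of the cost of retransmissions, assuming each sender offers a load of R/2 to the link. Reading the R/3 and R/4 marks in the figure as the goodput when a packet crosses the link 1.5 and 2 times on average is an assumption; the function name is hypothetical.

```python
from fractions import Fraction


def goodput(offered_load, transmissions_per_packet):
    """Rate of original data delivered when each packet is sent several times on average."""
    return offered_load / transmissions_per_packet


R = Fraction(1)      # link capacity, normalised to 1
per_sender = R / 2   # offered load per sender (assumed)

no_waste = goodput(per_sender, 1)                   # every transmission is useful
some_retrans = goodput(per_sender, Fraction(3, 2))  # one retransmission per two packets
all_twice = goodput(per_sender, 2)                  # every packet sent twice
```

With exact fractions, the three cases come out as R/2, R/3 and R/4.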
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
86
Causes/costs of congestion: scenario 3
Another “cost” of congestion: when packet dropped, any upstream transmission capacity used for that packet was wasted
[Figure: λout between Host A and Host B]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; for each received ACK, W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, axis marks at 8, 16 and 24 Kbytes; saw-tooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
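The additive-increase, multiplicative-decrease rule can be sketched with the window measured in MSS units. The function name, the event labels and the floor of one MSS are assumptions for illustration; slow start is not modelled here either.

```python
def aimd(cwnd, events):
    """Apply AIMD to a congestion window (in MSS) for a sequence of 'ack'/'loss' events."""
    history = [cwnd]
    for event in events:
        if event == "ack":
            cwnd += 1.0 / cwnd           # W <- W + 1/W, about +1 MSS per RTT
        elif event == "loss":
            cwnd = max(1.0, cwnd / 2.0)  # W <- W/2, never below one MSS (assumed floor)
        history.append(cwnd)
    return history
```

A window of W segments produces roughly W ACKs per RTT, so W increments of 1/W add up to about one MSS per round trip, which gives the saw-tooth when losses halve the window.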
52
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
window size ≤ (½ of seq # size)
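A short check of the rule, assuming the worst case in which the receiver has accepted a full window but every ACK was lost: the sender retransmits the old window while the receiver already expects the next one. The function name is hypothetical.

```python
def is_ambiguous(seq_space, window):
    """True if a retransmitted old packet can be mistaken for a new one."""
    # Sequence numbers the sender may still retransmit (its unACKed window).
    old = {s % seq_space for s in range(0, window)}
    # Sequence numbers the receiver now accepts as new data.
    new = {s % seq_space for s in range(window, 2 * window)}
    return bool(old & new)
```

With 4 sequence numbers a window of 3 is ambiguous (sequence number 0 appears in both sets) while a window of 2 is not.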
53
Automatic Repeat Request (ARQ)
+ Self-clocking (Automatic)
+ Adaptive
+ Flexible
- Slow to start / adapt: consider high bandwidth-delay product
Now let’s move from the generic to the specific…
TCP: arguably the most successful protocol in the Internet…
it’s an ARQ protocol
54
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581, …)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through the socket door]
55
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom]
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flags U A P R S F, Receive window
• checksum, Urg data pnter
• Options (variable length)
• application data (variable length)
Notes on the fields:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments)
• checksum: Internet checksum (as in UDP)
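A minimal sketch of decoding the fixed 20-byte part of this header with Python's `struct` module. The function name and the returned dictionary keys are hypothetical; only the six flags named on the slide are decoded, and options are not parsed.

```python
import struct

# Bit positions of the six classic flags within the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(data):
    """Decode the fixed TCP header fields from the first 20 bytes of a segment."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # head len is counted in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }
```

The `!` prefix selects network (big-endian) byte order, which is how the header appears on the wire.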
56
TCP seq. #’s and ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how receiver handles out-of-order segments
  – A: TCP spec doesn’t say - up to implementor
Simple telnet scenario (Host A, Host B, time running downwards):
1. User types ‘C’: A sends Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’: B sends Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’: A sends Seq=43, ACK=80
This has led to a world of hurt…
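The numbers in the telnet scenario follow from two rules: a reply's sequence number is the byte the peer said it expects next, and its ACK is the peer's sequence number plus the bytes of data received. A sketch under the assumption that neither side has other data in flight (the helper names are hypothetical):

```python
def make_segment(seq, ack, data=b""):
    return {"seq": seq, "ack": ack, "data": data}


def respond(received, data=b""):
    """Build the reply to `received`, optionally carrying data of our own."""
    return make_segment(
        seq=received["ack"],                            # what the peer expects from us
        ack=received["seq"] + len(received["data"]),    # next byte we expect from the peer
        data=data,
    )


a1 = make_segment(42, 79, b"C")  # user types 'C'
b1 = respond(a1, b"C")           # B ACKs 'C' and echoes it back
a2 = respond(b1)                 # A ACKs the echoed 'C'
```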
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don’t send the first byte
• Recipient keeps all the subsequent data and waits…
  – Filling buffers
• Critical buffers…
• Send a legitimate request: GET index.html
  this gets through an intrusion-detection system,
  then send a new segment replacing bytes 4-13 with “password-file”
A dumb example
Neither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
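The update can be sketched in a few lines; the function name is hypothetical and the default weight of 0.125 is the typical value quoted on the slide.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One exponential weighted moving average step for the RTT estimate."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


# A single outlier moves the estimate only one eighth of the way towards it.
after_spike = update_estimated_rtt(100.0, 200.0)

# A sustained change is tracked: the old value's weight shrinks by 0.875 per sample.
estimate = 100.0
for _ in range(100):
    estimate = update_estimated_rtt(estimate, 200.0)
```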
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
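A hedged sketch of both halves of the rule. The slide only says "increase timeout"; doubling it on each consecutive timeout is an assumption here (a common choice), and both function names are hypothetical.

```python
def usable_samples(acks):
    """Keep RTT samples only for segments that were never retransmitted.

    acks: list of (sample_rtt, was_retransmitted) pairs. An ACK for a
    retransmitted segment is ambiguous: it may answer either copy.
    """
    return [rtt for rtt, was_retransmitted in acks if not was_retransmitted]


def backed_off_timeout(timeout, consecutive_timeouts, factor=2.0):
    """Timeout after repeated expiries, growing by `factor` each time (assumed)."""
    return timeout * factor ** consecutive_timeouts
```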
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106); series: SampleRTT, Estimated RTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
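Putting the deviation and timeout formulas together, as a sketch. The function name is hypothetical. The slides do not say which value is updated first; updating the deviation with the old RTT estimate before updating the estimate itself follows RFC 6298 and is an assumption relative to the slides.

```python
def update_timeout(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Fold one SampleRTT into the estimators and return the new timeout."""
    # Deviation first, measured against the estimate held before this sample.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt  # estimate plus safety margin
    return estimated_rtt, dev_rtt, timeout_interval
```

Starting from an estimate of 100 ms with a deviation of 10 ms, a 120 ms sample gives an estimate of 102.5 ms, a deviation of 12.5 ms and a timeout of 152.5 ms.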
63
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
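A runnable sketch of the simplified sender's three events. The class name, the `wire` list standing in for "pass segment to IP" and the boolean `timer_running` standing in for a real timer are all hypothetical; sequence-number wrap-around is ignored.

```python
class SimplifiedSender:
    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq -> data for not-yet-acknowledged segments
        self.timer_running = False
        self.wire = []              # (seq, data) pairs handed to IP

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # oldest unacknowledged segment
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:       # cumulative ACK covering new data
            self.send_base = y
            acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in acked:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at Seq=92 and 20 bytes at Seq=100, then receiving ACK=120, moves SendBase straight to 120 and stops the timer, since the single cumulative ACK covers both segments.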
66
TCP: retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A's timer for Seq=92 expires and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100]
[Figure: premature timeout. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; A's timer for Seq=92 expires early and A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase = 100, then SendBase = 120]
67
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK’s 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
   -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
2. Arrival of in-order segment with expected seq #. One other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
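The four receiver events and their actions can be written as one decision function. This is a sketch: the function name, its parameters and the returned strings are hypothetical, and segments below the expected sequence number are left out because the list above does not cover them.

```python
def ack_action(seg_seq, expected_seq, ack_pending, gap_buffered):
    """Choose the receiver's action for one arriving segment.

    seg_seq: sequence number of the arriving segment
    expected_seq: next in-order byte the receiver is waiting for
    ack_pending: True if an earlier in-order segment is still un-ACKed
    gap_buffered: True if out-of-order data is already buffered above a gap
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK now"       # third event: gap detected
    if seg_seq == expected_seq:
        if gap_buffered:
            return "send ACK now"             # fourth event: lower end of gap filled
        if ack_pending:
            return "send cumulative ACK now"  # second event
        return "delay ACK up to 500 ms"       # first event
    return "not covered"                      # old data below expected_seq
```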
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A's 2nd segment to Host B is lost (X); A resends the 2nd segment before the timeout]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {
    increment count of dup ACKs received for y     (a duplicate ACK for already ACKed segment)
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y        (fast retransmit)
    }
  }
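A runnable sketch of the ACK handler with the duplicate-ACK branch added. The class name and the `retransmitted` list are hypothetical stand-ins, and the timer is omitted to keep the focus on the counting.

```python
from collections import Counter


class FastRetransmitSender:
    def __init__(self, send_base, segments):
        self.send_base = send_base
        self.segments = dict(segments)  # seq -> data, not yet acknowledged
        self.dup_acks = Counter()       # ACK value -> number of duplicates seen
        self.retransmitted = []         # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance and drop fully covered segments.
            self.send_base = y
            self.segments = {s: d for s, d in self.segments.items()
                             if s + len(d) > y}
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3 and y in self.segments:
                self.retransmitted.append(y)  # fast retransmit
```

Three duplicates of ACK=100 trigger one early resend of the segment starting at byte 100; a fourth duplicate does not resend it again.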
54
TCP Overview RFCs 793 1122 1323 2018 2581 hellip
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T C Psend buffer
TC Preceive buffer
socketdoor
segm en t
app licationwrites data
applicationreads data
55
TCP segment structure
source port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement number
Receive window
Urg data pnterchecksum
FSRPAUheadlen
notused
Options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection estab(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
56
TCP seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; it is up to the implementor
  - This has led to a world of hurt…
Simple telnet scenario (Host A and Host B, time running downwards):
  1. User types 'C'. A to B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C' and echoes back 'C'. B to A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of the echoed 'C'. A to B: Seq=43, ACK=80
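The numbers in the telnet scenario follow from one rule: the ACK field is the sender's view of the next byte it expects, i.e. the peer's sequence number plus the number of data bytes received. A small sketch, with the function name `telnet_echo` and the dictionary layout chosen purely for illustration:

```python
def telnet_echo(a_seq: int, b_seq: int, char: bytes = b"C"):
    """Return the three segments of the echo exchange as plain dictionaries."""
    # 1. A sends the typed character; its ACK field names the next byte wanted from B.
    first = {"seq": a_seq, "ack": b_seq, "data": char}
    # 2. B acknowledges the character and echoes it back.
    second = {"seq": b_seq, "ack": first["seq"] + len(char), "data": char}
    # 3. A acknowledges the echo; this segment carries no data.
    third = {"seq": second["ack"], "ack": second["seq"] + len(char), "data": b""}
    return [first, second, third]


segments = telnet_echo(42, 79)
numbers = [(s["seq"], s["ack"]) for s in segments]
```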
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
A dumb example:
• Send a legitimate request
  GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
Neither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
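The moving average above is a one-line update. The sketch below assumes the estimate is seeded with the first sample (the slide does not say how to initialise it); the function names are illustrative.

```python
def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = 0.125) -> float:
    """One EWMA step: keep (1 - alpha) of the old estimate, add alpha of the new sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def estimate(samples, alpha: float = 0.125) -> float:
    """Fold a list of SampleRTT values (ms) into one EstimatedRTT."""
    estimated = samples[0]            # assumption: seed with the first sample
    for sample in samples[1:]:
        estimated = update_estimated_rtt(estimated, sample, alpha)
    return estimated


one_step = update_estimated_rtt(100.0, 200.0)   # 0.875 * 100 + 0.125 * 200
smoothed = estimate([100.0, 200.0, 100.0])
```

A single 200 ms outlier moves a 100 ms estimate to 112.5 ms only, which is the "smoother" behaviour the slide asks for.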
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm
  - Ignore retransmissions in measurements
  (and increase the timeout; this makes retransmissions decreasingly aggressive)
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; two series: SampleRTT and Estimated RTT.]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
Then set timeout interval:
  $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
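Putting the two averages and the Karn/Partridge rule together gives a small timeout estimator. This is a sketch under stated assumptions, not the behaviour of any particular TCP stack: DevRTT is seeded with half the first sample, DevRTT is updated before EstimatedRTT, and "increase timeout" is implemented as doubling. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EstimatedRTT / DevRTT tracking with a Karn/Partridge style backoff."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumption: initial deviation guess
        self.backoff = 1                  # multiplier applied after timeouts

    def on_sample(self, sample_rtt: float, retransmitted: bool = False) -> None:
        if retransmitted:
            return  # Karn/Partridge: the ACK is ambiguous, so ignore the sample
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.backoff = 1                  # a clean sample ends the backoff

    def on_timeout(self) -> None:
        self.backoff *= 2                 # assumption: "increase" means doubling

    @property
    def timeout_interval(self) -> float:
        return (self.estimated_rtt + 4 * self.dev_rtt) * self.backoff


est = RttEstimator(first_sample=100.0)
est.on_sample(200.0)
after_sample = est.timeout_interval
est.on_sample(500.0, retransmitted=True)   # ignored
est.on_timeout()
after_timeout = est.timeout_interval
```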
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
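The three events of the simplified sender translate fairly directly into a small class. This is an illustrative sketch only: the timer is reduced to a boolean, "pass segment to IP" appends to a list, and the names (`SimplifiedTcpSender`, `sent`, `unacked`) are not from the slides.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender; no flow or congestion control."""

    def __init__(self, initial_seq_num: int = 0):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq number -> data still awaiting an ACK
        self.timer_running = False
        self.sent = []               # log of (seq, data) handed to IP

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))          # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)            # oldest not-yet-acknowledged segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True          # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Drop every segment that lies wholly below the cumulative ACK.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Lost-ACK style run: one 8-byte segment at Seq=92 is sent, times out, is resent.
sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_app_data(b"x" * 8)
sender.on_timeout()
sender.on_ack(100)
```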
66
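TCP retransmission scenarios
[Figures: two Host A / Host B timelines, time running downwards]
Lost ACK scenario:
  1. A sends Seq=92, 8 bytes data
  2. B replies ACK=100, which is lost (X)
  3. A's timer for Seq=92 times out; A resends Seq=92, 8 bytes data
  4. B replies ACK=100 again; SendBase = 100
Premature timeout scenario:
  1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B replies ACK=100, then ACK=120
  3. A's timer for Seq=92 times out before the ACKs arrive; A resends Seq=92, 8 bytes data
  4. The ACKs arrive: SendBase = 100, then SendBase = 120
  5. B answers the resent segment with ACK=120; SendBase stays at 120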
67
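TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline, time running downwards]
Cumulative ACK scenario:
  1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. B replies ACK=100, which is lost (X), then ACK=120
  3. ACK=120 arrives before the timeout; SendBase = 120 and nothing is resent
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too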
68
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
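The gap-related rows of the table can be sketched as a receiver that always answers with the next expected byte. Assumptions, all outside the slides: out-of-order segments are buffered rather than discarded, segments never partially overlap, and the delayed-ACK timing of the first two rows is left out. The class name is illustrative.

```python
class CumulativeAckReceiver:
    """Returns the ACK number a receiver would send for each arriving segment."""

    def __init__(self, expected_seq: int):
        self.expected_seq = expected_seq
        self.buffered = {}            # seq -> length of out-of-order segments

    def on_segment(self, seq: int, length: int) -> int:
        if seq == self.expected_seq:
            self.expected_seq += length
            # The arrival may fill a gap: absorb buffered segments that now fit.
            while self.expected_seq in self.buffered:
                self.expected_seq += self.buffered.pop(self.expected_seq)
        elif seq > self.expected_seq:
            self.buffered[seq] = length   # gap detected, keep for later
        # Older data (seq < expected_seq) is a duplicate; just re-ACK.
        return self.expected_seq


rcvr = CumulativeAckReceiver(expected_seq=92)
acks = [
    rcvr.on_segment(92, 8),     # in order
    rcvr.on_segment(120, 10),   # gap: duplicate ACK
    rcvr.on_segment(130, 10),   # still a gap: another duplicate ACK
    rcvr.on_segment(100, 20),   # fills the gap, ACK jumps past buffered data
]
```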
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends to Host B; the 2nd segment is lost (X); A resends the 2nd segment before its timeout expires.]
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
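The duplicate-ACK branch is easy to check in isolation. In this sketch the retransmission itself is replaced by recording the sequence number, and ACKs older than SendBase are ignored rather than counted; both simplifications, and the class name, are assumptions for illustration.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records which segment would be resent."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []       # sequence numbers resent early

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1        # a duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit


fr = FastRetransmitSender(send_base=100)
for _ in range(4):
    fr.on_ack(100)                    # four duplicates trigger one resend
fr.on_ack(120)
```

Triggering only when the count reaches exactly 3 means a long run of duplicates causes a single early retransmission.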
72
Silly Window Syndrome
The receive window advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small segments may persist, indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: Use a timer; when the timer expires, send the (unfilled) segment
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control (bad old days)
In-Line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
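Both sides of the flow-control rule are simple arithmetic. In the sketch below, `rcv_window` is the receiver's formula from the slide; `sendable_bytes` is the sender's side, where the names `last_byte_sent` and `last_byte_acked` are assumed here to describe the unACKed data in flight.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, window: int) -> int:
    """How much more the sender may transmit without exceeding the window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)


# 4096-byte buffer; 3000 bytes received so far, the app has read 1000 of them.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
allowed = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000, window=window)
blocked = sendable_bytes(last_byte_sent=5000, last_byte_acked=2000, window=window)
```

A slow application leaves more unread bytes in the buffer, the advertised window shrinks, and the sender is throttled to the application's drain rate.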
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
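The sequence and acknowledgement numbers exchanged in the three steps can be traced without opening a real socket. This sketch relies on one fact not spelled out on the slide: a SYN consumes one sequence number, so each side acknowledges the peer's initial seq # plus one. The `Segment` type and function name are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Segment:
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int] = None     # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the SYN, SYNACK and ACK segments for the given initial seq numbers."""
    # Step 1: client sends SYN with its initial sequence number, no data.
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # Step 2: server replies with SYNACK carrying its own initial sequence number.
    synack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn, ack=syn.seq + 1)
    # Step 3: client acknowledges the server's SYN (this segment may carry data).
    ack = Segment(frozenset({"ACK"}), seq=syn.seq + 1, ack=synack.seq + 1)
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(client_isn=1000, server_isn=5000)
```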
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client / server timeline. Client (close) sends FIN; server replies ACK, then (close) sends FIN; client replies ACK and enters timed wait; closed.]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client / server timeline. FIN, ACK, FIN, ACK; both sides closing; client in timed wait, then closed; server closed.]
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; $\lambda_{out}$ is the delivered rate.]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B send through finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered rate.]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three panels (a), (b), (c) plotting $\lambda_{out}$ against $\lambda_{in}$, axes running to R/2; the levels R/3 and R/4 are marked.]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B send through finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered rate.]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Congestion Collapse example: Cocktail party effect
[Figure: Host A, Host B, $\lambda_{out}$]
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, $W \leftarrow W + 1/W$)
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, y-axis ticks at 8, 16 and 24 Kbytes. Saw tooth behavior: probing for bandwidth.]
89
SLOW START IS NOT SHOWN
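The saw tooth can be reproduced with a few lines of simulation. This is a sketch at RTT granularity, assuming the window is measured in MSS, that the caller supplies the rounds in which a loss occurs, and that the window never drops below 1 MSS; slow start is not modelled, as on the slide. The function name `aimd` is illustrative.

```python
def aimd(rounds: int, loss_rounds, initial_window: float = 1.0):
    """Return the congestion window (in MSS) at the start of each RTT."""
    window = initial_window
    trace = []
    for rtt in range(rounds):
        trace.append(window)
        if rtt in loss_rounds:
            window = max(1.0, window / 2)   # multiplicative decrease
        else:
            window += 1.0                   # additive increase: 1 MSS per RTT
    return trace


trace = aimd(rounds=6, loss_rounds={3})
```

The trace climbs by one MSS per RTT, halves at the loss, and climbs again, which is the probing behaviour drawn in the figure.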
56
TCP seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles out-
of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt oflsquoCrsquo echoes
back lsquoCrsquo
timesimple telnet scenario
This has led to a world of hurthellip
57
TCP out of order attackbull ARQ with SACK means
recipient needs copies of all packets
bull Evil attack onesend a long stream of TCP data to a server but donrsquot send the first byte
bull Recipient keeps all the subsequent data and waitshellipndash Filling buffers
bull Critical buffershellip
bull Send a legitimate request
GET indexhtml
this gets through an intrusion-detection system
then send a new segment replacing bytes 4-13 withldquopassword-filerdquo
A dumb exampleNeither of these attacks would work on a modern system
58
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissionsbull too long slow reaction to
segment loss
Q how to estimate RTTbull SampleRTT measured time from
segment transmission until ACK receiptndash ignore retransmissions
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent
measurements not just current SampleRTT
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket(hostname, port number);
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
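The three steps can be traced with concrete numbers. This sketch only models the sequence and acknowledgement fields; the `Segment` tuple and the example initial sequence numbers are made up for illustration, and the "+1" reflects standard TCP behaviour in which a SYN consumes one sequence number (a detail the slide does not spell out).

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    kind: str               # "SYN", "SYNACK" or "ACK"
    seq: int                # sequence number carried by the segment
    ack: Optional[int]      # acknowledgement number, if any


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged while opening a connection."""
    syn = Segment("SYN", client_isn, None)                  # step 1: client
    synack = Segment("SYNACK", server_isn, syn.seq + 1)     # step 2: server
    ack = Segment("ACK", syn.seq + 1, synack.seq + 1)       # step 3: client
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```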
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline for closing a connection (FIN, ACK, FIN, ACK), with labels close, close, timed wait, closed.]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline (FIN, ACK, FIN, ACK), with labels closing, closing, timed wait, closed, closed.]
80
TCP Connection Management (cont)
TCP client lifecycle
TCP server lifecycle
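Since the lifecycle diagrams did not survive in this transcript, the client side can be approximated as a transition table. The state names follow the usual TCP state machine (SYN_SENT, FIN_WAIT_1, and so on); the event strings and the helper `run` are hypothetical, and error paths and simultaneous close are left out.

```python
# (state, event) -> next state, for the client that opens and later closes.
CLIENT_TRANSITIONS = {
    ("CLOSED", "open"): "SYN_SENT",             # sends SYN
    ("SYN_SENT", "rcv_SYNACK"): "ESTABLISHED",  # sends ACK
    ("ESTABLISHED", "close"): "FIN_WAIT_1",     # sends FIN
    ("FIN_WAIT_1", "rcv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "rcv_FIN"): "TIME_WAIT",     # sends ACK, enters timed wait
    ("TIME_WAIT", "timeout"): "CLOSED",
}


def run(transitions: dict, start: str, events: list) -> list:
    """Follow the table and return every state visited, start included."""
    states = [start]
    for event in events:
        states.append(transitions[(states[-1], event)])
    return states


path = run(CLIENT_TRANSITIONS, "CLOSED",
           ["open", "rcv_SYNACK", "close", "rcv_ACK", "rcv_FIN", "timeout"])
print(" -> ".join(path))
```

A server table would be built the same way, with LISTEN, SYN_RCVD, CLOSE_WAIT and LAST_ACK in place of the client-only states.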
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B share a router with unlimited shared output link buffers; labels: $\lambda_{in}$ (original data), $\lambda_{out}$.]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B share a router with finite shared output link buffers; labels: $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$.]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a, b, c) of $\lambda_{out}$ against $\lambda_{in}$, both axes running to R/2, with R/3 and R/4 also marked.]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B connected through routers with finite shared output link buffers; labels: $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$.]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A, Host B, $\lambda_{out}$.]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected, i.e. for each received ACK, $W \leftarrow W + 1/W$
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, vertical axis marked at 8, 16 and 24 Kbytes; saw tooth behavior: probing for bandwidth.]
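The saw tooth follows directly from the two update rules. A minimal sketch, assuming the window is measured in units of MSS, every event is either one ACK or one loss, and the window never drops below 1 MSS (that floor is an assumption, not something stated on the slide); slow start is not modelled.

```python
def aimd(cwnd: float, events: list) -> list:
    """Apply AIMD to a window measured in MSS; return the window after each event."""
    trace = []
    for event in events:
        if event == "loss":
            cwnd = max(1.0, cwnd / 2)    # multiplicative decrease: W <- W/2
        else:
            cwnd += 1.0 / cwnd           # additive increase per ACK: W <- W + 1/W
        trace.append(cwnd)
    return trace


# One window's worth of ACKs grows the window by roughly 1 MSS, then a loss halves it.
print(aimd(4.0, ["ack", "ack", "ack", "ack", "loss"]))
```

Because each of the W ACKs in a round trip adds about 1/W, the window grows by about one MSS per RTT, which is the linear ramp of the saw tooth.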
89
SLOW START IS NOT SHOWN
57
TCP out of order attack
• ARQ with SACK means recipient needs copies of all packets
• Evil attack one: send a long stream of TCP data to a server but don't send the first byte
• Recipient keeps all the subsequent data and waits…
  - Filling buffers
• Critical buffers…
• Send a legitimate request
    GET index.html
  this gets through an intrusion-detection system
  then send a new segment replacing bytes 4-13 with "password-file"
A dumb example. Neither of these attacks would work on a modern system.
58
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
59
TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1 - \alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
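A short numeric run shows how the average damps a spike. Seeding the estimate with the first sample is an assumption made here; the function name `ewma_rtt` is hypothetical.

```python
def ewma_rtt(samples: list, alpha: float = 0.125) -> list:
    """Return EstimatedRTT after each SampleRTT (first sample seeds the estimate)."""
    estimated = samples[0]
    history = [estimated]
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history


# A single 180 ms spike moves the estimate by only 10 ms.
print(ewma_rtt([100.0, 180.0, 100.0]))    # [100.0, 110.0, 108.75]
```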
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm: ignore retransmissions in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
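Both halves of the rule fit in a small class. This is a sketch under assumptions: the name `KarnTimer` is invented, and doubling the timeout on each expiry is one common way to "increase timeout" that the slide itself does not specify.

```python
class KarnTimer:
    """Keeps RTT samples only for segments that were sent exactly once."""

    def __init__(self, timeout_ms: float):
        self.timeout_ms = timeout_ms
        self.samples = []

    def on_timeout(self) -> None:
        # Back off: each retransmission waits longer than the one before
        # (doubling is an assumption, not taken from the slide).
        self.timeout_ms *= 2

    def on_ack(self, rtt_ms: float, was_retransmitted: bool) -> None:
        # An ACK for a retransmitted segment is ambiguous: it could answer
        # the original or the retransmission, so it is not measured.
        if not was_retransmitted:
            self.samples.append(rtt_ms)


timer = KarnTimer(timeout_ms=200)
timer.on_timeout()                               # first expiry
timer.on_ack(350, was_retransmitted=True)        # ambiguous, ignored
timer.on_ack(120, was_retransmitted=False)       # clean sample, kept
print(timer.timeout_ms, timer.samples)           # 400 [120]
```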
61
Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106).]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\mathrm{DevRTT} = (1 - \beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
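One update step with concrete numbers, as a sketch. The function name and argument order are hypothetical, and updating DevRTT before EstimatedRTT (so the deviation is measured against the old estimate) follows the order used in RFC 6298 rather than anything stated on the slide.

```python
def update_timeout(estimated_rtt: float, dev_rtt: float, sample_rtt: float,
                   alpha: float = 0.125, beta: float = 0.25) -> tuple:
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    # Deviation first, measured against the estimate before it is updated.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


# EstimatedRTT = 100 ms, DevRTT = 10 ms, then a 140 ms sample arrives.
print(update_timeout(100.0, 10.0, 140.0))    # (105.0, 17.5, 175.0)
```

The sample raises the estimate by only 5 ms but widens the safety margin from 40 ms to 70 ms, so the timeout reacts mostly to the increased variation.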
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• create segment with seq. #
• seq. # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
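The three events translate almost line for line into runnable Python. This is a sketch only: the class `SimplifiedTcpSender`, the `wire` list standing in for "pass segment to IP", and the boolean standing in for a real timer are all assumptions, and duplicate ACKs, flow control and congestion control are ignored as on the slide.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq -> data, not yet acknowledged
        self.timer_running = False
        self.wire = []               # (seq, data) handed to IP, in order

    def on_data(self, data: bytes) -> None:
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)      # smallest not-yet-acknowledged segment
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before y.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(120)           # ACK=100 was lost; ACK=120 covers both segments
print(sender.send_base, sender.unacked, sender.timer_running)    # 120 {} False
```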
66
TCP retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); the Seq=92 timeout expires and Host A resends Seq=92, 8 bytes data; Host B replies ACK=100. SendBase = 100.]
[Figure: premature timeout. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout expires, Host A resends Seq=92, 8 bytes data, and Host B replies ACK=120. SendBase moves from 100 to 120.]
67
TCP retransmission scenarios (more)
[Figure: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout. SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action
• Arrival of in-order segment with expected seq. #; all data up to expected seq. # already ACKed
  -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq. #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq. #; gap detected
  -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
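The four rows can be read as one decision function. A rough sketch: the function `receiver_action`, its boolean flags and the returned strings are hypothetical, and segments below the expected sequence number are outside the table, so they are only flagged as such.

```python
def receiver_action(seq: int, expected_seq: int,
                    ack_pending: bool, data_buffered_above: bool) -> str:
    """Pick the receiver action for an arriving segment, following the table."""
    if seq > expected_seq:
        # Out-of-order arrival: a gap has been detected.
        return "send duplicate ACK immediately"
    if seq < expected_seq:
        return "not covered by the table"
    if data_buffered_above:
        # Segment starts at the lower end of an existing gap.
        return "send ACK immediately"
    if ack_pending:
        # Second in-order segment while an ACK is still being delayed.
        return "send cumulative ACK immediately"
    return "delay ACK up to 500 ms"


print(receiver_action(100, 100, ack_pending=False, data_buffered_above=False))
print(receiver_action(100, 100, ack_pending=True, data_buffered_above=False))
print(receiver_action(120, 100, ack_pending=False, data_buffered_above=False))
print(receiver_action(100, 100, ack_pending=False, data_buffered_above=True))
```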
69
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
70
[Figure: Host A and Host B timelines; one segment is lost (X) and Host A resends the 2nd segment before the timeout.]
Figure 3.37: Resending a segment after triple duplicate ACK
59
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )EstimatedRTT + SampleRTT
Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
KarnPartridge Algorithm ndash Ignore retransmission in measurements
(and increase timeout this makes retransmissions decreasingly aggressive)
61
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
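As a worked example of the window arithmetic, the sketch below computes RcvWindow on the receiver side and the amount the sender may still transmit. The sender-side names last_byte_sent and last_byte_acked do not appear on the slide and are assumed here for illustration; the byte counts are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room advertised: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, window):
    """Bytes the sender may still send while keeping unACKed data <= RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                # 2096
print(sendable_bytes(5000, 3500, window))    # 596
```

With 2000 bytes still unread in a 4096-byte buffer, the receiver advertises 2096 bytes; a sender with 1500 bytes in flight may send 596 more.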
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
• server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
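The three handshake steps can be traced with a small simulation of the segments exchanged. This is a sketch only: segments are modelled as plain dictionaries, the function name three_way_handshake is invented for the example, and the acknowledgement values (initial sequence number plus one, because a SYN consumes one sequence number) are standard TCP behaviour that the slide does not spell out.

```python
import random

SEQ_SPACE = 2 ** 32    # TCP sequence numbers are 32 bits wide


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)

    # Step 1: client sends SYN carrying its initial sequence number, no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK carrying its own initial seq #.
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (client_isn + 1) % SEQ_SPACE}
    # Step 3: client acknowledges the server's SYN (may also carry data).
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment["flags"]), segment["seq"], segment.get("ack"))
```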
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline for closing a connection: client close, FIN; server ACK, close, FIN; client ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline: FIN, ACK, FIN, ACK; states closing, closing, timed wait, closed]
80
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends $\lambda_{in}$ (original data) into unlimited shared output link buffers; Host B receives $\lambda_{out}$]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A, Host B; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda_{in}$; axis marks at R/2, with R/3 and R/4 also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A, Host B; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A, Host B; $\lambda_{out}$]
Congestion Collapse example: Cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, $W \leftarrow W + 1/W$)
• multiplicative decrease: cut CongWin in half after loss ($W \leftarrow W/2$)
[Figure: congestion window size against time, with marks at 8, 16 and 24 Kbytes; saw tooth behavior: probing for bandwidth]
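The saw tooth can be reproduced with a short simulation. This is a minimal sketch, assuming the window is measured in units of MSS and that the rounds in which a loss is detected are supplied by the caller; the function names aimd and per_ack_increase are invented for the example, and slow start is not modelled.

```python
def aimd(rounds, loss_rounds, mss=1.0, start=1.0):
    """Return the congestion window (in MSS units) at the start of each RTT."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win / 2)    # multiplicative decrease
        else:
            cong_win += mss                      # additive increase: +1 MSS per RTT
    return trace


def per_ack_increase(w, acks):
    """Apply W <- W + 1/W once per received ACK."""
    for _ in range(acks):
        w += 1.0 / w
    return w


print(aimd(rounds=8, loss_rounds={4}))
# [1.0, 2.0, 3.0, 4.0, 5.0, 2.5, 3.5, 4.5]
print(per_ack_increase(10.0, acks=10))    # a little under 11
```

The second function shows why the per-ACK rule amounts to roughly one MSS per RTT: a window of W segments produces about W ACKs per round trip, each adding 1/W.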
89
SLOW START IS NOT SHOWN
60
Some RTT estimates are never good
Associating the ACK with (a) original transmission versus (b) retransmission
Karn/Partridge Algorithm – Ignore retransmission in measurements
(and increase timeout; this makes retransmissions decreasingly aggressive)
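A minimal sketch of the Karn/Partridge rule follows. The class name KarnTimer and its attributes are illustrative only, and doubling the timeout on each expiry is an assumption here (the slide says only "increase timeout"), although exponential backoff is the usual choice.

```python
class KarnTimer:
    """Skips RTT samples from retransmitted segments and backs off on timeout."""

    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.samples = []    # RTT samples accepted for estimation

    def on_timeout(self):
        # Assumed backoff policy: double the timeout on every expiry.
        self.timeout *= 2

    def on_ack(self, rtt, was_retransmitted):
        if was_retransmitted:
            # Ambiguous: the ACK may belong to the original or to the retransmission.
            return
        self.samples.append(rtt)


timer = KarnTimer(initial_timeout=1.0)
timer.on_timeout()                               # segment resent, timeout now 2.0
timer.on_ack(rtt=0.3, was_retransmitted=True)    # ignored
timer.on_ack(rtt=0.25, was_retransmitted=False)  # accepted
print(timer.timeout, timer.samples)              # 2.0 [0.25]
```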
61
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, axis from 100 to 350) against time (seconds); two curves: SampleRTT and EstimatedRTT]
62
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
    $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
Then set timeout interval:
    $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
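The two formulas can be exercised with one update step. In this sketch the EstimatedRTT update and its weight ALPHA = 0.125 are assumptions, since they are not stated on this slide (0.125 is the value recommended in RFC 6298); the function name update_rtt and the sample values are invented for the example.

```python
ALPHA = 0.125    # assumed weight for EstimatedRTT, not given on this slide
BETA = 0.25      # weight for DevRTT, as on the slide


def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """Fold one SampleRTT into the estimates and return the new timeout."""
    # The deviation uses the previous EstimatedRTT, so it is updated first.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


# Times in milliseconds: previous estimate 100, deviation 10, new sample 120.
print(update_rtt(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=120.0))
# (102.5, 12.5, 152.5)
```

A single sample 20 ms above the estimate moves EstimatedRTT by only 2.5 ms, but the safety margin grows by 10 ms, which is how larger variation leads to a larger timeout.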
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
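The simplified sender above can be turned into a small runnable model. This is a sketch under stated assumptions: the timer is reduced to a boolean flag, "passing a segment to IP" appends it to a list, and the class and attribute names (SimplifiedTcpSender, unacked, wire) are invented for illustration.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender: no dup-ACK, flow or congestion control."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}              # seq -> payload of not-yet-acked segments
        self.timer_running = False
        self.wire = []                 # (seq, payload) passed to IP, in order

    def on_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)        # smallest not-yet-acked sequence number
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True      # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose last byte is covered by the cumulative ACK.
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data(b"x" * 8)       # Seq=92, 8 bytes
sender.on_data(b"y" * 20)      # Seq=100, 20 bytes
sender.on_timeout()            # resend Seq=92
sender.on_ack(120)             # cumulative ACK covers both segments
print([seq for seq, _ in sender.wire], sender.send_base)    # [92, 100, 92] 120
```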
66
TCP retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); on timeout Host A resends Seq=92, 8 bytes data; Host B replies ACK=100; SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout expires and Host A resends Seq=92, 8 bytes data; Host B replies ACK=120; SendBase moves from 100 to 120 and stays at 120]
67
TCP retransmission scenarios (more)
[Figure: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X); ACK=120 arrives within the timeout; SendBase = 120]
Implicit ACK (e.g. not Go-Back-N)
ACK=120 implicitly ACK's 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at Receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at Receiver: Arrival of segment that partially or completely fills gap.
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
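The four rows of the ACK generation table map naturally onto a decision function. The sketch below is an illustration only: the function name receiver_action, its parameters and the returned labels are invented here, and arrivals below the expected sequence number are rejected because the table does not cover them.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_buffered):
    """Pick the receiver action for an arriving segment.

    seg_seq       sequence number of the arriving segment
    expected_seq  next in-order byte the receiver is waiting for
    ack_pending   True if an earlier in-order segment has a delayed ACK outstanding
    gap_buffered  True if out-of-order data is already held above expected_seq
    """
    if seg_seq > expected_seq:
        return "duplicate ACK"      # gap detected: re-ACK the next expected byte
    if seg_seq < expected_seq:
        raise ValueError("arrivals below the expected seq # are not in the table")
    if gap_buffered:
        return "immediate ACK"      # segment starts at the lower end of the gap
    if ack_pending:
        return "cumulative ACK"     # one ACK covers both in-order segments
    return "delayed ACK"            # wait up to 500 ms for the next segment


print(receiver_action(100, 100, ack_pending=False, gap_buffered=False))
print(receiver_action(100, 100, ack_pending=True, gap_buffered=False))
print(receiver_action(120, 100, ack_pending=False, gap_buffered=False))
print(receiver_action(100, 100, ack_pending=False, gap_buffered=True))
```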
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend the segment before the timer expires
62
TCP Round Trip Time and Timeout
Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT -gt larger safety marginbull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 025)
Then set timeout interval
63
TCP reliable data transferbull TCP creates rdt service on top of IPrsquos unreliable servicebull Pipelined segmentsbull Cumulative acksbull TCP uses single retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate acksndash ignore flow control congestion control
64
TCP sender eventsdata rcvd from appbull Create segment with seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer Ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to be
ackedndash start timer if there are
outstanding segments
65
TCP sender(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is acked
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
79
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too much data too
fast for network to handlerdquobull different from flow controlbull manifestations
ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)
bull a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda_{out}$; unlimited shared output link buffers]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
84
Causes/costs of congestion: scenario 2
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
• "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
• retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: panels a, b and c, with axes $\lambda_{in}$ and $\lambda_{out}$ and axis marks at R/2, R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A, Host B; $\lambda_{out}$]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
[Figure: congestion window vs. time, with axis marks at 8, 16 and 24 Kbytes]
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; equivalently, for each received ACK, W ← W + 1/W (W in units of MSS)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
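A minimal Python sketch of the two rules, with the window measured in MSS. The loss model is an assumption made purely for illustration (a loss is declared whenever the window exceeds a fixed `capacity`), and the function names are hypothetical. The second function shows why adding 1/W per ACK amounts to roughly one MSS per RTT.

```python
def aimd(rounds, capacity, start=1.0):
    """Return the congestion window (in MSS) at the start of each RTT."""
    window = start
    history = []
    for _ in range(rounds):
        history.append(window)
        if window > capacity:
            window = window / 2   # multiplicative decrease after a loss
        else:
            window = window + 1   # additive increase: +1 MSS per RTT
    return history


def one_rtt_of_acks(window):
    """Apply W <- W + 1/W once per ACK, for the ACKs of one window of segments."""
    for _ in range(int(window)):
        window += 1 / window
    return window


print(aimd(rounds=12, capacity=8, start=6.0))
print(one_rtt_of_acks(10.0))   # slightly below 11: about one MSS gained per RTT
```

Plotting the list returned by `aimd` gives the sawtooth; slow start is not modelled here either.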
[Figure: congestion window size vs. time. Sawtooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
63
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
64
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
65
TCP sender (simplified)

  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum

  loop (forever) {
    switch(event)

    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
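The simplified sender can be sketched in runnable Python. This is an illustration under stated assumptions, not a TCP implementation: the class and attribute names are hypothetical, the timer is reduced to a boolean, and "pass segment to IP" just appends to a list.

```python
class SimplifiedTcpSender:
    """Event-driven model of the simplified sender (names are illustrative)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq number -> data not yet acknowledged
        self.timer_running = False   # assumption: timer modelled as a flag
        self.to_ip = []              # (seq, data) pairs handed to IP, in order

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.to_ip.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)      # oldest not-yet-acknowledged segment
        self.to_ip.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart the timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: every segment ending at or before y is acknowledged.
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # premature timeout: resend Seq=92
sender.on_ack(100)
sender.on_ack(120)
print([seq for seq, _ in sender.to_ip], sender.send_base)
```

The usage lines replay a premature-timeout exchange: only the oldest segment is retransmitted, and the later cumulative ACK moves SendBase to 120.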
66
TCP retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); after the Seq=92 timeout, Host A resends Seq=92, 8 bytes data; Host B replies ACK=100; SendBase = 100.]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout fires first, so Host A resends Seq=92, 8 bytes data; Host B replies ACK=120; SendBase goes from 100 to 120 and stays at 120.]
67
TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]
Implicit ACK (e.g. not Go-Back-N): ACK=120 implicitly ACKs 100 too
68
TCP ACK generation [RFC 1122, RFC 2581]
Each event at the receiver is followed by the TCP receiver action:
• Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  Action: delayed ACK. Wait up to 500 ms for next segment; if no next segment, send ACK
• Event: arrival of in-order segment with expected seq #; one other segment has ACK pending
  Action: immediately send single cumulative ACK, ACKing both in-order segments
• Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected
  Action: immediately send duplicate ACK, indicating seq # of next expected byte
• Event: arrival of segment that partially or completely fills gap
  Action: immediately send ACK, provided that segment starts at lower end of gap
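The four event/action pairs can be expressed as one decision function. This is a sketch: the function name, its parameters and the returned strings are assumptions for illustration. A segment that starts below the expected sequence number is not covered by the table; this sketch assumes it is answered with an immediate duplicate ACK.

```python
DELAYED_ACK = "delayed ACK (wait up to 500 ms)"
CUMULATIVE_ACK = "immediate cumulative ACK"
DUPLICATE_ACK = "immediate duplicate ACK"
IMMEDIATE_ACK = "immediate ACK"


def ack_action(seg_seq, expected_seq, ack_pending=False, gap_buffered=False):
    """Choose the receiver's action for a segment starting at seg_seq.

    ack_pending:  an earlier in-order segment is still waiting for its ACK.
    gap_buffered: out-of-order data beyond expected_seq is already buffered,
                  so expected_seq is the lower end of a gap.
    """
    if seg_seq > expected_seq:
        return DUPLICATE_ACK       # out-of-order arrival: gap detected
    if seg_seq < expected_seq:
        return DUPLICATE_ACK       # assumption: old data, not in the table
    if gap_buffered:
        return IMMEDIATE_ACK       # segment starts at lower end of the gap
    if ack_pending:
        return CUMULATIVE_ACK      # one ACK covers both in-order segments
    return DELAYED_ACK


print(ack_action(seg_seq=100, expected_seq=100))
print(ack_action(seg_seq=120, expected_seq=100))
```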
69
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
70
[Figure 3.37: Resending a segment after triple duplicate ACK. Host A sends segments to Host B; one is lost (X); Host A resends the 2nd segment before its timeout.]
71
Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y = 3) {
        resend segment with sequence number y    /* fast retransmit */
      }
    }
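The ACK handler with duplicate-ACK counting can be sketched in Python as follows. Only the counting logic is modelled; the class name and the `retransmitted` list standing in for "resend segment" are assumptions for illustration.

```python
from collections import Counter


class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions (illustrative)."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()    # ACK value y -> duplicates seen for y
        self.retransmitted = []      # sequence numbers resent before timeout

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks.clear()
        else:
            self.dup_acks[y] += 1    # duplicate ACK for already ACKed data
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:     # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.retransmitted)
```

Testing for exactly 3 rather than "at least 3" means a fourth or fifth duplicate does not trigger another resend of the same segment.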
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send, it will
This means small MSS values may persist indefinitely
Solution:
Wait to fill each segment, but don't wait indefinitely
Nagle's Algorithm
If we wait too long, interactive traffic is difficult
If we don't wait, we get silly window syndrome
Solution: use a timer; when the timer expires, send the (unfilled) segment
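The timer-based policy described here (fill a segment if possible, but send an unfilled one when the timer expires) can be sketched as below. This models only that policy; Nagle's algorithm as specified in RFC 896 instead holds small segments while earlier data is still unacknowledged. The class name, the explicit `now` timestamps and the `max_wait` parameter are assumptions for illustration.

```python
class CoalescingSender:
    """Buffers small writes; sends full segments, or a partial one on timeout."""

    def __init__(self, mss, max_wait):
        self.mss = mss
        self.max_wait = max_wait
        self.buffer = b""
        self.oldest = None           # arrival time of the oldest buffered byte

    def write(self, data, now):
        if not self.buffer:
            self.oldest = now
        self.buffer += data
        return self._flush(now)

    def tick(self, now):
        """Called periodically so the timer can expire without new writes."""
        return self._flush(now)

    def _flush(self, now):
        segments = []
        while len(self.buffer) >= self.mss:      # send every full segment
            segments.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
            self.oldest = now                    # remainder starts waiting now
        if self.buffer and now - self.oldest >= self.max_wait:
            segments.append(self.buffer)         # timer expired: send unfilled
            self.buffer = b""
        if not self.buffer:
            self.oldest = None
        return segments


sender = CoalescingSender(mss=4, max_wait=10)
print(sender.write(b"ab", now=0), sender.write(b"c", now=3), sender.tick(now=10))
```

A small `max_wait` favours interactive traffic; a large one favours full segments.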
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
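A TCP sender applies both limits at once: unacknowledged data is kept within the smaller of the congestion window (protecting the network) and the receiver's advertised window (protecting the receiver). A one-function Python sketch, with hypothetical names and all quantities in bytes:

```python
def usable_window(cong_win, rcv_window, bytes_in_flight):
    """Bytes the sender may still transmit under both limits."""
    limit = min(cong_win, rcv_window)        # the tighter control wins
    return max(0, limit - bytes_in_flight)   # never negative


print(usable_window(cong_win=8000, rcv_window=4000, bytes_in_flight=1000))
print(usable_window(cong_win=2000, rcv_window=4000, bytes_in_flight=2500))
```

In the first call the receiver is the bottleneck; in the second the network is, and the sender must wait for ACKs before sending more.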
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^S/^Q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
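The window computation and the sender-side check can be written out directly. The receiver formula is the one given above; the sender-side names `last_byte_sent` and `last_byte_acked` are assumptions introduced for this sketch, as are the numbers in the usage lines.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, nbytes, window):
    """True if sending nbytes more keeps unACKed data within the window."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, 1000, window))
print(sender_may_send(5000, 4000, 1200, window))
```

With 2000 bytes buffered but unread out of 4096, the receiver advertises 2096 bytes; a sender with 1000 bytes outstanding may send 1000 more but not 1200.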
66
TCP retransmission scenariosHost A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
Seq=
92 ti
meo
utSendBase
= 100
SendBase= 120
SendBase= 120
Sendbase= 100
67
TCP retransmission scenarios (more)Host A
Seq=92 8 bytes data
ACK=100
loss
timeo
utHost B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
Implicit ACK(eg not Go-Back-N)
ACK=120 implicitly ACKrsquos 100 too
68
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
69
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 duplicate ACKs it supposes that segment after ACKed data was lostndash fast retransmit resend
segment before timer expires
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control - (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
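The window arithmetic above can be checked with a short Python sketch. The receiver-side names mirror the slide (RcvBuffer, LastByteRcvd, LastByteRead); the sender-side helper and its `last_byte_sent` / `last_byte_acked` parameters are assumptions introduced for illustration.

```python
class ReceiveBuffer:
    """Tracks the spare room a receiver would advertise as RcvWindow."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer  # total buffer size in bytes (RcvBuffer)
        self.last_byte_rcvd = 0       # LastByteRcvd
        self.last_byte_read = 0       # LastByteRead

    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        # In-order data arrives from the network.
        if nbytes > self.rcv_window():
            raise ValueError("sender exceeded the advertised window")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        # The application drains the buffer, which reopens the window.
        nbytes = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nbytes
        return nbytes


def sendable(rcv_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight without exceeding RcvWindow."""
    return max(0, rcv_window - (last_byte_sent - last_byte_acked))


buf = ReceiveBuffer(rcv_buffer=4096)
buf.receive(3000)
print(buf.rcv_window())   # spare room after 3000 unread bytes
buf.app_read(1000)
print(buf.rcv_window())   # window grows once the app reads
print(sendable(buf.rcv_window(), last_byte_sent=5000, last_byte_acked=4000))
```

Because the sender never has more unACKed bytes in flight than the last advertised window, `receive` never sees more data than the buffer can hold.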
77
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
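The three steps can be traced with a small simulation of the segments exchanged. This is a sketch, not real socket code: segments are plain dictionaries, the function name is hypothetical, and the initial sequence numbers are simply drawn at random. It relies on the fact that a SYN consumes one sequence number, so each side acknowledges the peer's initial sequence number plus one.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(rng=random):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    # Step 1: client sends SYN with its initial sequence number, no data.
    client_isn = rng.randrange(SEQ_SPACE)
    syn = {"flags": {"SYN"}, "seq": client_isn}

    # Step 2: server replies with SYNACK carrying its own initial sequence number.
    server_isn = rng.randrange(SEQ_SPACE)
    synack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }

    # Step 3: client acknowledges the server's SYN; this segment may carry data.
    ack = {
        "flags": {"ACK"},
        "seq": synack["ack"],
        "ack": (synack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, synack, ack]


for segment in three_way_handshake(random.Random(1)):
    print(sorted(segment["flags"]), segment)
```

In application code the whole exchange is hidden behind the socket calls shown on the slide; the sketch only makes the sequence-number bookkeeping visible.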
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline for closing a connection; labels: close, FIN, ACK, close, FIN, ACK, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline for closing a connection; labels: closing, FIN, ACK, closing, FIN, ACK, timed wait, closed, closed]
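The four closing steps can be written as two small state tables, one per side. This is a sketch under assumptions: the state names (FIN_WAIT_1, CLOSE_WAIT and so on) are the conventional ones from the TCP specification rather than names used on the slides, and the event strings and `run` helper are hypothetical.

```python
# (state, event) -> (next state, segment sent or None)
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),  # Step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),  # Step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "ACK"),   # re-ACK a repeated FIN
    ("TIME_WAIT", "timeout"): ("CLOSED", None),
}

SERVER_TRANSITIONS = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),  # Step 2, first half
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),        # Step 2, second half
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),          # Step 4
}


def run(transitions, state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = transitions[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


print(run(CLIENT_TRANSITIONS, "ESTABLISHED",
          ["close", "recv ACK", "recv FIN", "timeout"]))
print(run(SERVER_TRANSITIONS, "ESTABLISHED",
          ["recv FIN", "close", "recv ACK"]))
```

The extra TIME_WAIT entry shows why the client lingers: if its final ACK is lost and the server resends its FIN, the client can still answer.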
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A sends λ_in (original data) through a router with unlimited shared output link buffers; Host B receives λ_out]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B receives λ_out]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a, b, c) with axes labelled λ_in and λ_out; axis marks at R/2, R/3 and R/4]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) through routers with finite shared output link buffers; Host B receives λ_out]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: Host A, Host B, λ_out]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected; per received ACK this is W ← W + 1/W
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with axis marks at 8 Kbytes, 16 Kbytes and 24 Kbytes. Saw tooth behavior: probing for bandwidth]
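The saw tooth can be reproduced with a few lines of Python. This is a minimal sketch, assuming the window is counted in units of MSS and that losses happen in rounds chosen by hand; the function names and the `loss_rounds` parameter are hypothetical, and slow start is left out, as in the figure.

```python
def aimd(initial_window, loss_rounds, rounds):
    """Return the congestion window (in MSS) at the start of each RTT."""
    w = float(initial_window)
    trace = []
    for r in range(rounds):
        trace.append(w)
        if r in loss_rounds:
            w = max(1.0, w / 2)  # multiplicative decrease: W <- W/2
        else:
            w += 1.0             # additive increase: 1 MSS per RTT
    return trace


def one_rtt_of_acks(w):
    """Apply the per-ACK rule W <- W + 1/W once for each of the W ACKs in an RTT."""
    for _ in range(int(w)):
        w += 1.0 / w
    return w


trace = aimd(initial_window=8, loss_rounds={4, 9}, rounds=12)
print(trace)
print(one_rtt_of_acks(10.0))  # slightly below 11: roughly one MSS per RTT
```

The second function shows why the per-ACK increment of 1/W adds up to about one MSS per round trip: a window of W segments produces W ACKs per RTT.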
89
SLOW START IS NOT SHOWN
70
Host A
timeo
ut
Host B
time
X
resend 2nd segment
Figure 337 Resending a segment after triple duplicate ACK
71
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
72
Silly Window SyndromeMSS advertises the amount a receiver can accept
If a transmitter has something to send ndash it will
This means small MSS values may persist- indefinitely
Solution
Wait to fill each segment but donrsquot wait indefinitely
NAGLErsquos Algorithm
If we wait too long interactive traffic is difficult
If we donrsquot want we get silly window syndrome
Solution Use a timer when the timer expires ndash send the (unfilled) segment
73
Flow Control ne Congestion Control
bull Flow control involves preventing senders from overrunning the capacity of the receivers
bull Congestion control involves preventing too much data from being injected into the network thereby causing switches or links to become overloaded
74
Flow Control ndash (bad old days)
In-Line flow control
bull XONXOFF (^s^q)
bull data-link dedicated symbols aka Ethernet(more in the Advanced Topic on Datacenters)
Dedicated wires
bull RTSCTS handshaking
bull Read (or Write) Readysignals from memory interface saying slow-downstophellip
75
TCP Flow Controlbull receive side of TCP
connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
app process may be slow at reading from buffer
sender wonrsquot overflowreceiverrsquos buffer by
transmitting too much too fast
flow control
76
TCP Flow control how it works
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees receive buffer
doesnrsquot overflow
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection
client closes socket clientSocketclose()
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
79
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
80
TCP Connection Management (cont)
TCP clientlifecycle
TCP serverlifecycle
81
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too much data too
fast for network to handlerdquobull different from flow controlbull manifestations
ndash lost packets (buffer overflow at routers)ndash long delays (queueing in router buffers)
bull a top-10 problem
82
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull maximum achievable throughput
unlimited shared output link buffers
Host Alin original data
Host B
lout
83
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
84
Causescosts of congestion scenario 2 bull always (goodput)
bull ldquoperfectrdquo retransmission only when loss
bull retransmission of delayed (not lost) packet makes larger (than perfect case) for same
lin
lout=
lin
loutgt
linlout
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of pkt
R2
R2lin
l out
b
R2
R2lin
l out
a
R2
R2lin
l out
c
R4
R3
85
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
lin
Q what happens as and increase l
in
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
86
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Host A
Host B
lo
u
t
Congestion Collapse example Cocktail party effect
87
Approaches towards congestion control
End-end congestion controlbull no explicit feedback from
networkbull congestion inferred from end-
system observed loss delaybull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad approaches towards congestion control
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, with 8, 16 and 24 Kbytes marked on the window axis; sawtooth behavior: probing for bandwidth]
89
SLOW START IS NOT SHOWN
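The additive-increase and multiplicative-decrease rules can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `AimdWindow`, its method names, the window measured in units of MSS, and the floor of 1 MSS after a loss are all hypothetical choices, not part of the slides. Slow start is not modelled.

```java
// Sketch of AIMD window updates; names and the 1 MSS floor are assumptions.
final class AimdWindow {
    private double cwnd; // congestion window, in units of MSS

    AimdWindow(double initialCwnd) {
        this.cwnd = initialCwnd;
    }

    // Additive increase: each ACK adds 1/W, so a full window of ACKs
    // (roughly one RTT) adds about 1 MSS in total.
    void onAck() {
        cwnd += 1.0 / cwnd;
    }

    // Multiplicative decrease: halve the window after a loss,
    // never going below 1 MSS (an assumed floor).
    void onLoss() {
        cwnd = Math.max(1.0, cwnd / 2.0);
    }

    // Simulates one RTT in which every segment of the current window is ACKed.
    void onRttWithoutLoss() {
        int acks = (int) Math.floor(cwnd);
        for (int i = 0; i < acks; i++) {
            onAck();
        }
    }

    double cwnd() {
        return cwnd;
    }
}
```

Calling `onRttWithoutLoss()` repeatedly and `onLoss()` occasionally reproduces the sawtooth: slow linear growth followed by a sharp halving.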
71
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    (a duplicate ACK for an already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    (fast retransmit)
        }
    }
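The ACK handler above could be written in Java roughly as follows. This is a sketch, not a real TCP sender: the class `FastRetransmitSender`, its fields, and the list that records retransmissions are hypothetical. It also assumes that the duplicate count is reset whenever a new ACK advances SendBase, and that only ACKs equal to SendBase count as duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fast retransmit ACK handler; all names are illustrative.
final class FastRetransmitSender {
    private long sendBase;     // oldest unacknowledged sequence number
    private final long nextSeqNum; // next sequence number to be sent
    private int dupAckCount;   // duplicate ACKs seen for sendBase
    private boolean timerRunning;
    final List<Long> retransmitted = new ArrayList<>();

    FastRetransmitSender(long sendBase, long nextSeqNum) {
        this.sendBase = sendBase;
        this.nextSeqNum = nextSeqNum;
        this.timerRunning = nextSeqNum > sendBase;
    }

    void onAck(long y) {
        if (y > sendBase) {
            // New data acknowledged: advance the window and reset the count.
            sendBase = y;
            dupAckCount = 0;
            // Restart the timer only if segments are still unacknowledged.
            timerRunning = nextSeqNum > sendBase;
        } else if (y == sendBase) {
            // Duplicate ACK for an already ACKed segment.
            dupAckCount++;
            if (dupAckCount == 3) {
                // Fast retransmit: resend without waiting for the timeout.
                retransmitted.add(y);
            }
        }
        // ACKs below sendBase are stale and ignored (an assumption of this sketch).
    }

    long sendBase() {
        return sendBase;
    }

    boolean timerRunning() {
        return timerRunning;
    }
}
```

Three duplicate ACKs for the same value trigger exactly one retransmission; further duplicates are ignored until SendBase advances.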
72
Silly Window Syndrome
MSS advertises the amount a receiver can accept
If a transmitter has something to send – it will
This means small MSS values may persist – indefinitely
Solution:
Wait to fill each segment, but don’t wait indefinitely
NAGLE’s Algorithm
If we wait too long, interactive traffic is difficult
If we don’t wait, we get silly window syndrome
Solution: use a timer; when the timer expires – send the (unfilled) segment
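The timer-based idea on this slide can be sketched as a small buffer that sends full segments immediately and flushes a partial one once it has waited too long. This follows the slide's description only; it is not taken from any particular TCP stack, and the class `SegmentCoalescer`, its parameters, and the explicit `now` timestamps (used instead of a real timer so the behaviour is deterministic) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fill segments up to MSS, but flush a partial segment after maxWaitMillis.
final class SegmentCoalescer {
    private final int mss;
    private final long maxWaitMillis;
    private int buffered;      // bytes waiting to be sent
    private long oldestByteAt; // arrival time of the oldest buffered byte

    SegmentCoalescer(int mss, long maxWaitMillis) {
        this.mss = mss;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Application writes data at time `now`; returns sizes of segments sent immediately.
    List<Integer> write(int bytes, long now) {
        if (buffered == 0) {
            oldestByteAt = now;
        }
        buffered += bytes;
        List<Integer> sent = new ArrayList<>();
        while (buffered >= mss) {
            sent.add(mss);
            buffered -= mss;
        }
        if (!sent.isEmpty()) {
            // Older bytes left in full segments; any leftover arrived just now.
            oldestByteAt = now;
        }
        return sent;
    }

    // Timer check at time `now`: send the unfilled segment if it has waited long enough.
    List<Integer> onTimer(long now) {
        List<Integer> sent = new ArrayList<>();
        if (buffered > 0 && now - oldestByteAt >= maxWaitMillis) {
            sent.add(buffered);
            buffered = 0;
        }
        return sent;
    }
}
```

A short `maxWaitMillis` favours interactive traffic at the cost of more small segments; a long one does the opposite, which is the trade-off the slide describes.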
73
Flow Control ≠ Congestion Control
• Flow control involves preventing senders from overrunning the capacity of the receivers
• Congestion control involves preventing too much data from being injected into the network, thereby causing switches or links to become overloaded
74
Flow Control – (bad old days)
In-line flow control
• XON/XOFF (^s/^q)
• data-link dedicated symbols, aka Ethernet (more in the Advanced Topic on Datacenters)
Dedicated wires
• RTS/CTS handshaking
• Read (or Write) Ready signals from memory interface saying slow-down/stop…
75
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate
• app process may be slow at reading from buffer
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
76
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
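The window arithmetic above can be checked with a small Java model. This is a sketch under assumptions: the class `ReceiveBuffer` and its methods are hypothetical, data is assumed to arrive in order (matching the slide's simplification), and the sender-side helper `sendable` simply subtracts bytes in flight from the advertised window.

```java
// Sketch of receiver-side window bookkeeping; names are illustrative.
final class ReceiveBuffer {
    private final long rcvBuffer; // total buffer size in bytes
    private long lastByteRcvd;    // last byte placed into the buffer
    private long lastByteRead;    // last byte read by the application

    ReceiveBuffer(long rcvBuffer) {
        this.rcvBuffer = rcvBuffer;
    }

    // RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
    long rcvWindow() {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    // In-order data arrives; returns false if it would overflow the buffer.
    boolean receive(long bytes) {
        if (bytes > rcvWindow()) {
            return false;
        }
        lastByteRcvd += bytes;
        return true;
    }

    // Application drains up to `bytes` from the buffer, freeing space.
    void appRead(long bytes) {
        lastByteRead += Math.min(bytes, lastByteRcvd - lastByteRead);
    }

    // Sender side: bytes that may still be sent given the advertised window.
    static long sendable(long advertisedWindow, long bytesInFlight) {
        return Math.max(0, advertisedWindow - bytesInFlight);
    }
}
```

With a 4096-byte buffer, receiving 3000 bytes leaves a window of 1096; once the application reads 1000 bytes the window grows back to 2096.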
77
TCP Connection Management
Recall: TCP sender and receiver establish a “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket(hostname, portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
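From the application's point of view the three-way handshake is hidden inside the socket calls named above. The following Java sketch opens a connection over the loopback interface so that both ends can be seen in one process. The class `HandshakeDemo`, the use of an ephemeral port (port 0), and the backlog of 1 are assumptions made for the demonstration; the SYN, SYNACK and ACK segments themselves are exchanged by the operating system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: establish one TCP connection over loopback and report whether both ends are connected.
final class HandshakeDemo {
    static boolean connectOnLoopback() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; backlog 1 is enough for a single client.
        try (ServerSocket welcomeSocket = new ServerSocket(0, 1, loopback);
             // The constructor returns once the OS has completed the handshake.
             Socket clientSocket = new Socket(loopback, welcomeSocket.getLocalPort());
             // accept() hands the already established connection to the server side.
             Socket connectionSocket = welcomeSocket.accept()) {
            return clientSocket.isConnected() && connectionSocket.isConnected();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Leaving the try-with-resources block closes both sockets, which starts the FIN/ACK exchange on the same connection.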
78
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the FIN, ACK, FIN, ACK exchange; labels: close, close, timed wait, closed]
79
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client/server timeline of the FIN, ACK, FIN, ACK exchange; labels: closing, closing, timed wait, closed, closed]
80
TCP Connection Management (cont.)
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
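Since the lifecycle figures did not survive extraction, the client side can be approximated as a small state machine. The sketch below uses the conventional TCP state names (SYN_SENT, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT); the enum and class names and the simplified event set are assumptions, and cases such as simultaneous close or resets are left out.

```java
// Sketch of the TCP client lifecycle for a normal open and close.
enum ClientState { CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT }

enum ClientEvent { ACTIVE_OPEN, RECV_SYNACK, CLOSE, RECV_ACK, RECV_FIN, TIMEOUT }

final class TcpClientLifecycle {
    // Returns the next state; events that do not apply leave the state unchanged.
    static ClientState next(ClientState state, ClientEvent event) {
        switch (state) {
            case CLOSED:      // application opens: send SYN
                return event == ClientEvent.ACTIVE_OPEN ? ClientState.SYN_SENT : state;
            case SYN_SENT:    // SYNACK arrives: send ACK
                return event == ClientEvent.RECV_SYNACK ? ClientState.ESTABLISHED : state;
            case ESTABLISHED: // application closes: send FIN
                return event == ClientEvent.CLOSE ? ClientState.FIN_WAIT_1 : state;
            case FIN_WAIT_1:  // server ACKs our FIN
                return event == ClientEvent.RECV_ACK ? ClientState.FIN_WAIT_2 : state;
            case FIN_WAIT_2:  // server's FIN arrives: send ACK, enter timed wait
                return event == ClientEvent.RECV_FIN ? ClientState.TIME_WAIT : state;
            case TIME_WAIT:   // timed wait expires
                return event == ClientEvent.TIMEOUT ? ClientState.CLOSED : state;
            default:
                return state;
        }
    }
}
```

Feeding the events in the order of the handshake and close steps walks the client from CLOSED through ESTABLISHED and back to CLOSED.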
81
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – lost packets (buffer overflow at routers)
  – long delays (queueing in router buffers)
• a top-10 problem
77
TCP Connection Management
Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator Socket clientSocket = new
Socket(hostnameport number)
bull server contacted by client Socket connectionSocket =
welcomeSocketaccept()
Three way handshakeStep 1 client host sends TCP SYN
segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
78
TCP Connection Management (cont)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of a connection close: client sends FIN, server replies ACK then FIN, client replies ACK; client enters timed wait, then closed]
79
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK
  - enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline of a connection close: FIN and ACK in each direction; both sides closing, client in timed wait, then both closed]
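The four closing steps can be written as two small transition tables, one per side. This is a sketch under assumptions: the state names (FIN_WAIT_1, CLOSE_WAIT, and so on) are the standard TCP state names rather than anything shown on these slides, and the event strings and the `run` helper are hypothetical.

```python
# Client side (active close): (state, event) -> (next state, segment sent)
CLIENT_CLOSE = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),   # Step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),      # Step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "send ACK"),       # re-ACK a repeated FIN
    ("TIME_WAIT", "timer_expired"): ("CLOSED", None),
}

# Server side (passive close)
SERVER_CLOSE = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),    # Step 2
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),      # Step 2 (cont)
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),                 # Step 4
}


def run(table, state, events):
    """Feed events through a transition table; return final state and segments sent."""
    sent = []
    for event in events:
        state, action = table[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent
```

The extra `("TIME_WAIT", "recv FIN")` entry is why the client lingers in timed wait: if its final ACK is lost, the server resends its FIN and the client can still answer it.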
80
TCP Connection Management (cont)
[Figure: TCP client lifecycle (state diagram)]
[Figure: TCP server lifecycle (state diagram)]
81
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
82
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A (λ_in: original data) and Host B (λ_out) connected through a router with unlimited shared output link buffers]
83
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
[Figure: Host A and Host B connected through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered data]
84
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• "perfect" retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
"Costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of packet
[Figure: three plots (a), (b), (c) of λ_out against λ_in, axes marked at R/2, with R/3 and R/4 also marked]
85
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
86
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λ_out]
Congestion collapse example: cocktail party effect
87
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
88
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase CongWin by 1 MSS every RTT until loss detected (for each received ACK, W ← W + 1/W)
• multiplicative decrease: cut CongWin in half after loss (W ← W/2)
[Figure: congestion window size against time, vertical axis marked at 8, 16 and 24 Kbytes: saw-tooth behavior, probing for bandwidth]
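The sawtooth can be reproduced with a few lines of Python. This is a sketch under a simplifying assumption that is not in the slides: a loss is declared whenever the window exceeds a fixed `capacity_mss`. Both function names are hypothetical, and windows are measured in units of MSS.

```python
def aimd_trace(capacity_mss, rtts, start_mss=1.0):
    """Congestion window (in MSS) at the start of each RTT under AIMD.

    Assumption: a loss occurs whenever the window exceeds capacity_mss.
    """
    w = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(w)
        if w > capacity_mss:
            w = max(1.0, w / 2)   # multiplicative decrease: W <- W/2
        else:
            w += 1.0              # additive increase: +1 MSS per RTT
    return trace


def grow_one_rtt_per_ack(w):
    """Apply W <- W + 1/W once per ACK, for the roughly W ACKs of one RTT."""
    for _ in range(int(w)):
        w += 1.0 / w
    return w
```

`aimd_trace(8, 12)` climbs 1, 2, ..., 9, drops to 4.5 and starts climbing again. `grow_one_rtt_per_ack(10.0)` returns a little under 11, which is why the per-ACK rule amounts to about one MSS per RTT.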
89
SLOW START IS NOT SHOWN
90
TCP Congestion Control: details
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• roughly: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
Three mechanisms:
  - AIMD
  - slow start
  - conservative after timeout events
91
AIMD Starts Too Slowly
[Figure: window against time t under AIMD alone]
It could take a long time to get started!
Need to start with a small CWND to avoid overloading the network.
92
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
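The 20 kbps figure follows from rate ≈ CongWin / RTT with CongWin = 1 MSS. A minimal check in Python (the function name `rate_bps` is hypothetical):

```python
def rate_bps(cong_win_bytes, rtt_seconds):
    """Approximate sending rate in bits per second: one window per RTT."""
    return cong_win_bytes * 8 / rtt_seconds


# MSS = 500 bytes, RTT = 200 msec, CongWin = 1 MSS
initial_rate_kbps = rate_bps(500, 0.2) / 1000   # about 20 kbps
```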
93
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline: one segment in the first RTT, then two segments, then four segments]
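Incrementing CongWin by one MSS per ACK doubles it every RTT, because a window of n segments returns n ACKs. A sketch in Python, assuming no losses and one ACK per segment (no delayed ACKs); `slow_start_rounds` is a hypothetical name:

```python
def slow_start_rounds(mss, rounds):
    """CongWin (bytes) at the start of each RTT during slow start."""
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        segments_in_flight = cong_win // mss
        for _ in range(segments_in_flight):
            cong_win += mss        # +1 MSS for every ACK received
    return sizes
```

With MSS = 500 bytes, `slow_start_rounds(500, 4)` gives 500, 1000, 2000, 4000 bytes: the one, two, four segments of the timeline, then eight.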
94
Slow Start and the TCP Sawtooth
[Figure: window against time t: exponential "slow start" up to the first loss, followed by the sawtooth]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.
95
Refinement: inferring loss
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout indicates a "more alarming" congestion scenario
96
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
97
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
98
TCP sender congestion control
State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
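The table maps almost line for line onto a small event-driven class. The following Python sketch is one possible reading, not a real TCP stack: the class and method names are hypothetical, the initial Threshold is an arbitrary assumption (the table does not give one), and flooring Threshold at 1 MSS on timeout is an assumption of this sketch.

```python
class TcpSender:
    """Congestion-control state of a sender, following the table row by row."""

    def __init__(self, mss=1000, threshold=8000):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold   # assumed initial value
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_dup_ack(self):
        """Duplicate ACK: count it; the third one signals a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self):
        """Timeout: remember half the window, restart from 1 MSS."""
        self.threshold = max(self.cong_win / 2, self.mss)
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Starting from `TcpSender(mss=1000, threshold=4000)`, four new ACKs take CongWin to 5000 and switch the sender to congestion avoidance; three duplicate ACKs then halve the window, and a timeout sends it back to 1 MSS in slow start.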
99
Repeating Slow Start After Timeout
[Figure: window against time t, marking a timeout, a fast retransmission, and the level at which SSThresh is set]
Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until the window reaches half of the previous CWND, i.e., SSTHRESH.
100
TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average throughput: 0.75 W/RTT
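A short derivation of the 0.75 factor, assuming the window grows linearly from W/2 back to W between losses, so that its time average is the midpoint of the two extremes:

```latex
\[
\overline{\text{throughput}}
  \approx \frac{1}{2}\left(\frac{W}{\mathrm{RTT}} + \frac{W}{2\,\mathrm{RTT}}\right)
  = \frac{3}{4}\,\frac{W}{\mathrm{RTT}}
\]
```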
101
TCP Futures: TCP over "long, fat pipes"
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p
• Required loss rate: L = 2·10^-10. Ouch!
• New versions of TCP for high-speed
Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  - Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, ..., W, W/2, ...
  - W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  - One packet dropped out of (W/2)·(3W/4)
  - Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) sqrt(3/(2p))
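The numbers in the long-fat-pipe example can be checked against the simple model. A sketch in Python, with hypothetical function names; the loss rate is obtained by inverting Throughput = (MSS/RTT) sqrt(3/(2p)).

```python
import math


def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Average number of in-flight segments needed to sustain the throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def model_throughput_bps(p, rtt_s, mss_bytes):
    """Throughput = (MSS/RTT) * sqrt(3 / (2p)), with MSS converted to bits."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3 / (2 * p))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate p at which the model yields the given throughput."""
    return 1.5 * (mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)   # about 83,333 segments
p = required_loss_rate(10e9, 0.1, 1500)         # about 2e-10
```

At that loss rate the sender can afford roughly one lost segment in every five billion, which is the reason for the "Ouch".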
102
HINT KNOW THIS SLIDE
Three Congestion Control Challenges - or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  - Without any a priori knowledge
  - Could be a Gbps link; could be a modem
• Single flow adjusting to variations in bandwidth
  - When bandwidth decreases, must lower sending rate
  - When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  - Must avoid overloading network
  - And share bandwidth "fairly" among the flows
103
104
Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  - Assume bandwidth is fixed
  - Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  - cwnd initially set to 1 (MSS)
  - cwnd++ upon receipt of ACK
105
Problems with Slow-Start
• Slow-start can result in many losses
  - Roughly the size of cwnd ~ BW × RTT
• Example:
  - At some point, cwnd is enough to fill "pipe"
  - After another RTT, cwnd is double its previous value
  - All the excess packets are dropped!
• Need a more gentle adjustment algorithm once have rough estimate of bandwidth
  - Rest of design discussion focuses on this
Problem 2: Single Flow, Varying BW
Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available
Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd ← cwnd * a or cwnd / a
• Additive increase or decrease: cwnd ← cwnd + b or cwnd - b
106
Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  - too many losses: eliminate
• MIMD: drastic increase and decrease
107
108
Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  - As we shall see...
• AIMD is the only remaining solution!
  - Not really, but close enough...
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2
[Figure: flow x from A to B over a link of capacity C = 50 pkts/RTT; plot over time of rate (pkts/RTT) and backlog in router (pkts), congested if backlog > 20]
110
AIMD Sharing Dynamics
- No congestion: rate increases by one packet/RTT every RTT
- Congestion: decrease rate by factor 2
[Figure: flow x1 (A to B) and flow x2 (D to E); plot of both rates (y-axis 0 to 60, x-axis 1 to 487). Rates equalize: fair share.]
111
AIAD Sharing Dynamics
- No congestion: x increases by one packet/RTT every RTT
- Congestion: decrease x by 1
[Figure: flow x1 (A to B) and flow x2 (D to E); plot of both rates (y-axis 0 to 60, x-axis 1 to 487).]
112
Simple Model of Congestion Control
- Two TCP connections
  - Rates x1 and x2
- Congestion when sum > 1
- Efficiency: sum near 1
- Fairness: x's converge
[Figure: 2-user example. Axes: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2); the efficiency line separates the overload region from the underload region.]
113
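Example
- Total bandwidth 1
- (0.2, 0.5): inefficient, x1 + x2 = 0.7
- (0.7, 0.5): congested, x1 + x2 = 1.2
- (0.7, 0.3): efficient, x1 + x2 = 1, not fair
- (0.5, 0.5): efficient, x1 + x2 = 1, fair
[Figure: axes User 1 (x1) vs User 2 (x2), each from 0 to 1, with the fairness line and the efficiency line.]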
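The four example points can be checked mechanically. This is a small sketch under the slide's model (total bandwidth normalised to 1, "fair" meaning equal rates); the function name `classify` and the tolerance are assumptions made for illustration.

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Return (load, fair) for a two-user allocation (x1, x2)."""
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"      # above the efficiency line
    elif total < capacity - tol:
        load = "inefficient"    # below the efficiency line
    else:
        load = "efficient"      # on the efficiency line
    fair = abs(x1 - x2) <= tol  # on the fairness line x1 == x2
    return load, fair


for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))
```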
114
AIAD
- Increase: x + aI
- Decrease: x - aD
- Does not converge to fairness
[Figure: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (x1h - aD, x2h - aD) → (x1h - aD + aI, x2h - aD + aI).]
115
MIMD
- Increase: x × bI
- Decrease: x × bD
- Does not converge to fairness
[Figure: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD x1h, bD x2h) → (bI bD x1h, bI bD x2h).]
116
AIMD
- Increase: x + aI
- Decrease: x × bD
- Converges to fairness
[Figure: bandwidth for User 1 (x1) vs bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD x1h, bD x2h) → (bD x1h + aI, bD x2h + aI).]
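A rough simulation of the two-user model for AIAD, MIMD and AIMD. It is a sketch under simplifying assumptions that are not in the slides: both flows see congestion at the same instant (whenever x1 + x2 exceeds the capacity), both apply the same rule in lock-step, and the constants `A_I`, `A_D`, `B_I`, `B_D` and the starting point are arbitrary illustrative values.

```python
A_I, A_D = 0.01, 0.05  # additive increase / decrease steps (assumed values)
B_I, B_D = 1.05, 0.5   # multiplicative increase / decrease factors (assumed)

POLICIES = {
    "AIAD": (lambda x: x + A_I, lambda x: max(x - A_D, 0.0)),
    "MIMD": (lambda x: x * B_I, lambda x: x * B_D),
    "AIMD": (lambda x: x + A_I, lambda x: x * B_D),
}


def simulate(policy, x1=0.7, x2=0.1, capacity=1.0, steps=2000):
    """Run `steps` synchronized updates and return the final rates."""
    increase, decrease = POLICIES[policy]
    for _ in range(steps):
        # Congestion signal: the sum of the rates exceeds the capacity.
        update = decrease if x1 + x2 > capacity else increase
        x1, x2 = update(x1), update(x2)
    return x1, x2


for name in POLICIES:
    x1, x2 = simulate(name)
    print(f"{name}: x1={x1:.3f} x2={x2:.3f} gap={abs(x1 - x2):.3f}")
```

In this model AIAD preserves the initial difference x1 - x2, MIMD preserves the initial ratio x1/x2, and only AIMD shrinks the difference (by the factor bD at every decrease), which matches the phase plots above.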
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
- Additive increase gives slope of 1, as throughput increases
- Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 vs bandwidth for Connection 2, each axis up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
118
Fairness (more)
Fairness and UDP
- Multimedia apps may not use TCP
  - do not want rate throttled by congestion control
- Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
- (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
- nothing prevents app from opening parallel connections between 2 hosts
- Web browsers do this
- Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
- Recall: multiple browser sessions (and the potential for synchronized loss)
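The parallel-connection arithmetic can be sketched as follows, assuming (as the example does) that every TCP connection on the link ends up with an equal share of R. The helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(app_connections, other_connections):
    """Fraction of the link rate R obtained by an app opening
    `app_connections` connections next to `other_connections` existing ones,
    assuming each connection gets an equal share."""
    return Fraction(app_connections, app_connections + other_connections)


print(app_share(1, 9))          # one new connection among 9 existing ones
print(app_share(11, 9))         # eleven new connections among 9 existing ones
print(float(app_share(11, 9)))  # a little over one half of R
```

With 11 connections the app holds 11 of 20 connections, which the slide rounds to R/2.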
119
Some TCP issues outstanding…
Synchronized Flows
- Aggregate window has same dynamics
- Therefore buffer occupancy has same dynamics
- Rule-of-thumb still holds
Many TCP Flows
- Independent, desynchronized
- Central limit theorem says the aggregate becomes Gaussian
- Variance (buffer size) decreases as N increases
[Figure: buffer size over time t, alongside its probability distribution.]
- Slide 1
- Topic 5 ndash Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexing/demultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux Threaded Web Server
- UDP User Datagram Protocol [RFC 768]
- UDP more
- UDP checksum
- Internet Checksum (time travel warning - we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer getting started
- Reliable data transfer getting started (2)
- KR state machines - a note
- Rdt1.0: reliable transfer over a reliable channel
- Rdt2.0: channel with bit errors
- rdt2.0: FSM specification
- rdt2.0: operation with no errors
- rdt2.0: error scenario
- rdt2.0 has a fatal flaw
- rdt2.1: sender, handles garbled ACK/NAKs
- rdt2.1: receiver, handles garbled ACK/NAKs
- rdt2.1: discussion
- rdt2.2: a NAK-free protocol
- rdt2.2: sender, receiver fragments
- rdt3.0: channels with errors and loss
- rdt3.0 sender
- rdt3.0 in action
- rdt3.0 in action (2)
- Performance of rdt3.0
- rdt3.0: stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining increased utilization
- Pipelining Protocols
- Selective repeat big picture
- Go-Back-N
- GBN sender extended FSM
- GBN receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat sender receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat dilemma
- Automatic Repeat Request (ARQ)
- TCP: Overview. RFCs 793, 1122, 1323, 2018, 2581, …
- TCP segment structure
- TCP seq. #'s and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122, RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ≠ Congestion Control
- Flow Control - (bad old days)
- TCP Flow Control
- TCP Flow control how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control: additive increase, multiplicative decre
- Slide 89
- TCP Congestion Control details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement inferring loss
- Refinement
- Summary TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges - or Why AIMD?
- Problem 1 Single Flow Fixed BW
- Problems with Slow-Start
- Problem 2 Single Flow Varying BW
- Four alternatives
- Problem 3 Multiple Flows
- Recall Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
91
AIMD Starts Too Slowly
- Need to start with a small CWND to avoid overloading the network
- It could take a long time to get started!
[Figure: window vs time t.]
92
TCP Slow Start
- When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
- When connection begins, increase rate exponentially fast until first loss event
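The initial-rate figure follows from sending one window per RTT. A tiny worked check, with a helper name (`window_rate_bps`) chosen here for illustration:

```python
def window_rate_bps(window_bytes, rtt_seconds):
    """Rate in bits per second when one window is sent per round-trip time."""
    return window_bytes * 8 / rtt_seconds


# Slide example: CongWin = 1 MSS = 500 bytes, RTT = 200 msec.
rate = window_rate_bps(500, 0.2)
print(f"{rate / 1000:.0f} kbps")
```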
93
TCP Slow Start (more)
- When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B; in successive RTTs Host A sends one segment, then two segments, then four segments.]
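A short sketch of why "one increment per ACK" amounts to "doubling per RTT". It assumes cwnd is counted in segments, every segment sent in an RTT is ACKed individually, and nothing is lost; the function name is illustrative.

```python
def slow_start_windows(rtts):
    """Return cwnd (in segments) at the start of each RTT during slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd  # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1         # CongWin incremented for every ACK received
        history.append(cwnd)
    return history


print(slow_start_windows(4))
```

The first three entries correspond to the one, two and four segments in the Host A / Host B timeline.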
94
Slow Start and the TCP Sawtooth
[Figure: window vs time t; exponential "slow start" up to the first loss, followed by the sawtooth.]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.
95
Refinement: inferring loss
- After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
- But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Philosophy: 3 dup ACKs indicates network capable of delivering some segments; timeout indicates a "more alarming" congestion scenario
96
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
- Variable Threshold
- At loss event, Threshold is set to 1/2 of CongWin just before loss event
97
Summary: TCP Congestion Control
- When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
- When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
98
TCP sender congestion control
Columns: State | Event | TCP Sender Action | Commentary
- Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
- Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
- SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
- SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
- SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
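The table can be read as a small state machine. Below is a simplified sketch of it, not a real TCP implementation: the class name `Sender`, the MSS value, the initial Threshold and the resetting of the duplicate-ACK counter on a new ACK are all assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value, the table leaves MSS unspecified


@dataclass
class Sender:
    cong_win: float = MSS
    threshold: float = 64 * 1024  # assumed initial Threshold
    state: str = "SS"             # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0  # assumption: counter restarts on new data ACK
        if self.state == "SS":
            self.cong_win += MSS  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


s = Sender(cong_win=8 * MSS)
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cong_win / MSS)  # state and window (in MSS) after triple dup ACK
s.on_timeout()
print(s.state, s.cong_win / MSS)  # state and window (in MSS) after timeout
```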
99
Repeating Slow Start After Timeout
[Figure: window vs time t, marking a timeout, a fast retransmission, and the point where SSThresh is set.]
Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until it reaches half of the previous CWND, i.e., SSTHRESH.
100
TCP throughput
- What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start
- Let W be the window size when loss occurs.
- When window is W, throughput is W/RTT
- Just after loss, window drops to W/2, throughput to W/(2 RTT)
- Average throughput: 0.75 W/RTT
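The 0.75 factor is simply the mean of a window that climbs linearly from W/2 to W. A quick numerical check, assuming the idealised sawtooth (one step of 1 per RTT, loss exactly at W); the function name is illustrative.

```python
def sawtooth_average(w):
    """Mean window over one sawtooth cycle W/2, W/2 + 1, ..., W (w even)."""
    windows = range(w // 2, w + 1)
    return sum(windows) / len(windows)


w = 100
print(sawtooth_average(w) / w)  # fraction of W achieved on average
```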
101
TCP Futures: TCP over "long, fat pipes"
- Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- Requires window size W = 83,333 in-flight segments
- Throughput in terms of loss rate p
- L = 2·10^-10 Ouch!
- New versions of TCP for high-speed
Calculation on Simple Model (cwnd in units of MSS)
- Assume loss occurs whenever cwnd reaches W
  - Recovery by fast retransmit
- Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  - W/2 RTTs, then drop, then repeat
- Average throughput: 0.75 W (MSS/RTT)
  - One packet dropped out of (W/2) × (3W/4)
  - Packet drop rate p = (8/3) W^-2
- Throughput = (MSS/RTT) sqrt(3/(2p))
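The long-fat-pipe numbers can be reproduced from the simple-model formula Throughput = (MSS/RTT) sqrt(3/(2p)). This is a sketch only; the function names are hypothetical, and the window here is the average number of in-flight segments needed to sustain the target rate.

```python
import math


def throughput_bps(mss_bytes, rtt_s, p):
    """Simple-model throughput in bits/s for packet drop rate p."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3 / (2 * p))


def required_window(mss_bytes, rtt_s, target_bps):
    """In-flight segments needed to sustain target_bps."""
    return target_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(mss_bytes, rtt_s, target_bps):
    """Invert the throughput formula for p."""
    return 1.5 * (mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# Slide example: 1500 byte segments, 100 ms RTT, 10 Gbps target.
print(round(required_window(1500, 0.1, 10e9)))
print(f"{required_loss_rate(1500, 0.1, 10e9):.2e}")
```

The loss rate comes out at roughly 2·10^-10, i.e. about one drop in five billion packets, which is the "Ouch" on the slide.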
102
HINT KNOW THIS SLIDE
Three Congestion Control Challenges - or Why AIMD?
- Single flow adjusting to bottleneck bandwidth
  - Without any a priori knowledge
  - Could be a Gbps link; could be a modem
- Single flow adjusting to variations in bandwidth
  - When bandwidth decreases, must lower sending rate
  - When bandwidth increases, must increase sending rate
- Multiple flows sharing the bandwidth
  - Must avoid overloading network
  - And share bandwidth "fairly" among the flows
- Slide 1
- Topic 5 ndash Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexingdemultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux Threaded Web Server
- UDP User Datagram Protocol [RFC 768]
- UDP more
- UDP checksum
- Internet Checksum (time travel warning ndash we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer getting started
- Reliable data transfer getting started (2)
- KR state machines ndash a note
- Rdt10 reliable transfer over a reliable channel
- Rdt20 channel with bit errors
- rdt20 FSM specification
- rdt20 operation with no errors
- rdt20 error scenario
- rdt20 has a fatal flaw
- rdt21 sender handles garbled ACKNAKs
- rdt21 receiver handles garbled ACKNAKs
- rdt21 discussion
- rdt22 a NAK-free protocol
- rdt22 sender receiver fragments
- rdt30 channels with errors and loss
- rdt30 sender
- rdt30 in action
- rdt30 in action (2)
- Performance of rdt30
- rdt30 stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining increased utilization
- Pipelining Protocols
- Selective repeat big picture
- Go-Back-N
- GBN sender extended FSM
- GBN receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat sender receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat dilemma
- Automatic Repeat Request (ARQ)
- TCP Overview RFCs 793 1122 1323 2018 2581 hellip
- TCP segment structure
- TCP seq rsquos and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122 RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ne Congestion Control
- Flow Control ndash (bad old days)
- TCP Flow Control
- TCP Flow control how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control additive increase multiplicative decre
- Slide 89
- TCP Congestion Control details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement inferring loss
- Refinement
- Summary TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over “long, fat pipes”
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD?
- Problem 1 Single Flow Fixed BW
- Problems with Slow-Start
- Problem 2 Single Flow Varying BW
- Four alternatives
- Problem 3 Multiple Flows
- Recall Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
93
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B; A sends one segment in the first RTT, two segments in the second, four segments in the third]
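The per-ACK increment above can be sketched in a few lines. This is an illustrative model only: it assumes one ACK per segment (no delayed ACKs), no loss, and counts CongWin in whole segments; the function name `slow_start_rounds` is made up for this example.

```python
def slow_start_rounds(rounds, mss=1):
    """Return CongWin (in units of MSS) at the start of each RTT round."""
    cong_win = mss
    history = []
    for _ in range(rounds):
        history.append(cong_win)
        acks = cong_win // mss          # assumption: one ACK per segment sent
        for _ in range(acks):
            cong_win += mss             # CongWin = CongWin + MSS for every ACK
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```

Adding one MSS per ACK means a window of n segments earns n increments per RTT, which is where the doubling comes from.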
94
Slow Start and the TCP Sawtooth
[Figure: window vs. time t; exponential “slow start” up to the first loss, followed by the sawtooth]
Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window’s worth of data.
95
Refinement: inferring loss
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout indicates a “more alarming” congestion scenario
96
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
97
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
98
TCP sender congestion control
State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance” | Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance” | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
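The table can be read as a small event-driven state machine. The sketch below is one possible rendering, not a real TCP stack: the class name `Sender`, the MSS of 1460 bytes, the initial Threshold of 64 KB, and resetting the duplicate-ACK counter on a new ACK are all assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value, not given on the slide


@dataclass
class Sender:
    cong_win: float = MSS
    threshold: float = 64 * 1024   # assumed initial Threshold
    state: str = "SS"              # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0          # assumption: a new ACK clears the duplicate count
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK: count it; the third one signals a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


s = Sender(threshold=4 * MSS)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win / MSS)   # CA 5.0
```

The two loss handlers differ only in where CongWin lands: at Threshold after a triple duplicate ACK, at 1 MSS after a timeout.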
99
Repeating Slow Start After Timeout
[Figure: window vs. time t, annotated with “Timeout”, “Fast Retransmission” and “SSThresh set to here”]
Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until it reaches half of the previous CWND, i.e., SSTHRESH.
100
TCP throughput
• What’s the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is $W/\mathrm{RTT}$
• Just after loss, window drops to $W/2$, throughput to $W/(2\,\mathrm{RTT})$
• Average throughput: $0.75\,W/\mathrm{RTT}$
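The 0.75 factor is just the mean of a window that climbs linearly from W/2 to W. A quick numerical check, assuming the window takes the integer values W/2, W/2+1, …, W once per RTT (the helper name `average_window` is hypothetical):

```python
def average_window(w):
    """Mean window over one sawtooth cycle W/2, W/2+1, ..., W (in segments)."""
    samples = range(w // 2, w + 1)
    return sum(samples) / len(samples)


w = 100
print(average_window(w) / w)  # 0.75
```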
101
TCP Futures: TCP over “long, fat pipes”
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: $\mathrm{Throughput} = \frac{\mathrm{MSS}}{\mathrm{RTT}} \sqrt{\frac{3}{2p}}$ (from the calculation below)
• Required loss rate: $p = 2 \cdot 10^{-10}$. Ouch!
• New versions of TCP for high-speed
Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: $0.75\,W \cdot (\mathrm{MSS}/\mathrm{RTT})$
  – One packet dropped out of $(W/2) \cdot (3W/4)$
  – Packet drop rate $p = \frac{8}{3} W^{-2}$
• $\mathrm{Throughput} = \frac{\mathrm{MSS}}{\mathrm{RTT}} \sqrt{\frac{3}{2p}}$
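The long-fat-pipe numbers can be reproduced from the simple model. A sketch under the slide's figures (1500-byte segments, 100 ms RTT, 10 Gbps target); the function names are invented for this example.

```python
import math


def window_for_rate(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def throughput_bps(p, rtt_s, mss_bytes):
    """Simple-model throughput: (MSS/RTT) * sqrt(3 / (2p))."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3 / (2 * p))


def loss_rate_for_rate(rate_bps, rtt_s, mss_bytes):
    """Invert the throughput formula to get the loss rate p the target allows."""
    return 1.5 * (mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(round(window_for_rate(10e9, 0.1, 1500)))          # 83333
print(f"{loss_rate_for_rate(10e9, 0.1, 1500):.2e}")     # 2.16e-10
```

A loss rate near 2·10^-10 is roughly one drop per five billion segments, which is why the slide says "Ouch".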
102
HINT: KNOW THIS SLIDE
Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link; could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth “fairly” among the flows
103
104
Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK
105
Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW*RTT
• Example:
  – At some point, cwnd is enough to fill “pipe”
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once have rough estimate of bandwidth
  – Rest of design discussion focuses on this
Problem 2: Single Flow, Varying BW
Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won’t know if more bandwidth is available
Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd × a or cwnd / a
• Additive increase or decrease: cwnd → cwnd + b or cwnd - b
106
Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease
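The four alternatives differ only in which rule is applied on increase and which on decrease. A minimal sketch; the factor `a = 2`, the step `b = 1` and the floor of one segment are illustrative assumptions, and `make_policy` is a made-up helper.

```python
def make_policy(increase, decrease, a=2.0, b=1.0):
    """Build a per-RTT cwnd update from 'A' (additive) or 'M' (multiplicative) choices.

    a is the multiplicative factor and b the additive step (both assumed values).
    """
    def update(cwnd, congested):
        if congested:
            new = cwnd / a if decrease == "M" else cwnd - b
        else:
            new = cwnd * a if increase == "M" else cwnd + b
        return max(new, 1.0)   # assumption: never go below one segment
    return update


aimd = make_policy("A", "M")
print(aimd(10.0, congested=False))  # 11.0
print(aimd(10.0, congested=True))   # 5.0
```

`make_policy("M", "A")` gives MIAD, and so on for the other two combinations.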
107
108
Problem 3: Multiple Flows
• Want steady state to be “fair”
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution!
  – Not really, but close enough…
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2
[Figure: source A sends at rate x to B over a link of capacity C = 50 pkts/RTT; plot over time of rate (pkts/RTT) and backlog in router (pkts); congested if backlog > 20]
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2
[Figure: flow x1 (A to B) and flow x2 (D to E) share one link; plot of both rates over time]
Rates equalize: fair share
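A toy simulation makes the equalizing effect visible. It assumes both flows see every congestion event at the same time and that congestion means x1 + x2 exceeds the link capacity; the capacity of 50 pkts/RTT and the starting rates are illustrative, and `share_link` is a hypothetical name.

```python
def share_link(x1, x2, capacity=50.0, rtts=500):
    """Two AIMD flows on one link; both back off when x1 + x2 > capacity."""
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # congestion: decrease rate by factor 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # no congestion: +1 packet/RTT each
    return x1, x2


x1, x2 = share_link(40.0, 5.0)
print(abs(x1 - x2) < 1e-3)  # True
```

The additive steps leave the gap between the flows unchanged, while every halving cuts it in two, so the gap shrinks toward zero.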
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: flow x1 (A to B) and flow x2 (D to E) share one link; plot of both rates over time]
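With additive decrease the picture changes: both flows always move by the same amount, so whatever gap they start with stays. A sketch under the same assumptions as before (synchronized congestion signals, illustrative capacity of 50 pkts/RTT, hypothetical function name):

```python
def aiad_gap(x1, x2, capacity=50.0, rtts=500):
    """Run two AIAD flows and return the final difference x1 - x2."""
    for _ in range(rtts):
        step = -1 if x1 + x2 > capacity else 1   # same step for both flows
        x1, x2 = x1 + step, x2 + step
    return x1 - x2


print(aiad_gap(40.0, 5.0))  # 35.0
```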
112
Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x’s converge
[Figure: 2-user example; axes “Bandwidth for User 1 (x1)” and “Bandwidth for User 2 (x2)”; the efficiency line separates underload from overload]
113
Example
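• Total bandwidth 1
[Figure: axes “User 1: x1” and “User 2: x2”, each from 0 to 1, with the fairness line and the efficiency line; labelled points:]
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, fair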
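The four labelled points can be classified mechanically. A small sketch, with total bandwidth normalized to 1 as on the slide; the tolerance `tol` and the name `classify` are assumptions added to cope with floating-point sums.

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Return (load, fair) for an operating point (x1, x2)."""
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"       # above the efficiency line
    elif total < capacity - tol:
        load = "inefficient"     # below the efficiency line
    else:
        load = "efficient"       # on the efficiency line
    fair = abs(x1 - x2) <= tol   # on the fairness line x1 = x2
    return load, fair


for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))
```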
114
AIAD
• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
[Figure: axes “Bandwidth for User 1 (x1)” and “Bandwidth for User 2 (x2)” with the fairness line and the efficiency line; trajectory (x1h, x2h) → (x1h - aD, x2h - aD) → (x1h - aD + aI, x2h - aD + aI)]
115
MIMD
• Increase: x·bI
• Decrease: x·bD
• Does not converge to fairness
[Figure: axes “Bandwidth for User 1 (x1)” and “Bandwidth for User 2 (x2)” with the fairness line and the efficiency line; trajectory (x1h, x2h) → (bD·x1h, bD·x2h) → (bI·bD·x1h, bI·bD·x2h)]
116
AIMD
• Increase: x + aI
• Decrease: x·bD
• Converges to fairness
[Figure: axes “Bandwidth for User 1 (x1)” and “Bandwidth for User 2 (x2)” with the fairness line and the efficiency line; trajectory (x1h, x2h) → (bD·x1h, bD·x2h) → (bD·x1h + aI, bD·x2h + aI)]
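The three trajectories can be compared by looking at what one decrease-then-increase cycle does to the gap between the two users. The algebra below uses the point labels from the AIAD, MIMD and AIMD plots and assumes both users receive the same congestion signal; the condition 0 < b_D < 1 is an assumption consistent with "decrease".

```latex
\begin{align*}
\text{AIAD:}\quad & (x_{1h} - a_D + a_I) - (x_{2h} - a_D + a_I) = x_{1h} - x_{2h}
    && \text{(difference unchanged)} \\
\text{MIMD:}\quad & \frac{b_I b_D\, x_{1h}}{b_I b_D\, x_{2h}} = \frac{x_{1h}}{x_{2h}}
    && \text{(ratio unchanged)} \\
\text{AIMD:}\quad & (b_D x_{1h} + a_I) - (b_D x_{2h} + a_I) = b_D\,(x_{1h} - x_{2h}),
    \quad 0 < b_D < 1
    && \text{(difference shrinks each cycle)}
\end{align*}
```

Only AIMD multiplies the gap by a factor below one on every cycle, which is why its trajectory drifts onto the fairness line.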
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: axes “Bandwidth for Connection 1” and “Bandwidth for Connection 2”, each from 0 to R, with the “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections:
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2!
• Recall: Multiple browser sessions (and the potential for synchronized loss)
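The parallel-connection example is plain per-connection sharing. A sketch assuming TCP divides the link rate R equally among all connections (the helper `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing, new_connections):
    """Fraction of link rate R the new app gets under equal per-connection sharing."""
    return Fraction(new_connections, existing + new_connections)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

Eleven connections out of twenty is 11R/20, slightly more than the R/2 quoted on the slide.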
119
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: aggregate window vs. time t for each case, with the probability distribution and buffer size marked]
Some TCP issues outstanding…
95
Refinement inferring lossbull After 3 dup ACKs
ndash CongWin is cut in halfndash window then grows linearly
bull But after timeout eventndash CongWin instead set to 1 MSS ndash window then grows exponentiallyndash to a threshold then grows linearly
3 dup ACKs indicates network capable of delivering some segments timeout indicates a ldquomore alarmingrdquo congestion scenario
Philosophy
96
RefinementQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set
to 12 of CongWin just before loss event
97
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
98
TCP sender congestion controlState Event TCP Sender Action Commentary
Slow Start (SS) ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
99
Repeating Slow Start After Timeout
t
Window
Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND
Slow start in operation until it reaches half of previous CWND Ie
SSTHRESH
TimeoutFast Retransmission
SSThreshSet to Here
100
TCP throughput
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start
bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2
throughput to W2RTT bull Average throughout 75 WRTT
101
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p
bull L = 210-10 Ouch bull New versions of TCP for high-speed
Calculation on Simple Model(cwnd in units of MSS)
bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit
bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat
bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2
bull Throughput = (MSSRTT) sqrt(32p)
102
HINT KNOW THIS SLIDE
Three Congestion Control Challenges ndash or Why AIMD
bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem
bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate
bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows
103
104
Problem 1 Single Flow Fixed BW
bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows
bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)
bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK
105
Problems with Slow-Startbull Slow-start can result in many losses
ndash Roughly the size of cwnd ~ BWRTT
bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped
bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this
Problem 2 Single Flow Varying BW
Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you
wonrsquot know if more bandwidth is available
Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease
cwnd cwnd a bull Additive increase or decrease
cwnd cwnd +- b
106
Four alternatives
bull AIAD gentle increase gentle decrease
bull AIMD gentle increase drastic decrease
bull MIAD drastic increase gentle decreasendash too many losses eliminate
bull MIMD drastic increase and decrease
107
108
Problem 3 Multiple Flows
bull Want steady state to be ldquofairrdquo
bull Many notions of fairness but here just require two identical flows to end up with the same bandwidth
bull This eliminates MIMD and AIADndash As we shall seehellip
bull AIMD is the only remaining solutionndash Not really but close enoughhellip
109
Recall Buffer and Window Dynamics
bull No congestion x increases by one packetRTT every RTTbull Congestion decrease x by factor 2
A BC = 50 pktsRTT
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
Backlog in router (pkts)Congested if gt 20
Rate (pktsRTT)
x
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2
[Figure: flow x1 (A to B) and flow x2 (D to E); both rates plotted against time.]
Rates equalize: fair share
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: flow x1 (A to B) and flow x2 (D to E); both rates plotted against time.]
112
Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Figure: 2-user example. Axes: Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2). The efficiency line separates the overload region from the underload region.]
113
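Example
• Total bandwidth 1
[Figure: axes User 1 (x1) and User 2 (x2), each from 0 to 1, with the fairness line and the efficiency line.]
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, fair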
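The four labelled points can be checked mechanically. The sketch below assumes the points read as (0.2, 0.5), (0.7, 0.5), (0.7, 0.3) and (0.5, 0.5); the function name, the `capacity` parameter and the tolerance are illustrative choices, not part of the slides.

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Classify an allocation (x1, x2) in the two-user model.

    capacity and tol are illustrative parameters; the slide uses a total
    bandwidth of 1.
    """
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"
    elif total < capacity - tol:
        load = "inefficient"
    else:
        load = "efficient"
    fair = abs(x1 - x2) <= tol  # on the fairness line x1 == x2
    return load, fair


if __name__ == "__main__":
    for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
        print(point, classify(*point))
```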
114
AIAD
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (x1h − aD, x2h − aD) → (x1h − aD + aI, x2h − aD + aI).]
• Increase: x + aI
• Decrease: x − aD
• Does not converge to fairness
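One way to see why, as a sketch under the assumption that both flows see the same congestion signal and therefore apply the same update at each step: an additive step shifts both rates by the same amount, so the gap between them never changes.

```latex
\begin{aligned}
\text{decrease:}\quad (x_1 - a_D) - (x_2 - a_D) &= x_1 - x_2 \\
\text{increase:}\quad (x_1 + a_I) - (x_2 + a_I) &= x_1 - x_2
\end{aligned}
```

The trajectory therefore moves along a line parallel to the fairness line and never approaches it.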
115
MIMD
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD·x1h, bD·x2h) → (bI·bD·x1h, bI·bD·x2h).]
• Increase: x × bI
• Decrease: x × bD
• Does not converge to fairness
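A short sketch of the reason, again assuming both flows apply the same update at each step: a multiplicative step scales both rates by the same factor, so their ratio is preserved.

```latex
\frac{b_D\,x_1}{b_D\,x_2} = \frac{x_1}{x_2},
\qquad
\frac{b_I\,x_1}{b_I\,x_2} = \frac{x_1}{x_2}
```

The trajectory stays on the line through the origin and the starting point, so it reaches the fairness line only if it started on it.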
116
AIMD
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD·x1h, bD·x2h) → (bD·x1h + aI, bD·x2h + aI).]
• Increase: x + aI
• Decrease: x × bD
• Converges to fairness
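The three policies can be compared in the simple two-user model with a small simulation. This is a sketch under stated assumptions: both flows react to congestion in the same step, capacity is 1, and the values chosen for aI, aD, bI and bD are illustrative rather than taken from the slides.

```python
def simulate(increase, decrease, x1, x2, capacity=1.0, steps=2000):
    """Iterate two synchronised flows in the simple two-user model.

    Both flows apply `decrease` when x1 + x2 exceeds `capacity` and
    `increase` otherwise. Returns the final gap |x1 - x2| and the final
    ratio of the smaller rate to the larger one.
    """
    for _ in range(steps):
        step = decrease if x1 + x2 > capacity else increase
        x1, x2 = step(x1), step(x2)
    return abs(x1 - x2), min(x1, x2) / max(x1, x2)


# Illustrative parameter values (not from the slides).
A_I, A_D = 0.01, 0.01   # additive increase / decrease
B_I, B_D = 1.05, 0.5    # multiplicative increase / decrease

POLICIES = {
    "AIAD": (lambda x: x + A_I, lambda x: x - A_D),
    "MIMD": (lambda x: x * B_I, lambda x: x * B_D),
    "AIMD": (lambda x: x + A_I, lambda x: x * B_D),
}

if __name__ == "__main__":
    for name, (inc, dec) in POLICIES.items():
        gap, ratio = simulate(inc, dec, x1=0.7, x2=0.3)
        print(f"{name}: gap={gap:.4f} ratio={ratio:.4f}")
```

Starting from (0.7, 0.3), AIAD keeps the gap at 0.4 and MIMD keeps the ratio at 3/7, while under AIMD every multiplicative decrease halves the gap and the additive increase leaves it unchanged, so the rates equalize.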
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: axes Bandwidth for Connection 1 and Bandwidth for Connection 2, each from 0 to R, with the equal-bandwidth-share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
• Recall: multiple browser sessions (and the potential for synchronized loss)
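The arithmetic behind the parallel-connection example can be sketched as follows, assuming the idealised case in which every TCP connection on the bottleneck gets an equal share of R. The function name is hypothetical.

```python
from fractions import Fraction


def app_share(app_conns, other_conns):
    """Fraction of link rate R an app gets with `app_conns` connections.

    Assumes every TCP connection on the bottleneck gets an equal share;
    `other_conns` is the number of competing connections on the link.
    """
    return Fraction(app_conns, app_conns + other_conns)


if __name__ == "__main__":
    print(app_share(1, 9))    # share of R with a single connection
    print(app_share(11, 9))   # share of R with eleven connections
```

With 9 existing connections, one extra connection gets 1/10 of R, while eleven extra connections together get 11/20 of R, slightly more than half.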
119
Some TCP issues outstanding…
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: plots against time t; labels: Probability Distribution, Buffer Size.]
- Slide 1
- Topic 5 – Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexing/demultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux: Threaded Web Server
- UDP: User Datagram Protocol [RFC 768]
- UDP: more
- UDP checksum
- Internet Checksum (time travel warning – we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer: getting started
- Reliable data transfer: getting started (2)
- KR state machines – a note
- Rdt1.0: reliable transfer over a reliable channel
- Rdt2.0: channel with bit errors
- rdt2.0: FSM specification
- rdt2.0: operation with no errors
- rdt2.0: error scenario
- rdt2.0 has a fatal flaw
- rdt2.1: sender, handles garbled ACK/NAKs
- rdt2.1: receiver, handles garbled ACK/NAKs
- rdt2.1: discussion
- rdt2.2: a NAK-free protocol
- rdt2.2: sender, receiver fragments
- rdt3.0: channels with errors and loss
- rdt3.0 sender
- rdt3.0 in action
- rdt3.0 in action (2)
- Performance of rdt3.0
- rdt3.0: stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining: increased utilization
- Pipelining Protocols
- Selective repeat: big picture
- Go-Back-N
- GBN: sender extended FSM
- GBN: receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat: sender, receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat: dilemma
- Automatic Repeat Request (ARQ)
- TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
- TCP segment structure
- TCP seq. #'s and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP: reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP: retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122, RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ≠ Congestion Control
- Flow Control – (bad old days)
- TCP Flow Control
- TCP Flow control: how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control: additive increase, multiplicative decre
- Slide 89
- TCP Congestion Control: details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement: inferring loss
- Refinement
- Summary: TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD?
- Problem 1: Single Flow, Fixed BW
- Problems with Slow-Start
- Problem 2: Single Flow, Varying BW
- Four alternatives
- Problem 3: Multiple Flows
- Recall: Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
96
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
97
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
98
TCP sender congestion control
(Columns: State / Event / TCP Sender Action / Commentary)

State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS × (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
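The table can be read as a small state machine. The sketch below follows the slide's simplified rules rather than any RFC in detail. The class name, the MSS value, the initial Threshold, the resetting of the duplicate-ACK counter, and the 1 MSS floor applied at a triple duplicate ACK (my reading of "CongWin will not drop below 1 MSS") are all assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; an illustrative value, not taken from the slides


@dataclass
class TcpSender:
    """Sketch of the sender table: states, events and window updates."""

    cong_win: float = MSS
    threshold: float = 64 * 1024  # hypothetical initial Threshold
    state: str = "SS"             # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0  # assumption: a new ACK resets the counter
        if self.state == "SS":
            self.cong_win += MSS  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # ~1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicate ACKs; the third one signals a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, MSS)  # assumed floor
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve Threshold and restart slow start from 1 MSS."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

For example, a sender in CA with CongWin = 10 MSS that receives three duplicate ACKs ends up with Threshold = CongWin = 5 MSS and stays in CA; a subsequent timeout sets Threshold to 2.5 MSS and CongWin to 1 MSS, and returns it to SS.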
99
Repeating Slow Start After Timeout
[Figure: window plotted against time t, annotated "Timeout", "Fast Retransmission" and "SSThresh set to here".]
Slow-start restart: go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.
Slow start is in operation until it reaches half of the previous CWND, i.e., SSTHRESH.
100
TCP throughput
• What's the average throughput of TCP as a function of window size and RTT?
  – Ignore slow start
• Let W be the window size when loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average throughput: 0.75 W/RTT
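The 0.75 factor is the mean of a window that ramps linearly from W/2 to W. A minimal numerical check, assuming the window grows by one segment per RTT over a cycle (the function name is hypothetical):

```python
def average_window(w):
    """Mean window over one sawtooth cycle W/2, W/2 + 1, ..., W.

    w is the (even) window size, in segments, at which loss occurs.
    """
    windows = range(w // 2, w + 1)
    return sum(windows) / len(windows)


if __name__ == "__main__":
    for w in (100, 1000):
        print(w, average_window(w), average_window(w) / w)
```

Since throughput is window/RTT, a mean window of 0.75 W corresponds to an average throughput of 0.75 W/RTT.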
101
TCP Futures: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate p: (MSS/RTT) · sqrt(3/(2p))
• → loss rate L = 2·10^-10. Ouch!
• New versions of TCP for high-speed links
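The numbers in this example can be reproduced with a short calculation. This sketch assumes the simple-model relation Throughput = (MSS/RTT) · sqrt(3/(2p)) from the calculation slide in this deck; the function and constant names are hypothetical.

```python
MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # seconds
TARGET_BPS = 10e9     # 10 Gbps


def window_segments(target_bps, rtt, mss_bits):
    """Average number of in-flight segments needed to sustain target_bps."""
    return target_bps * rtt / mss_bits


def required_loss_rate(target_bps, rtt, mss_bits):
    """Invert throughput = (MSS/RTT) * sqrt(3 / (2p)) for p."""
    return 1.5 * (mss_bits / (rtt * target_bps)) ** 2


if __name__ == "__main__":
    w = window_segments(TARGET_BPS, RTT, MSS_BITS)
    p = required_loss_rate(TARGET_BPS, RTT, MSS_BITS)
    print(f"W = {w:.0f} segments, p = {p:.2e}")
```

The result is a window of about 83,333 segments and a tolerable loss rate of roughly 2·10^-10, i.e. about one loss per five billion segments.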
Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2) × (3W/4)
  – Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) sqrt(3/(2p))
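The steps between the bullets can be written out explicitly. This is a sketch of the algebra only, using the slide's own quantities (W, p, MSS, RTT):

```latex
\begin{aligned}
\text{packets per cycle} &= \frac{W}{2}\cdot\frac{3W}{4} = \frac{3W^2}{8}
  \quad\Rightarrow\quad p = \frac{8}{3W^2}, \qquad W = \sqrt{\frac{8}{3p}} \\
\text{throughput} &= \frac{3}{4}\,W\,\frac{MSS}{RTT}
  = \frac{3}{4}\sqrt{\frac{8}{3p}}\;\frac{MSS}{RTT}
  = \frac{MSS}{RTT}\sqrt{\frac{3}{2p}}
\end{aligned}
```

A cycle lasts W/2 RTTs at an average window of 3W/4, which gives the packet count in the first line; exactly one of those packets is dropped.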
102
HINT: KNOW THIS SLIDE
Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be a Gbps link; could be a modem
• Single flow adjusting to variations in bandwidth
  – When bandwidth decreases, must lower sending rate
  – When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  – Must avoid overloading network
  – And share bandwidth "fairly" among the flows
103
104
Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  – cwnd initially set to 1 (MSS)
  – cwnd++ upon receipt of ACK
98
TCP sender congestion controlState Event TCP Sender Action Commentary
Slow Start (SS) ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate ACK
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
SS or CA Duplicate ACK Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
99
Repeating Slow Start After Timeout
t
Window
Slow-start restart Go back to CWND of 1 MSS but take advantage of knowing the previous value of CWND
Slow start in operation until it reaches half of previous CWND Ie
SSTHRESH
TimeoutFast Retransmission
SSThreshSet to Here
100
TCP throughput
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Ignore slow start
bull Let W be the window size when loss occursbull When window is W throughput is WRTTbull Just after loss window drops to W2
throughput to W2RTT bull Average throughout 75 WRTT
101
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p
bull L = 210-10 Ouch bull New versions of TCP for high-speed
Calculation on Simple Model(cwnd in units of MSS)
bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit
bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat
bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2
bull Throughput = (MSSRTT) sqrt(32p)
102
HINT: KNOW THIS SLIDE
Three Congestion Control Challenges – or Why AIMD?
• Single flow adjusting to bottleneck bandwidth
  - Without any a priori knowledge
  - Could be a Gbps link; could be a modem
• Single flow adjusting to variations in bandwidth
  - When bandwidth decreases, must lower sending rate
  - When bandwidth increases, must increase sending rate
• Multiple flows sharing the bandwidth
  - Must avoid overloading network
  - And share bandwidth "fairly" among the flows
103
104
Problem 1: Single Flow, Fixed BW
• Want to get a first-order estimate of the available bandwidth
  - Assume bandwidth is fixed
  - Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs ("slow-start")
• Adjustment:
  - cwnd initially set to 1 (MSS)
  - cwnd++ upon receipt of ACK
105
Problems with Slow-Start
• Slow-start can result in many losses
  - Roughly the size of cwnd ~ BW*RTT
• Example:
  - At some point, cwnd is enough to fill the "pipe"
  - After another RTT, cwnd is double its previous value
  - All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of bandwidth
  - Rest of design discussion focuses on this
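The overshoot described above can be illustrated with a few lines of Python. This is a deliberately crude sketch: it assumes cwnd doubles exactly once per RTT, that the pipe holds a fixed number of segments, and that there is no router buffering, so everything beyond the pipe capacity counts as excess. The function name is illustrative.

```python
def slow_start_overshoot(pipe_capacity):
    """Double cwnd every RTT from 1 until it exceeds pipe_capacity (in segments).

    Returns (rtts, final_cwnd, excess), where excess is the number of segments
    in the final window that do not fit in the pipe.
    """
    cwnd, rtts = 1, 0
    while cwnd <= pipe_capacity:
        cwnd *= 2
        rtts += 1
    return rtts, cwnd, cwnd - pipe_capacity
```

When the pipe is exactly filled (for example a capacity of 64 segments), the next doubling sends 128 segments, so the excess equals the pipe capacity itself, which is the "roughly the size of cwnd" loss burst mentioned on the slide.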
Problem 2: Single Flow, Varying BW
Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won't know if more bandwidth is available
Possible variations (in terms of change per RTT):
• Multiplicative increase or decrease: cwnd → cwnd × a or cwnd / a
• Additive increase or decrease: cwnd → cwnd ± b
106
Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  - too many losses: eliminate
• MIMD: drastic increase and decrease
107
108
Problem 3: Multiple Flows
• Want steady state to be "fair"
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  - As we shall see…
• AIMD is the only remaining solution
  - Not really, but close enough…
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2
[Figure: flow x from A to B over a link of capacity C = 50 pkts/RTT; plot over time of the rate (pkts/RTT) and the backlog in the router (pkts), congested if backlog > 20]
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by a factor of 2
• Rates equalize: fair share
[Figure: flow x1 from A to B and flow x2 from D to E sharing a link; plot of both rates over time]
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: flow x1 from A to B and flow x2 from D to E sharing a link; plot of both rates over time]
112
Simple Model of Congestion Control
• Two TCP connections
  - Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Figure: 2-user example; axes are bandwidth for User 1 (x1) and bandwidth for User 2 (x2); the efficiency line separates the underload region from the overload region]
113
Example
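• Total bandwidth: 1
[Figure: User 1 (x1) against User 2 (x2), both axes running to 1, with fairness line and efficiency line; labelled points listed below]
• Inefficient: x1 + x2 = 0.7, at (0.2, 0.5)
• Congested: x1 + x2 = 1.2, at (0.7, 0.5)
• Efficient but not fair: x1 + x2 = 1, at (0.7, 0.3)
• Efficient and fair: x1 + x2 = 1, at (0.5, 0.5)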
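The four labelled points can be classified mechanically against the efficiency and fairness lines. A minimal sketch in Python, assuming a total bandwidth of 1 and a small tolerance for floating-point comparison (the function name and the tolerance are illustrative):

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Return (load, fair) for an allocation (x1, x2).

    load is "inefficient" below the efficiency line, "congested" above it,
    and "efficient" on it; fair is True only on the fairness line x1 == x2.
    """
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"
    elif total < capacity - tol:
        load = "inefficient"
    else:
        load = "efficient"
    return load, abs(x1 - x2) <= tol
```

Only the point where both lines cross, (0.5, 0.5), is reported as both efficient and fair.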
114
AIAD
[Figure: bandwidth for User 1 (x1) against bandwidth for User 2 (x2), with fairness line and efficiency line; trajectory (x1h, x2h) → (x1h - aD, x2h - aD) → (x1h - aD + aI, x2h - aD + aI)]
• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
115
MIMD
[Figure: bandwidth for User 1 (x1) against bandwidth for User 2 (x2), with fairness line and efficiency line; trajectory (x1h, x2h) → (bD·x1h, bD·x2h) → (bI·bD·x1h, bI·bD·x2h)]
• Increase: x × bI
• Decrease: x × bD
• Does not converge to fairness
116
AIMD
[Figure: bandwidth for User 1 (x1) against bandwidth for User 2 (x2), with fairness line and efficiency line; trajectory (x1h, x2h) → (bD·x1h, bD·x2h) → (bD·x1h + aI, bD·x2h + aI)]
• Increase: x + aI
• Decrease: x × bD
• Converges to fairness
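The three convergence claims (AIAD and MIMD do not converge to fairness, AIMD does) can be checked with a toy two-flow simulation. The sketch below is in Python and rests on several assumptions that are not in the slides: both flows see congestion at the same time (whenever x1 + x2 exceeds the capacity), the step sizes `A_I`, `A_D`, `B_I`, `B_D` are arbitrary illustrative values, and time advances in synchronized rounds.

```python
A_I, A_D = 0.01, 0.05  # assumed additive increase / decrease steps
B_I, B_D = 1.05, 0.5   # assumed multiplicative increase / decrease factors


def add_inc(x):
    return x + A_I


def add_dec(x):
    return x - A_D


def mul_inc(x):
    return x * B_I


def mul_dec(x):
    return x * B_D


def simulate(x1, x2, increase, decrease, capacity=1.0, steps=2000):
    """Both flows increase while x1 + x2 <= capacity and decrease once it is exceeded."""
    for _ in range(steps):
        rule = decrease if x1 + x2 > capacity else increase
        x1, x2 = rule(x1), rule(x2)
    return x1, x2


def run_all(x1=0.7, x2=0.3):
    """Final rates for each scheme, starting from the unfair allocation (x1, x2)."""
    return {
        "AIAD": simulate(x1, x2, add_inc, add_dec),
        "MIMD": simulate(x1, x2, mul_inc, mul_dec),
        "AIMD": simulate(x1, x2, add_inc, mul_dec),
    }
```

Under these assumptions AIAD preserves the difference x1 - x2, MIMD preserves the ratio x1/x2, and AIMD halves the difference at every decrease, so only AIMD ends up with equal rates.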
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 against bandwidth for Connection 2, each axis running to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) research area: TCP friendly
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
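The parallel-connection example follows from assuming that every TCP connection on the link gets an equal per-connection share. A minimal sketch in Python under that assumption (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by an app that opens new_connections,
    assuming all connections on the link receive equal shares."""
    return Fraction(new_connections, existing_connections + new_connections)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections together get 11/20 of R, slightly more than half.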
119
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: buffer size against time t for each case, with the probability distribution of the aggregate for many flows]
Some TCP issues outstanding…
- Slide 1
- Topic 5 – Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexing/demultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux Threaded Web Server
- UDP: User Datagram Protocol [RFC 768]
- UDP: more
- UDP checksum
- Internet Checksum (time travel warning – we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer getting started
- Reliable data transfer getting started (2)
- KR state machines – a note
- Rdt1.0: reliable transfer over a reliable channel
- Rdt2.0: channel with bit errors
- rdt2.0: FSM specification
- rdt2.0: operation with no errors
- rdt2.0: error scenario
- rdt2.0 has a fatal flaw
- rdt2.1: sender, handles garbled ACK/NAKs
- rdt2.1: receiver, handles garbled ACK/NAKs
- rdt2.1: discussion
- rdt2.2: a NAK-free protocol
- rdt2.2: sender, receiver fragments
- rdt3.0: channels with errors and loss
- rdt3.0 sender
- rdt3.0 in action
- rdt3.0 in action (2)
- Performance of rdt3.0
- rdt3.0: stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining: increased utilization
- Pipelining Protocols
- Selective repeat: big picture
- Go-Back-N
- GBN: sender extended FSM
- GBN: receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat: sender, receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat: dilemma
- Automatic Repeat Request (ARQ)
- TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581, …
- TCP segment structure
- TCP seq. #'s and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122, RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ≠ Congestion Control
- Flow Control – (bad old days)
- TCP Flow Control
- TCP Flow control: how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control: additive increase, multiplicative decre
- Slide 89
- TCP Congestion Control details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement: inferring loss
- Refinement
- Summary: TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD?
- Problem 1: Single Flow, Fixed BW
- Problems with Slow-Start
- Problem 2: Single Flow, Varying BW
- Four alternatives
- Problem 3: Multiple Flows
- Recall: Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
101
TCP Futures TCP over ldquolong fat pipesrdquo
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segmentsbull Throughput in terms of loss rate p
bull L = 210-10 Ouch bull New versions of TCP for high-speed
Calculation on Simple Model(cwnd in units of MSS)
bull Assume loss occurs whenever cwnd reaches Wndash Recovery by fast retransmit
bull Window W2 W2+1 W2+2 hellipW W2 hellipndash W2 RTTs then drop then repeat
bull Average throughput 75W(MSSRTT)ndash One packet dropped out of (W2)(3W4)ndash Packet drop rate p = (83) W-2
bull Throughput = (MSSRTT) sqrt(32p)
102
HINT KNOW THIS SLIDE
Three Congestion Control Challenges ndash or Why AIMD
bull Single flow adjusting to bottleneck bandwidthndash Without any a priori knowledgendash Could be a Gbps link could be a modem
bull Single flow adjusting to variations in bandwidthndash When bandwidth decreases must lower sending ratendash When bandwidth increases must increase sending rate
bull Multiple flows sharing the bandwidthndash Must avoid overloading networkndash And share bandwidth ldquofairlyrdquo among the flows
103
104
Problem 1 Single Flow Fixed BW
bull Want to get a first-order estimate of the available bandwidthndash Assume bandwidth is fixedndash Ignore presence of other flows
bull Want to start slow but rapidly increase rate until packet drop occurs (ldquoslow-startrdquo)
bull Adjustment ndash cwnd initially set to 1 (MSS)ndash cwnd++ upon receipt of ACK
105
Problems with Slow-Startbull Slow-start can result in many losses
ndash Roughly the size of cwnd ~ BWRTT
bull Examplendash At some point cwnd is enough to fill ldquopiperdquondash After another RTT cwnd is double its previous valuendash All the excess packets are dropped
bull Need a more gentle adjustment algorithm once have rough estimate of bandwidthndash Rest of design discussion focuses on this
Problem 2 Single Flow Varying BW
Want to track available bandwidthbull Oscillate around its current valuebull If you never send more than your current rate you
wonrsquot know if more bandwidth is available
Possible variations (in terms of change per RTT)bull Multiplicative increase or decrease
cwnd cwnd a bull Additive increase or decrease
cwnd cwnd +- b
106
Four alternatives
bull AIAD gentle increase gentle decrease
bull AIMD gentle increase drastic decrease
bull MIAD drastic increase gentle decreasendash too many losses eliminate
bull MIMD drastic increase and decrease
107
108
Problem 3: Multiple Flows
• Want steady state to be “fair”
• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution!
  – Not really, but close enough…
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2
[Figure: flow x from A to B over a link of capacity C = 50 pkts/RTT. Plot over time of the rate (pkts/RTT) and of the backlog in the router (pkts); the link is congested if the backlog is > 20.]
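The dynamics described on this slide can be reproduced with a very small simulation. The sketch below uses the slide's numbers (C = 50 pkts/RTT, congested if backlog > 20). The discrete per-RTT queue update and the function name `simulate_single_flow` are simplifying assumptions for illustration.

```python
def simulate_single_flow(capacity=50.0, threshold=20.0, rtts=500):
    """AIMD rate x against a single router queue, stepped once per RTT."""
    rate, backlog = 1.0, 0.0
    rates, backlogs = [], []
    for _ in range(rtts):
        # queue grows by the excess of arrivals over capacity, never below zero
        backlog = max(0.0, backlog + rate - capacity)
        rates.append(rate)
        backlogs.append(backlog)
        if backlog > threshold:
            rate /= 2.0        # congestion: decrease x by factor 2
        else:
            rate += 1.0        # no congestion: x increases by one packet/RTT
    return rates, backlogs


rates, backlogs = simulate_single_flow()
print(max(rates), min(rates[100:]), max(backlogs))   # 56.0 28.0 21.0
```

Under these assumptions the rate settles into a sawtooth between 28 and 56 pkts/RTT, and the queue fills only near the top of each tooth.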
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2
[Figure: two flows, x1 from A to B and x2 from D to E, sharing a link. Plot of the two rates over time: the rates equalize to the fair share.]
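A hedged sketch of why the rates equalize: two flows apply the same AIMD rule and, by assumption, both see congestion in the same RTT whenever their combined rate exceeds the link capacity. The capacity of 50 pkts/RTT, the starting rates and the name `aimd_share` are illustrative choices.

```python
def aimd_share(x1, x2, capacity=50.0, rtts=500):
    """Two AIMD flows on one link; returns the final rates."""
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2.0, x2 / 2.0      # congestion: both halve
        else:
            x1, x2 = x1 + 1.0, x2 + 1.0      # no congestion: both add one packet/RTT
    return x1, x2


x1, x2 = aimd_share(40.0, 5.0)
print(abs(x1 - x2) < 1e-3)   # True: the initial gap of 35 has all but vanished
```

The additive step leaves the gap between the flows untouched, while every halving cuts it in two, so the gap shrinks geometrically with the number of congestion events.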
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: two flows, x1 from A to B and x2 from D to E, sharing a link. Plot of the two rates over time.]
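For AIAD, a small sketch under simple assumptions (both flows see congestion in the same RTT, whenever the combined rate exceeds a capacity of 50 pkts/RTT) shows that the gap between the flows never changes. The starting rates and the name `aiad_gaps` are illustrative.

```python
def aiad_gaps(x1, x2, capacity=50.0, rtts=500):
    """Two AIAD flows on one link; returns |x1 - x2| after every RTT."""
    gaps = []
    for _ in range(rtts):
        step = -1.0 if x1 + x2 > capacity else 1.0   # decrease by 1 or increase by 1
        x1, x2 = max(x1 + step, 0.0), max(x2 + step, 0.0)
        gaps.append(abs(x1 - x2))
    return gaps


gaps = aiad_gaps(40.0, 5.0)
print(gaps[0], gaps[-1])   # 35.0 35.0
```

Both flows always move by the same amount, so whichever flow started ahead stays ahead by exactly the same margin.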
112
Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x’s converge
[Figure: 2-user example. Axes: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2). The efficiency line separates the overload region from the underload region.]
113
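Example
• Total bandwidth 1
[Figure: User 1 (x1) vs. User 2 (x2), both axes running to 1, with the fairness line and the efficiency line. Marked points:]
• (0.2, 0.5): inefficient, x1+x2 = 0.7
• (0.7, 0.5): congested, x1+x2 = 1.2
• (0.7, 0.3): efficient, x1+x2 = 1, not fair
• (0.5, 0.5): efficient, x1+x2 = 1, fair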
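The four marked points can be checked mechanically. The sketch below is one way to encode the two criteria. The function name `classify` and the floating-point tolerance are assumptions made for this example.

```python
import math


def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Return (load, fair) for an allocation (x1, x2) on a link of given capacity."""
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"       # above the efficiency line
    elif total < capacity - tol:
        load = "inefficient"     # below the efficiency line
    else:
        load = "efficient"       # on the efficiency line
    fair = math.isclose(x1, x2, abs_tol=tol)   # on the fairness line x1 = x2
    return load, fair


for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))
```

Only (0.5, 0.5) satisfies both criteria, which is the intersection of the fairness line and the efficiency line.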
114
AIAD
• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (x1h - aD, x2h - aD) → (x1h - aD + aI, x2h - aD + aI).]
115
MIMD
• Increase: x * bI
• Decrease: x * bD
• Does not converge to fairness
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD x1h, bD x2h) → (bI bD x1h, bI bD x2h).]
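One way to see why MIMD cannot reach fairness, assuming both flows always apply the same factor in the same step: the ratio of the two rates is invariant.

```latex
\[
\frac{x_1'}{x_2'} \;=\; \frac{b\,x_1}{b\,x_2} \;=\; \frac{x_1}{x_2},
\qquad b \in \{\, b_I,\; b_D \,\}
\]
```

The allocation therefore moves back and forth along a single line through the origin, and reaches the fairness line only if it started on it.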
116
AIMD
• Increase: x + aI
• Decrease: x * bD
• Converges to fairness
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with the fairness line and the efficiency line. Trajectory: (x1h, x2h) → (bD x1h, bD x2h) → (bD x1h + aI, bD x2h + aI).]
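A short derivation of the convergence claim, under the assumption that both flows see every congestion signal and use the same parameters:

```latex
\begin{align*}
\text{increase:} \quad & (x_1 + a_I) - (x_2 + a_I) = x_1 - x_2 \\
\text{decrease:} \quad & b_D x_1 - b_D x_2 = b_D\,(x_1 - x_2), \qquad 0 < b_D < 1 \\
\text{after } k \text{ decreases:} \quad & |x_1 - x_2| \;\to\; b_D^{\,k}\,|x_1 - x_2| \xrightarrow{\;k \to \infty\;} 0
\end{align*}
```

The additive step preserves the gap and the multiplicative step shrinks it, so repeated cycles pull the allocation toward the fairness line.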
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
• Recall: multiple browser sessions (and the potential for synchronized loss)
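The arithmetic behind the example can be sketched as below, assuming TCP gives every connection an equal share of the link regardless of which application owns it. `app_share` is a name invented for this illustration.

```python
from fractions import Fraction


def app_share(existing, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`."""
    # assumption: each of the (existing + new) connections gets an equal share
    return Fraction(new_connections, existing + new_connections)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

Per-connection fairness therefore says nothing about per-application fairness: opening more connections buys a larger share.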
119
Some TCP issues outstanding…
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: for each case, a plot over time t, labelled with the probability distribution and the buffer size.]
- Slide 1
- Topic 5 – Transport
- Transport services and protocols
- Transport vs. network layer
- Internet transport-layer protocols
- Multiplexing/demultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux: Threaded Web Server
- UDP: User Datagram Protocol [RFC 768]
- UDP: more
- UDP checksum
- Internet Checksum (time travel warning – we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer: getting started
- Reliable data transfer: getting started (2)
- KR state machines – a note
- Rdt1.0: reliable transfer over a reliable channel
- Rdt2.0: channel with bit errors
- rdt2.0: FSM specification
- rdt2.0: operation with no errors
- rdt2.0: error scenario
- rdt2.0 has a fatal flaw
- rdt2.1: sender, handles garbled ACK/NAKs
- rdt2.1: receiver, handles garbled ACK/NAKs
- rdt2.1: discussion
- rdt2.2: a NAK-free protocol
- rdt2.2: sender, receiver fragments
- rdt3.0: channels with errors and loss
- rdt3.0 sender
- rdt3.0 in action
- rdt3.0 in action (2)
- Performance of rdt3.0
- rdt3.0: stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining: increased utilization
- Pipelining Protocols
- Selective repeat: big picture
- Go-Back-N
- GBN: sender extended FSM
- GBN: receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat: sender, receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat: dilemma
- Automatic Repeat Request (ARQ)
- TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581, …)
- TCP segment structure
- TCP seq. #’s and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP: retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122, RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ≠ Congestion Control
- Flow Control – (bad old days)
- TCP Flow Control
- TCP Flow control: how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control: additive increase, multiplicative decre
- Slide 89
- TCP Congestion Control: details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement: inferring loss
- Refinement
- Summary: TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over “long, fat pipes”
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD?
- Problem 1: Single Flow, Fixed BW
- Problems with Slow-Start
- Problem 2: Single Flow, Varying BW
- Four alternatives
- Problem 3: Multiple Flows
- Recall: Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
Calculation on Simple Model (cwnd in units of MSS)
• Assume loss occurs whenever cwnd reaches W
  – Recovery by fast retransmit
• Window: W/2, W/2+1, W/2+2, …, W, W/2, …
  – W/2 RTTs, then drop, then repeat
• Average throughput: 0.75 W (MSS/RTT)
  – One packet dropped out of (W/2)*(3W/4)
  – Packet drop rate p = (8/3) W^-2
• Throughput = (MSS/RTT) sqrt(3/(2p))
102
HINT: KNOW THIS SLIDE
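The formula can be sanity-checked numerically. The sketch below walks one sawtooth cycle W/2, W/2+1, …, W, counts the packets sent, and compares the average window with sqrt(3/(2p)). It works in units of MSS per RTT, assumes W is even, and counts W/2+1 RTTs per cycle rather than the slide's W/2, so the match is approximate and tightens as W grows. `sawtooth_stats` is an illustrative name.

```python
import math


def sawtooth_stats(W):
    """One AIMD cycle W/2, W/2+1, ..., W; returns (average window, loss rate)."""
    windows = list(range(W // 2, W + 1))   # window size in each RTT of the cycle
    packets = sum(windows)                 # packets sent during the cycle
    avg_window = packets / len(windows)    # average throughput in MSS per RTT
    loss_rate = 1.0 / packets              # assumption: exactly one drop per cycle
    return avg_window, loss_rate


W = 1000
avg_window, p = sawtooth_stats(W)
print(avg_window / W)                          # 0.75
print(avg_window, math.sqrt(3 / (2 * p)))      # the two values agree to within 1%
```

Substituting p = (8/3) W^-2 into sqrt(3/(2p)) gives exactly 0.75 W, which is the algebra behind the slide's last line.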
105
Problems with Slow-Start
• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW × RTT
• Example
  – At some point, cwnd is enough to fill the “pipe”
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped
• Need a more gentle adjustment algorithm once we have a rough estimate of the bandwidth
  – Rest of the design discussion focuses on this
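The overshoot can be made concrete with a minimal sketch. It assumes an idealised slow start that doubles cwnd every RTT, no router buffering, and a hypothetical pipe size given in packets; the function name and the numbers are illustrative, not from the slides.

```python
def slow_start_overshoot(pipe_pkts, initial_cwnd=1):
    """Double cwnd each RTT until it exceeds the pipe; return (cwnd, excess).

    pipe_pkts is an assumed BW x RTT product in packets; buffering is ignored.
    """
    cwnd = initial_cwnd
    # Keep doubling while the window still fits in the pipe.
    while cwnd <= pipe_pkts:
        cwnd *= 2
    # Packets beyond the pipe capacity have nowhere to go and are dropped.
    excess = cwnd - pipe_pkts
    return cwnd, excess


if __name__ == "__main__":
    cwnd, excess = slow_start_overshoot(pipe_pkts=64)
    print(f"cwnd after overshoot: {cwnd}, excess packets in that RTT: {excess}")
```

With a pipe of 64 packets the last doubling takes cwnd from 64 to 128, so the excess is about one full pipe, which is the "roughly the size of cwnd" loss the slide describes.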
Problem 2: Single Flow, Varying BW
Want to track available bandwidth
• Oscillate around its current value
• If you never send more than your current rate, you won’t know if more bandwidth is available
Possible variations (in terms of change per RTT)
• Multiplicative increase or decrease: cwnd → cwnd × a or cwnd / a
• Additive increase or decrease: cwnd → cwnd ± b
106
Four alternatives
• AIAD: gentle increase, gentle decrease
• AIMD: gentle increase, drastic decrease
• MIAD: drastic increase, gentle decrease
  – too many losses: eliminate
• MIMD: drastic increase and decrease
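The four alternatives differ only in which update rule is applied on increase and on decrease. A minimal sketch, assuming a = 2, b = 1 and a floor of one packet on cwnd (all three are illustrative choices, as are the function names):

```python
def additive_increase(cwnd, b=1.0):
    return cwnd + b


def additive_decrease(cwnd, b=1.0):
    # Assumed floor: never let the window drop below one packet.
    return max(cwnd - b, 1.0)


def multiplicative_increase(cwnd, a=2.0):
    return cwnd * a


def multiplicative_decrease(cwnd, a=2.0):
    return max(cwnd / a, 1.0)


# Policy name -> (rule when there is no congestion, rule on congestion).
POLICIES = {
    "AIAD": (additive_increase, additive_decrease),
    "AIMD": (additive_increase, multiplicative_decrease),
    "MIAD": (multiplicative_increase, additive_decrease),
    "MIMD": (multiplicative_increase, multiplicative_decrease),
}


def step(policy, cwnd, congested):
    """Apply one per-RTT update of the named policy to cwnd."""
    increase, decrease = POLICIES[policy]
    return decrease(cwnd) if congested else increase(cwnd)


if __name__ == "__main__":
    for name in POLICIES:
        print(name, step(name, 10.0, congested=False), step(name, 10.0, congested=True))
```

MIAD shows why it is eliminated: it doubles into congestion but backs off by only one packet.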
107
108
Problem 3: Multiple Flows
• Want steady state to be “fair”
• Many notions of fairness, but here we just require two identical flows to end up with the same bandwidth
• This eliminates MIMD and AIAD
  – As we shall see…
• AIMD is the only remaining solution
  – Not really, but close enough…
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by factor 2
[Figure: a single flow from A to B at rate x over a link of capacity C = 50 pkts/RTT; plot over time of rate (pkts/RTT) and backlog in router (pkts), congested if backlog > 20.]
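The sawtooth on this slide can be reproduced with a minimal sketch. It uses the slide's numbers (C = 50 pkts/RTT, congested if backlog > 20), but the queue model itself is an assumption: the backlog grows each RTT by the amount the rate exceeds the capacity and drains otherwise.

```python
def simulate_single_flow(capacity=50, threshold=20, rtts=500, x0=1.0):
    """Return per-RTT (rates, backlogs) for one AIMD flow on one link."""
    x = x0
    backlog = 0.0
    rates, backlogs = [], []
    for _ in range(rtts):
        # Queue changes by arrivals minus capacity, and never goes below zero.
        backlog = max(0.0, backlog + x - capacity)
        rates.append(x)
        backlogs.append(backlog)
        if backlog > threshold:
            x = x / 2   # congestion: decrease x by factor 2
        else:
            x = x + 1   # no congestion: one more packet/RTT
    return rates, backlogs


if __name__ == "__main__":
    rates, backlogs = simulate_single_flow()
    print("peak rate:", max(rates), "peak backlog:", max(backlogs))
```

Under these assumptions the rate climbs past the capacity until the backlog crosses the threshold, halves, and then repeats the same ramp.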
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by factor 2
[Figure: flows x1 (A to B) and x2 (D to E) share a link; plot of both rates over time. Rates equalize: fair share.]
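A minimal sketch of why the rates equalize, assuming both flows see congestion in the same RTT whenever their combined rate exceeds a hypothetical capacity of 50 pkts/RTT (the starting rates are also illustrative):

```python
def simulate_two_flows_aimd(x1=40.0, x2=5.0, capacity=50.0, rtts=300):
    """Return final rates and the per-RTT gap |x1 - x2| under AIMD."""
    gaps = []
    for _ in range(rtts):
        if x1 + x2 > capacity:
            # Congestion: both flows halve, which also halves the gap.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No congestion: both add one packet/RTT, the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
        gaps.append(abs(x1 - x2))
    return x1, x2, gaps


if __name__ == "__main__":
    x1, x2, gaps = simulate_two_flows_aimd()
    print(f"initial gap {gaps[0]:.1f}, final gap {gaps[-1]:.6f}")
```

The gap never grows and is halved at every congestion event, so the two rates end up sharing the link equally.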
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: flows x1 (A to B) and x2 (D to E) share a link; plot of both rates over time.]
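For contrast, a minimal sketch of the same two-flow setting with additive decrease, again assuming synchronized congestion signals and a hypothetical capacity of 50 pkts/RTT:

```python
def simulate_two_flows_aiad(x1=40.0, x2=5.0, capacity=50.0, rtts=300):
    """Return the per-RTT gap x1 - x2 when both flows use AIAD."""
    gaps = []
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 - 1, x2 - 1   # congestion: decrease x by 1
        else:
            x1, x2 = x1 + 1, x2 + 1   # no congestion: increase x by 1
        gaps.append(x1 - x2)
    return gaps


if __name__ == "__main__":
    gaps = simulate_two_flows_aiad()
    print("distinct gap values seen:", sorted(set(gaps)))
```

Both rules shift the two rates by the same amount, so whatever unfairness the flows start with is preserved.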
112
Simple Model of Congestion Control
• Two TCP connections
  – Rates x1 and x2
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x’s converge
[Figure: 2-user example. Axes: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2); the efficiency line separates the overload and underload regions.]
113
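Example
• Total bandwidth 1
[Figure: User 1 (x1) vs. User 2 (x2), both axes from 0 to 1, with the fairness line and the efficiency line. Marked points:]
• (0.2, 0.5): inefficient, x1 + x2 = 0.7
• (0.7, 0.5): congested, x1 + x2 = 1.2
• (0.7, 0.3): efficient, x1 + x2 = 1, not fair
• (0.5, 0.5): efficient, x1 + x2 = 1, fair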
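The four labelled points follow directly from the model's definitions. A minimal sketch, where the function name and the floating-point tolerance are illustrative assumptions:

```python
def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Label an allocation (x1, x2) relative to the efficiency and fairness lines."""
    total = x1 + x2
    if total > capacity + tol:
        return "congested"
    if total < capacity - tol:
        return "inefficient"
    # On the efficiency line: fair only if both users get the same rate.
    return "efficient, fair" if abs(x1 - x2) <= tol else "efficient, not fair"


if __name__ == "__main__":
    for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
        print(point, "->", classify(*point))
```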
114
AIAD
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with fairness and efficiency lines. Trajectory: (x1h, x2h) → (x1h − aD, x2h − aD) → (x1h − aD + aI, x2h − aD + aI).]
• Increase: x + aI
• Decrease: x − aD
• Does not converge to fairness
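The reason can be read off the trajectory, assuming both users apply the decrease and the increase at the same time. The gap between the two rates after one decrease and one increase is:

```latex
\[
(x_{1h} - a_D + a_I) - (x_{2h} - a_D + a_I) = x_{1h} - x_{2h}
\]
```

The additive terms cancel, so the point only moves parallel to the fairness line and never gets closer to it.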
115
MIMD
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with fairness and efficiency lines. Trajectory: (x1h, x2h) → (bD·x1h, bD·x2h) → (bI·bD·x1h, bI·bD·x2h).]
• Increase: x × bI
• Decrease: x × bD
• Does not converge to fairness
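Under the same assumption of synchronized updates, the ratio of the two rates after one decrease and one increase is:

```latex
\[
\frac{b_I\, b_D\, x_{1h}}{b_I\, b_D\, x_{2h}} = \frac{x_{1h}}{x_{2h}}
\]
```

The multiplicative factors cancel, so the point stays on the line through the origin and (x1h, x2h). It reaches the fairness line only if it started there.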
116
AIMD
[Figure: bandwidth for User 1 (x1) vs. bandwidth for User 2 (x2), with fairness and efficiency lines. Trajectory: (x1h, x2h) → (bD·x1h, bD·x2h) → (bD·x1h + aI, bD·x2h + aI).]
• Increase: x + aI
• Decrease: x × bD
• Converges to fairness
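A short derivation of the convergence, assuming both users see each congestion event together and that 0 < b_D < 1:

```latex
\begin{align*}
\text{decrease:}\quad & (x_{1h},\, x_{2h}) \to (b_D x_{1h},\, b_D x_{2h})
  && \text{gap becomes } b_D\,(x_{1h} - x_{2h}) \\
\text{increase:}\quad & (b_D x_{1h},\, b_D x_{2h}) \to (b_D x_{1h} + a_I,\, b_D x_{2h} + a_I)
  && \text{gap stays } b_D\,(x_{1h} - x_{2h}) \\
\text{after } n \text{ decreases:}\quad & |x_1 - x_2| = b_D^{\,n}\,|x_{1h} - x_{2h}| \to 0
  && \text{as } n \to \infty
\end{align*}
```

Each decrease shrinks the gap by the factor b_D and each increase leaves it unchanged, so the trajectory closes in on the fairness line.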
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 vs. bandwidth for Connection 2, each axis up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
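The arithmetic behind the example can be checked with a minimal sketch, assuming every TCP connection on the link ends up with an equal share of R (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R taken by an app that opens new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print("1 connection:  ", app_share(9, 1), "of R")
    print("11 connections:", app_share(9, 11), "of R")
```

With 11 connections the app holds 11 of the 20 connections, which is the slightly-more-than-half share that the slide rounds to R/2.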
119
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
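The effect of desynchronization can be illustrated with a minimal sketch. It assumes each flow's window is an idealised sawtooth ramping from half its maximum to its maximum, that all flows share one period, and that their phases are independent and uniformly random; every name and parameter here is an illustrative assumption.

```python
import random
import statistics


def sawtooth(t, period, phase, w_max):
    """Window that ramps linearly from w_max/2 to w_max each period, then halves."""
    frac = ((t + phase) % period) / period
    return w_max / 2 + frac * w_max / 2


def relative_spread(n_flows, period=100, w_max=20.0, steps=1000, seed=0):
    """Std deviation of the aggregate window over time, divided by its mean."""
    rng = random.Random(seed)
    phases = [rng.uniform(0, period) for _ in range(n_flows)]
    totals = [
        sum(sawtooth(t, period, p, w_max) for p in phases)
        for t in range(steps)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"N = {n:3d}: relative spread = {relative_spread(n):.4f}")
```

Under these assumptions the aggregate of many independent sawtooths fluctuates much less, relative to its mean, than a single flow does, which is the sense in which the variance term on the slide shrinks as N grows.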
[Figure labels: probability distribution; buffer size; time axis t.]
Some TCP issues outstanding…
- Slide 1
- Topic 5 ndash Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexingdemultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux Threaded Web Server
- UDP User Datagram Protocol [RFC 768]
- UDP more
- UDP checksum
- Internet Checksum (time travel warning ndash we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer getting started
- Reliable data transfer getting started (2)
- KR state machines ndash a note
- Rdt10 reliable transfer over a reliable channel
- Rdt20 channel with bit errors
- rdt20 FSM specification
- rdt20 operation with no errors
- rdt20 error scenario
- rdt20 has a fatal flaw
- rdt21 sender handles garbled ACKNAKs
- rdt21 receiver handles garbled ACKNAKs
- rdt21 discussion
- rdt22 a NAK-free protocol
- rdt22 sender receiver fragments
- rdt30 channels with errors and loss
- rdt30 sender
- rdt30 in action
- rdt30 in action (2)
- Performance of rdt30
- rdt30 stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining increased utilization
- Pipelining Protocols
- Selective repeat big picture
- Go-Back-N
- GBN sender extended FSM
- GBN receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat sender receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat dilemma
- Automatic Repeat Request (ARQ)
- TCP Overview RFCs 793 1122 1323 2018 2581 hellip
- TCP segment structure
- TCP seq rsquos and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122 RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ne Congestion Control
- Flow Control ndash (bad old days)
- TCP Flow Control
- TCP Flow control how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control additive increase multiplicative decre
- Slide 89
- TCP Congestion Control details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement inferring loss
- Refinement
- Summary TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over "long fat pipes"
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD
- Problem 1 Single Flow Fixed BW
- Problems with Slow-Start
- Problem 2 Single Flow Varying BW
- Four alternatives
- Problem 3 Multiple Flows
- Recall Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
109
Recall: Buffer and Window Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by a factor of 2
[Figure: A and B are connected through a router by a link of capacity C = 50 pkts/RTT. The plot shows the rate x (pkts/RTT) and the backlog in the router (pkts; congested if > 20) over time.]
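The two rules above can be tried out in a few lines of Python. This is a minimal sketch, not the model used to produce the plot: the function name `simulate_single_flow`, the per-RTT queue update and the starting rate are assumptions made for illustration. Only C = 50 pkts/RTT and the congestion threshold of 20 pkts come from the slide.

```python
def simulate_single_flow(capacity=50.0, threshold=20.0, steps=500, x0=1.0):
    """Return one (rate, backlog) sample per RTT for a single AIMD flow.

    Assumed model: each RTT the router queue grows by the excess of the
    sending rate over the link capacity (and never drops below zero).
    """
    x, backlog = x0, 0.0
    history = []
    for _ in range(steps):
        backlog = max(0.0, backlog + x - capacity)
        history.append((x, backlog))
        if backlog > threshold:
            x = x / 2.0   # congestion: decrease x by a factor of 2
        else:
            x = x + 1.0   # no congestion: one more packet/RTT every RTT
    return history


history = simulate_single_flow()
print("peak rate:", max(rate for rate, _ in history))
print("peak backlog:", max(backlog for _, backlog in history))
```

Under these assumed settings the rate climbs past C, the backlog crosses the threshold, the rate is halved, and the cycle repeats as a sawtooth.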
110
AIMD Sharing Dynamics
• No congestion: rate increases by one packet/RTT every RTT
• Congestion: decrease rate by a factor of 2
• Rates equalize: fair share
[Figure: two flows, $x_1$ (A to B) and $x_2$ (D to E); the plot shows both rates over time.]
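A small simulation makes the "rates equalize" claim concrete. It is a sketch under stated assumptions: both flows see congestion in the same RTT (whenever their combined rate exceeds the link capacity), and the capacity of 50 and the starting rates of 40 and 5 are made-up values, as is the function name.

```python
def simulate_aimd_sharing(x1=40.0, x2=5.0, capacity=50.0, steps=500):
    """Return per-RTT (x1, x2) samples for two AIMD flows on one link."""
    rates = []
    for _ in range(steps):
        rates.append((x1, x2))
        if x1 + x2 > capacity:
            # Congestion (assumed to hit both flows together): halve both.
            x1, x2 = x1 / 2.0, x2 / 2.0
        else:
            # No congestion: each flow adds one packet/RTT.
            x1, x2 = x1 + 1.0, x2 + 1.0
    return rates


rates = simulate_aimd_sharing()
first, last = rates[0], rates[-1]
print("initial gap:", first[0] - first[1])
print("final gap:", last[0] - last[1])
```

The additive steps leave the gap between the two rates unchanged, while every halving cuts it in half, so the gap shrinks towards zero.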
111
AIAD Sharing Dynamics
• No congestion: x increases by one packet/RTT every RTT
• Congestion: decrease x by 1
[Figure: two flows, $x_1$ (A to B) and $x_2$ (D to E); the plot shows both rates over time.]
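The same kind of experiment with an additive decrease shows why the two rates never meet. Again this is only a sketch: synchronized congestion feedback, the capacity of 50 and the starting rates of 40 and 5 are assumptions, and `simulate_aiad_sharing` is a hypothetical name.

```python
def simulate_aiad_sharing(x1=40.0, x2=5.0, capacity=50.0, steps=500):
    """Return per-RTT (x1, x2) samples for two AIAD flows on one link."""
    rates = []
    for _ in range(steps):
        rates.append((x1, x2))
        if x1 + x2 > capacity:
            # Congestion (assumed to hit both flows together): subtract 1.
            x1, x2 = x1 - 1.0, x2 - 1.0
        else:
            # No congestion: each flow adds one packet/RTT.
            x1, x2 = x1 + 1.0, x2 + 1.0
    return rates


rates = simulate_aiad_sharing()
gaps = {a - b for a, b in rates}
print("distinct gaps seen:", gaps)
```

Both the increase and the decrease shift the two rates by the same amount, so the starting gap is preserved for the whole run.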
112
Simple Model of Congestion Control
• Two TCP connections
  – Rates $x_1$ and $x_2$
• Congestion when sum > 1
• Efficiency: sum near 1
• Fairness: x's converge
[Figure: 2-user example. Axes: bandwidth for User 1 ($x_1$) and bandwidth for User 2 ($x_2$). The efficiency line separates the overload region from the underload region.]
113
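Example
• Total bandwidth 1
[Figure: User 1 ($x_1$) against User 2 ($x_2$), both axes running to 1, with the fairness line and the efficiency line. Marked points:]
• (0.2, 0.5): inefficient, $x_1 + x_2 = 0.7$
• (0.7, 0.5): congested, $x_1 + x_2 = 1.2$
• (0.7, 0.3): efficient, $x_1 + x_2 = 1$, not fair
• (0.5, 0.5): efficient, $x_1 + x_2 = 1$, fair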
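The four labelled points can be checked mechanically. The helper below is a hypothetical illustration of the model's two criteria (efficiency: the sum equals the total bandwidth; fairness: the two rates are equal); the function name and the tolerance are assumptions.

```python
import math


def classify(x1, x2, capacity=1.0, tol=1e-9):
    """Return (load, fair) for an allocation in the two-user model."""
    total = x1 + x2
    if total > capacity + tol:
        load = "congested"
    elif total < capacity - tol:
        load = "inefficient"
    else:
        load = "efficient"
    fair = math.isclose(x1, x2, abs_tol=tol)
    return load, fair


for point in [(0.2, 0.5), (0.7, 0.5), (0.7, 0.3), (0.5, 0.5)]:
    print(point, classify(*point))
```

The tolerance is there only to absorb floating-point rounding when the sum is compared against the total bandwidth.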
114
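AIAD
• Increase: $x + a_I$
• Decrease: $x - a_D$
• Does not converge to fairness
[Figure: bandwidth for User 1 ($x_1$) against bandwidth for User 2 ($x_2$), with the fairness line and the efficiency line. Trajectory: $(x_{1h}, x_{2h})$, then $(x_{1h} - a_D,\ x_{2h} - a_D)$, then $(x_{1h} - a_D + a_I,\ x_{2h} - a_D + a_I)$.]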
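One way to see why AIAD does not converge is to track the gap between the two rates. The short derivation below assumes that both users receive the same feedback and apply the same constants $a_I$ and $a_D$:

```latex
\begin{align*}
\text{decrease:}\quad (x_1 - a_D) - (x_2 - a_D) &= x_1 - x_2 \\
\text{increase:}\quad (x_1 + a_I) - (x_2 + a_I) &= x_1 - x_2
\end{align*}
```

Neither step changes $x_1 - x_2$, so the operating point only slides along a line parallel to the fairness line and an unfair starting point stays unfair.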
115
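MIMD
• Increase: $x \cdot b_I$
• Decrease: $x \cdot b_D$
• Does not converge to fairness
[Figure: bandwidth for User 1 ($x_1$) against bandwidth for User 2 ($x_2$), with the fairness line and the efficiency line. Trajectory: $(x_{1h}, x_{2h})$, then $(b_D x_{1h},\ b_D x_{2h})$, then $(b_I b_D x_{1h},\ b_I b_D x_{2h})$.]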
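For MIMD the invariant is the ratio of the two rates rather than their difference. Assuming both users see the same feedback, use the same factors $b_I$ and $b_D$, and $x_2 > 0$:

```latex
\begin{align*}
\text{decrease:}\quad \frac{b_D\, x_1}{b_D\, x_2} &= \frac{x_1}{x_2} \\
\text{increase:}\quad \frac{b_I\, x_1}{b_I\, x_2} &= \frac{x_1}{x_2}
\end{align*}
```

The operating point therefore moves back and forth along the line through the origin and the starting point, and reaches the fairness line only if it started on it.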
116
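AIMD
• Increase: $x + a_I$
• Decrease: $x \cdot b_D$
• Converges to fairness
[Figure: bandwidth for User 1 ($x_1$) against bandwidth for User 2 ($x_2$), with the fairness line and the efficiency line. Trajectory: $(x_{1h}, x_{2h})$, then $(b_D x_{1h},\ b_D x_{2h})$, then $(b_D x_{1h} + a_I,\ b_D x_{2h} + a_I)$.]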
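The convergence claim follows from looking at the gap between the two rates over one decrease-then-increase cycle, as in the trajectory on the slide. The sketch assumes both users see the same feedback and that $0 < b_D < 1$; primes denote the rates after the cycle.

```latex
\begin{align*}
x_1' - x_2' &= (b_D\, x_1 + a_I) - (b_D\, x_2 + a_I) = b_D\,(x_1 - x_2) \\
\text{after } n \text{ cycles:}\quad x_1^{(n)} - x_2^{(n)} &= b_D^{\,n}\,(x_1 - x_2) \longrightarrow 0
\end{align*}
```

Additive increases leave the gap untouched and every multiplicative decrease shrinks it by the factor $b_D$, so the allocation drifts towards the fairness line.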
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: bandwidth for Connection 1 against bandwidth for Connection 2, each axis running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2
• Recall: multiple browser sessions (and the potential for synchronized loss)
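The arithmetic behind the parallel-connection example is easy to reproduce. The sketch assumes that every TCP connection on the link ends up with an equal share of R; `app_share` is a hypothetical helper name.

```python
from fractions import Fraction


def app_share(existing, new):
    """Fraction of the link rate R won by an app that opens `new`
    connections next to `existing` ones, assuming equal per-connection
    shares."""
    return Fraction(new, existing + new)


print("1 connection: ", app_share(9, 1), "of R")
print("11 connections:", app_share(9, 11), "of R")
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which is the "about R/2" quoted in the example.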
119
Some TCP issues outstanding…
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: buffer size over time t for each case, with the probability distribution of the buffer size.]
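The contrast between synchronized and desynchronized flows can be illustrated with idealised sawtooth windows. This is a rough sketch, not the analysis behind the slide: the sawtooth shape, the period of 100 time steps, the random phase offsets and all names are assumptions made for the example.

```python
import random
import statistics


def sawtooth(t, period=100):
    """Idealised AIMD window: ramps from 0.5 to 1.0, then drops back."""
    return 0.5 + 0.5 * ((t % period) / period)


def relative_spread(n_flows, synchronized=False, samples=2000,
                    period=100, seed=1):
    """Std-dev of the aggregate window divided by its mean."""
    rng = random.Random(seed)
    if synchronized:
        offsets = [0] * n_flows          # every flow peaks together
    else:
        offsets = [rng.randrange(period) for _ in range(n_flows)]
    totals = [sum(sawtooth(t + off, period) for off in offsets)
              for t in range(samples)]
    return statistics.pstdev(totals) / statistics.mean(totals)


print("1 flow:                 ", relative_spread(1))
print("100 synchronized flows: ", relative_spread(100, synchronized=True))
print("100 desynchronized flows:", relative_spread(100))
```

In this toy model the synchronized aggregate fluctuates, relative to its mean, just like a single flow, whereas the desynchronized aggregate is much smoother.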
- Slide 1
- Topic 5 ndash Transport
- Transport services and protocols
- Transport vs network layer
- Internet transport-layer protocols
- Multiplexingdemultiplexing (Transport-layer style)
- How transport-layer demultiplexing works
- Connectionless demultiplexing
- Connectionless demux (cont)
- Connection-oriented demux
- Connection-oriented demux (cont)
- Connection-oriented demux Threaded Web Server
- UDP User Datagram Protocol [RFC 768]
- UDP more
- UDP checksum
- Internet Checksum (time travel warning ndash we covered this earlie
- Principles of Reliable data transfer
- Principles of Reliable data transfer (2)
- Principles of Reliable data transfer (3)
- Reliable data transfer getting started
- Reliable data transfer getting started (2)
- KR state machines ndash a note
- Rdt10 reliable transfer over a reliable channel
- Rdt20 channel with bit errors
- rdt20 FSM specification
- rdt20 operation with no errors
- rdt20 error scenario
- rdt20 has a fatal flaw
- rdt21 sender handles garbled ACKNAKs
- rdt21 receiver handles garbled ACKNAKs
- rdt21 discussion
- rdt22 a NAK-free protocol
- rdt22 sender receiver fragments
- rdt30 channels with errors and loss
- rdt30 sender
- rdt30 in action
- rdt30 in action (2)
- Performance of rdt30
- rdt30 stop-and-wait operation
- Pipelined (Packet-Window) protocols
- Pipelining increased utilization
- Pipelining Protocols
- Selective repeat big picture
- Go-Back-N
- GBN sender extended FSM
- GBN receiver extended FSM
- GBN in action
- Selective Repeat
- Selective repeat sender receiver windows
- Selective repeat
- Selective repeat in action
- Selective repeat dilemma
- Automatic Repeat Request (ARQ)
- TCP Overview RFCs 793 1122 1323 2018 2581 hellip
- TCP segment structure
- TCP seq rsquos and ACKs
- TCP out of order attack
- TCP Round Trip Time and Timeout
- TCP Round Trip Time and Timeout (2)
- Some RTT estimates are never good
- Example RTT estimation
- TCP Round Trip Time and Timeout (3)
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (more)
- TCP ACK generation [RFC 1122 RFC 2581]
- Fast Retransmit
- Slide 70
- Fast retransmit algorithm
- Silly Window Syndrome
- Flow Control ne Congestion Control
- Flow Control ndash (bad old days)
- TCP Flow Control
- TCP Flow control how it works
- TCP Connection Management
- TCP Connection Management (cont)
- TCP Connection Management (cont) (2)
- TCP Connection Management (cont)
- Principles of Congestion Control
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- TCP congestion control additive increase multiplicative decre
- Slide 89
- TCP Congestion Control details
- AIMD Starts Too Slowly
- TCP Slow Start
- TCP Slow Start (more)
- Slow Start and the TCP Sawtooth
- Refinement inferring loss
- Refinement
- Summary TCP Congestion Control
- TCP sender congestion control
- Repeating Slow Start After Timeout
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- Calculation on Simple Model (cwnd in units of MSS)
- Three Congestion Control Challenges – or Why AIMD?
- Problem 1 Single Flow Fixed BW
- Problems with Slow-Start
- Problem 2 Single Flow Varying BW
- Four alternatives
- Problem 3 Multiple Flows
- Recall Buffer and Window Dynamics
- AIMD Sharing Dynamics
- AIAD Sharing Dynamics
- Simple Model of Congestion Control
- Example
- AIAD
- MIMD
- AIMD
- Why is AIMD fair? (a pretty animation…)
- Fairness (more)
- Some TCP issues outstanding…
114
AIAD
• Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Marked points: (x1h, x2h), (x1h - aD, x2h - aD), (x1h - aD + aI, x2h - aD + aI).]
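The AIAD rules above can be tried out numerically. The following is a minimal sketch, assuming two users who react in lock-step to a shared link of fixed capacity; the function name, the capacity and the step sizes are illustrative choices, not values from the slides. Because both users add and subtract the same constants, the gap between them never changes, which is why the trajectory never approaches the fairness line.

```python
def aiad(x1, x2, capacity, a_i=1.0, a_d=2.0, rounds=1000):
    """Additive increase, additive decrease for two synchronized users.

    Assumption: both users see the same congestion signal, namely
    x1 + x2 exceeding the link capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Congested: both subtract the same constant a_d.
            x1, x2 = x1 - a_d, x2 - a_d
        else:
            # Not congested: both add the same constant a_i.
            x1, x2 = x1 + a_i, x2 + a_i
    return x1, x2


start = (10.0, 40.0)
end = aiad(*start, capacity=100.0)
print("gap before:", start[1] - start[0])
print("gap after: ", end[1] - end[0])
```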
115
MIMD
• Increase: x × bI
• Decrease: x × bD
• Does not converge to fairness
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Marked points: (x1h, x2h), (bD·x1h, bD·x2h), (bI·bD·x1h, bI·bD·x2h).]
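For MIMD the quantity that stays fixed is the ratio of the two rates rather than their difference. A small sketch under the same lock-step assumption (the function name and the factors 1.25 and 0.5 are hypothetical, chosen only for illustration):

```python
import math


def mimd(x1, x2, capacity, b_i=1.25, b_d=0.5, rounds=1000):
    """Multiplicative increase, multiplicative decrease for two users.

    Assumption: both users apply the same factor in every round, so
    each point stays on the straight line through the origin and the
    starting point.
    """
    for _ in range(rounds):
        factor = b_d if x1 + x2 > capacity else b_i
        x1, x2 = x1 * factor, x2 * factor
    return x1, x2


x1, x2 = mimd(10.0, 40.0, capacity=100.0)
# The ratio x2 / x1 is the same as at the start (up to rounding).
print(math.isclose(x2 / x1, 4.0, rel_tol=1e-9))
```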
116
AIMD
• Increase: x + aI
• Decrease: x × bD
• Converges to fairness
[Figure: axes Bandwidth for User 1 (x1) and Bandwidth for User 2 (x2), with the fairness line and the efficiency line. Marked points: (x1h, x2h), (bD·x1h, bD·x2h), (bD·x1h + aI, bD·x2h + aI).]
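Combining the additive increase with the multiplicative decrease changes the picture: every decrease shrinks the gap between the users, and the increases do not widen it again. A minimal sketch, again assuming synchronized users and made-up parameter values:

```python
def aimd(x1, x2, capacity, a_i=1.0, b_d=0.5, rounds=1000):
    """Additive increase, multiplicative decrease for two users.

    Assumption: both users detect congestion in the same round,
    whenever x1 + x2 exceeds the link capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Multiplicative decrease scales the gap by b_d as well.
            x1, x2 = x1 * b_d, x2 * b_d
        else:
            # Additive increase leaves the gap unchanged.
            x1, x2 = x1 + a_i, x2 + a_i
    return x1, x2


x1, x2 = aimd(10.0, 90.0, capacity=100.0)
print("final gap is tiny:", abs(x1 - x2) < 1e-6)
```

The starting point (10, 90) is deliberately unfair; after enough loss events the two rates are practically equal.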
117
Why is AIMD fair? (a pretty animation…)
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Bandwidth for Connection 1 against Bandwidth for Connection 2, both axes running up to R, with the equal bandwidth share line. Annotations along the trajectory: "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
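The geometric argument on this slide can also be written as a short derivation. This is a sketch, assuming both sessions increase by the same amount a per round and both halve at every loss; the symbols Δ and k are introduced here for illustration only.

```latex
\begin{align*}
\Delta &= x_1 - x_2 \\
\text{additive increase:}\quad (x_1 + a,\; x_2 + a) &\;\Rightarrow\; \Delta' = \Delta \\
\text{multiplicative decrease:}\quad (x_1/2,\; x_2/2) &\;\Rightarrow\; \Delta' = \Delta/2 \\
\text{after } k \text{ losses:}\quad \Delta_k &= \Delta_0 / 2^{k} \;\to\; 0
\end{align*}
```

Moving along the slope-1 line keeps the distance to the equal-share line constant, while each halving moves the point towards the origin and so halves that distance.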
118
Fairness (more)
Fairness and UDP
• Multimedia apps may not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• (Ancient yet ongoing) Research area: TCP friendly
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
• Recall: multiple browser sessions (and the potential for synchronized loss)
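The arithmetic in the parallel-connections example can be checked directly. A small sketch, assuming TCP divides the link equally between all connections regardless of which application owns them; the helper name is hypothetical.

```python
from fractions import Fraction


def app_share(existing, opened):
    """Fraction of the link rate R obtained by an app that opens
    `opened` connections next to `existing` other connections.

    Assumption: every TCP connection gets an equal share of R.
    """
    return Fraction(opened, existing + opened)


print(app_share(9, 1))   # 1/10
print(app_share(9, 11))  # 11/20
```

With 11 connections the new app holds 11 of the 20 connections on the link, slightly more than half of R.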
119
Some TCP issues outstanding…
Synchronized Flows
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds
Many TCP Flows
• Independent, desynchronized
• Central limit theorem says the aggregate becomes Gaussian
• Variance (buffer size) decreases as N increases
[Figure: two plots against time t, labelled Probability Distribution and Buffer Size.]
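The claim that the spread of the aggregate window shrinks as N grows can be illustrated with a toy experiment. This is a rough sketch under strong assumptions that are not from the slides: the N flows split a fixed total peak window equally, and each flow is caught at an independent, uniformly random point of an ideal sawtooth between half its peak and its peak. The function name and all parameters are illustrative.

```python
import random
import statistics


def aggregate_window_std(n_flows, total_peak=1000.0, samples=2000, seed=1):
    """Standard deviation of the summed window of n desynchronized flows.

    Assumption: a linear sawtooth sampled at a random time is uniform
    between peak / 2 and peak, and flows are independent of each other.
    """
    rng = random.Random(seed)
    peak = total_peak / n_flows  # per-flow peak window
    totals = []
    for _ in range(samples):
        totals.append(sum(rng.uniform(peak / 2, peak) for _ in range(n_flows)))
    return statistics.pstdev(totals)


for n in (1, 10, 100):
    print(n, round(aggregate_window_std(n), 1))
```

Under these assumptions the spread falls roughly in proportion to 1/sqrt(N), which matches the slide's point that many desynchronized flows need less buffering than the synchronized case.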